Overview of PRESTA at IberLEF 2025: Question Answering Over Tabular Data In Spanish

Jorge Osés Grijalba; Luis Alfonso Ureña López; Eugenio Martínez Cámara; José Camacho Collados

Authors: Jorge Osés Grijalba, Luis Alfonso Ureña López, Eugenio Martínez Cámara, José Camacho Collados
Published in: Procesamiento del lenguaje natural, ISSN 1135-5948, No. 75, 2025 (Issue dedicated to: Procesamiento del Lenguaje Natural, Revista nº 75, September 2025), pp. 475-486
Language: English
Parallel titles:
- Presentación de PRESTA en IberLEF 2025: Respuesta a Preguntas sobre Datos Tabulares en Español
Abstract
- Spanish (English translation)
  In this work we present the findings and results of PRESTA at IberLEF 2025, focused on question answering over tabular data in Spanish. The task challenges participants to develop systems capable of interpreting natural language questions and retrieving accurate answers from semi-structured tabular sources in Spanish. In this article we describe the task design, the construction of the dataset, the evaluation methodology, and the participating systems. We analyze the various proposed strategies and discuss the main trends observed. The results show that approaches based on large language models (LLMs) clearly outperformed traditional methods. Small open-source models stood out in particular: backed by a good strategy, they can even surpass the results of large proprietary models. These results confirm that the strong performance of LLMs in English also extends to Spanish in the context of question answering over tables, although certain linguistic and domain-specific challenges remain.
- English
  We present the findings and results of the PRESTA track at IberLEF 2025, focused on question answering over tabular data in Spanish. The task challenges participants to build systems capable of interpreting natural language questions and retrieving accurate answers from semi-structured tabular sources in Spanish. In this paper, we describe the task design, dataset construction, evaluation methodology, and participant systems. We analyze the submitted approaches and discuss key trends observed across systems. Our results show that methods leveraging large language models (LLMs) clearly outperformed traditional pipelines, with larger multilingual models achieving very high accuracy. Notably, small open-source models paired with well-designed systems perform on par with larger proprietary ones. These findings confirm that the strong performance of LLMs in English carries over to Spanish in tabular question answering, though some linguistic and domain-specific challenges remain.