El lenguaje figurado en los LLM: Evaluación y análisis

Marta Ramírez Trives; Elena Lloret Pastor; Alba Bonet Jover

Authors: Marta Ramírez Trives, Elena Lloret Pastor, Alba Bonet Jover
Source: Procesamiento del lenguaje natural, ISSN 1135-5948, No. 76, 2026 (Issue devoted to: Procesamiento del Lenguaje Natural, Revista nº 76, March 2026), pp. 163-176
Language: Spanish
Parallel titles:
- Figurative Language in LLMs: Evaluation and Analysis
Abstract
Despite advances in Natural Language Processing (NLP), figurative language remains a challenge for Large Language Models (LLMs). We evaluate and compare the performance of four LLMs (ChatGPT-4o, DeepSeek-V3, Gemini 2.5 Pro Experimental, and Grok 3 Preview) on figurative language processing tasks. Drawing on the Conventional Figurative Language Theory (CFLT) proposed by Dobrovol’skij and Piirainen (2021), we compiled CLUE, an English-Spanish dataset of metaphors, idioms, similes, and proverbs. Using this dataset, we designed and conducted tests of figurative language detection, interpretation, and generation. The models performed strongly in detection (0.94) and interpretation (0.91) in both languages. However, with the exception of Gemini, their performance dropped markedly in the generation task (0.70), particularly in Spanish. We conclude that LLMs are effective at comprehending figurative language but remain more limited in their ability to produce it.
References
- Adewumi, T. P., R. Vadoodi, A. Tripathy, K. Nikolaidou, F. Liwicki, and M. Liwicki. 2022. Potential Idiomatic Expression (PIE)-English: Corpus...
- Ammer, C. 2013. The American Heritage dictionary of idioms. Houghton Mifflin Harcourt, Boston.
- Asociación de Academias de la Lengua Española. n.d. Diccionario de americanismos. Available at: https://www.asale.org/damer/.
- Brown, T. B., B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss,...
- Carbonell Basset, D. 1996. A dictionary of proverbs, sayings, saws, adages: English and Spanish. Diccionario de refranes, proverbios, dichos,...
- Castaldo, A. and J. Monti. 2024. Prompting Large Language Models for idiomatic translation. In Proceedings of the First Workshop on Creative-text...
- Chakrabarty, T., Y. Choi, and V. Shwartz. 2022. It’s not rocket science: Interpreting figurative language in narratives. Transactions of the...
- Chakrabarty, T., A. Saakyan, D. Ghosh, and S. Muresan. 2022. FLUTE: Figurative Language Understanding through Textual Explanations. In Proceedings...
- Chiang, W. L., L. Zheng, Y. Sheng, A. N. Angelopoulos, T. Li, D. Li, B. Zhu, H. Zhang, M. L. Jordan, J. E. Gonzalez, and I. Stoica. 2024. Chatbot...
- El Colegio de México, A. C. n.d. Diccionario del español de México. Available at: https://dem.colmex.mx/.
- Collins Dictionaries. 2014. Collins English Dictionary – Complete and Unabridged. HarperCollins, Glasgow.
- Comsa, I. M., J. M. Eisenschlos, and S. Narayanan. 2022. MiQA: A benchmark for inference on metaphorical questions. In Proceedings of the 2nd...
- Davies, M. 2021. Corpus of Contemporary American English. Available at: https://www.english-corpora.org/coca/.
- Davies, M. n.d. NOW corpus (News on the Web). Available at: https://www.english-corpora.org/now/.
- Dobrovol’skij, D. and E. Piirainen. 2021. Figurative language: Cross-cultural and cross-linguistic perspectives. De Gruyter Mouton, Boston.
- Fornaciari, F. D. L., B. Altuna, I. González Dios, and M. Melero. 2024. A hard nut to crack: Idiom detection with conversational Large Language...
- Gibbs, R. W. and H. L. Colston. 2006. Figurative language. In M. Traxler and M. A. Gernsbacher (Eds.), Handbook of psycholinguistics, pages...
- Gluck, A., K. von der Wense, and M. L. Pacheco. 2025. CLIX: Cross-Lingual explanations of Idiomatic eXpressions. In Findings of the Association...
- Haagsma, H., J. Bos, and M. Nissim. 2020. MAGPIE: A large corpus of Potentially Idiomatic Expressions. In 12th Language Resources and Evaluation...
- Instituto Cervantes. n.d. Refranero multilingüe. Available at: https://cvc.cervantes.es/lengua/refranero/.
- Karmaker, S. K. and D. Feng. 2023. TELeR: A general taxonomy of LLM prompts for benchmarking complex tasks. In Findings of the Association for...
- Kim, G., P. Baldi, and S. McAleer. 2023. Language models can solve computer tasks. Advances in Neural Information Processing Systems, 36:39648-39677.
- Lai, H. and M. Nissim. 2022. Multi-figurative language generation. In Proceedings of the 29th International Conference on Computational Linguistics,...
- Lai, H. and M. Nissim. 2024. A survey on automatic generation of figurative language: From rule-based systems to Large Language Models. ACM...
- Lakoff, G. and M. Johnson. 2003. Metaphors we live by. The University of Chicago Press, Chicago.
- Lexical Computing. 2022. English web corpus 2021 (enTenTen21).
- Lexical Computing. 2023. Spanish web corpus 2023 (esTenTen23).
- Merriam-Webster. n.d. Merriam-Webster.com Dictionary. Available at: https://www.merriam-webster.com/.
- Obeidat, M. M., A. S. Haider, S. A. Tair, and Y. Sahari. 2024. Analyzing the performance of Gemini, ChatGPT, and Google Translate in rendering...
- Organización para la Cooperación y el Desarrollo Económico. 2023. AI language models: Technological, socio-economic and policy considerations....
- Phelps, D., T. Pickard, M. Mi, E. Gow-Smith, and A. Villavicencio. 2024. Sign of the times: Evaluating the use of Large Language Models for...
- Pita Fernández, F. 1996. Determinación del tamaño muestral. Cadernos de atención primaria, 3(3):138-141.
- Rakshit, G. and J. Flanigan. 2022. FigurativeQA: A test benchmark for figurativeness comprehension for Question Answering. In Proceedings of...
- Real Academia Española. n.d.-a. Corpus de referencia del español actual (CREA). Available at: https://corpus.rae.es/creanet.html.
- Real Academia Española. n.d.-b. Corpus del español del siglo XXI (CORPES). Available at: https://www.rae.es/corpes/.
- Real Academia Española. n.d.-c. Diccionario de la lengua española. Available at: https://dle.rae.es/.
- Roberts, R. M. and R. J. Kreuz. 1994. Why do people use figurative language? Psychological Science, 5(3):159-163.
- Sánchez Bayona, E. and R. Agerri. 2024. Meta4XNLI: A crosslingual parallel corpus for metaphor detection and interpretation. arXiv preprint...
- Seco, M., O. Andrés, and G. Ramos. 2004. Diccionario fraseológico documentado del español actual: locuciones y modismos españoles. Aguilar,...
- Spears, R. A. 2005. McGraw-Hill’s dictionary of American idioms and phrasal verbs. McGraw-Hill, New York.
- Tayyar Madabushi, H., E. Gow-Smith, M. García, C. Scarton, M. Idiart, and A. Villavicencio. 2022. SemEval-2022 task 2: Multilingual idiomaticity...
- Tong, X., R. Choenni, M. Lewis, and E. Shutova. 2024. Metaphor Understanding Challenge dataset for LLMs. In Proceedings of the 62nd Annual Meeting...
- Wachowiak, L. and D. Gromann. 2023. Does GPT-3 grasp metaphors? Identifying metaphor mappings with generative language models. In Proceedings...
- Wei, J., X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. H. Chi, Q. V. Le, and D. Zhou. 2022. Chain-of-thought prompting elicits reasoning...
- Yao, S., J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y. Cao. 2023. ReAct: Synergizing reasoning and acting in language models. In...