Overview of ADoBo at IberLEF 2025: Automatic Detection of Anglicisms in Spanish

  • Authors: Elena Álvarez Mellado, Jordi Porta Zamorano, Constantine Lignos, Julio Gonzalo Arroyo
  • Published in: Procesamiento del lenguaje natural, ISSN 1135-5948, No. 75, 2025 (issue dedicated to: Procesamiento del Lenguaje Natural, Revista nº 75, septiembre de 2025), pp. 373-383
  • Language: English
  • Parallel titles:
    • Resumen de ADoBo 2025: detección automática de anglicismos en español
  • Abstract

      This paper summarizes the main findings of ADoBo 2025, the shared task on anglicism identification in Spanish held as part of IberLEF 2025. Participants were asked to detect English lexical borrowings (anglicisms) in a collection of Spanish sentences written in journalistic style. Five teams submitted systems for the test phase, including LLMs, deep learning models, Transformer-based models, and rule-based systems. F1 scores ranged from 0.17 to 0.99, which illustrates how widely performance can vary across systems on this task.

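
The abstract reports F1 scores between 0.17 and 0.99 but this record does not say how they were computed. As a minimal sketch, assuming exact-match scoring over borrowing spans (an assumption, not something stated here), F1 is the harmonic mean of precision and recall. The `Span` encoding, the `span_f1` helper, and the toy data below are all hypothetical and only illustrate the arithmetic.

```python
from typing import Iterable, Set, Tuple

# Hypothetical span encoding: (sentence_id, start_token, end_token).
Span = Tuple[int, int, int]


def span_f1(gold: Iterable[Span], predicted: Iterable[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact-match span scoring.

    A predicted span counts as correct only if an identical span
    appears in the gold annotations (assumed scoring scheme).
    """
    gold_set: Set[Span] = set(gold)
    pred_set: Set[Span] = set(predicted)

    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0

    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data: three gold anglicism spans, four predicted spans, two in common.
gold = [(0, 3, 4), (0, 7, 9), (1, 2, 3)]
predicted = [(0, 3, 4), (1, 2, 3), (1, 5, 6), (2, 0, 1)]

precision, recall, f1 = span_f1(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

With these toy spans, two of four predictions are correct (precision 0.5) and two of three gold spans are found (recall about 0.667), giving an F1 of 4/7, roughly 0.571.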