Characterizing Spans for Sequence Labeling: A Case on Anglicism Detection

  • Authors: Elena Álvarez Mellado, Julio Gonzalo Arroyo
  • Source: Procesamiento del lenguaje natural, ISSN 1135-5948, No. 73, 2024, pp. 235-246
  • Language: English
  • Parallel titles:
    • Caracterización de spans en tareas de etiquetado de secuencias: el caso de la detección de anglicismos
  • Abstract
    • Spanish (translated)

      We present a set of dimensions for characterizing spans in sequence labeling evaluation and apply them to the task of anglicism detection in Spanish. The results show that these dimensions help unmask limitations that went unnoticed in standard evaluation.

    • English

      We propose a set of formal dimensions to characterize spans in sequence labeling evaluation. We apply them to a dataset and model results obtained for anglicism detection in Spanish. Results show that the best performing system is outperformed by other models on certain types of spans. Our methodology can uncover limitations in performance that go unnoticed with standard evaluation.

