Overview of ADoBo 2021: Automatic Detection of Unassimilated Borrowings in the Spanish Press

Constantine Lignos; Jordi Porta Ribalta; Elena Álvarez Mellado; Luis Espinosa Anke; Julio Gonzalo Arroyo

Authors: Constantine Lignos, Jordi Porta Ribalta, Elena Álvarez Mellado, Luis Espinosa Anke, Julio Gonzalo Arroyo
Published in: Procesamiento del lenguaje natural, ISSN 1135-5948, No. 67, 2021, pp. 277-285
Language: English
Parallel titles:
- Resumen de ADoBo 2021: detección automática de préstamos léxicos no asimilados en la prensa española

Abstract
- Spanish (translated)
  In this paper we present the results of ADoBo 2021, the IberLEF 2021 shared task on detecting lexical borrowings in the Spanish press. In this task we approach borrowing detection as a sequence labeling problem. Participants were given a corpus of Spanish press text annotated with unassimilated lexical borrowings (mostly anglicisms) following the BIO scheme. We received nine different systems from four different teams. The results range from 37 to 85 F1 points, which indicates that lexical borrowing detection is an unsolved problem (especially for previously unseen borrowings) and that traditional lexicographic work could benefit from incorporating current NLP techniques.
- English
  This paper summarizes the main findings of the ADoBo 2021 shared task, proposed in the context of IberLEF 2021. In this task, we invited participants to detect lexical borrowings (coming mostly from English) in Spanish newswire texts. The task was framed as a sequence classification problem using BIO encoding. We provided participants with an annotated corpus of lexical borrowings, divided into training, development and test splits. We received submissions from 4 teams, with 9 different system runs overall. The results, which range from F1 scores of 37 to 85, suggest that this is a challenging task, especially when out-of-domain or OOV words are considered, and that traditional methods informed with lexicographic information would benefit from taking advantage of current NLP trends.
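
As an illustration of the BIO framing and the F1 figures mentioned in the abstract, the following is a minimal sketch and not the task's official scorer. The label name `ENG`, the example sentence, and the helper names `bio_to_spans` and `span_f1` are assumptions made for this example; exact-match span scoring is likewise assumed here as a common convention for BIO-tagged data.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect labeled spans from one sentence of BIO tags.

    A span opens at "B-X" and continues over following "I-X" tags.
    Stray "I-X" tags with no open span of the same label are ignored.
    """
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def span_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Exact-match span precision, recall and F1 over a list of sentences."""
    gold_spans, pred_spans = set(), set()
    for sent_id, (g, p) in enumerate(zip(gold, pred)):
        gold_spans |= {(sent_id, *s) for s in bio_to_spans(g)}
        pred_spans |= {(sent_id, *s) for s in bio_to_spans(p)}
    tp = len(gold_spans & pred_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical sentence: "fake news" and "streaming" are the borrowings.
tokens = ["Las", "fake", "news", "y", "el", "streaming", "dominan", "la", "prensa"]
gold = ["O", "B-ENG", "I-ENG", "O", "O", "B-ENG", "O", "O", "O"]
pred = ["O", "B-ENG", "I-ENG", "O", "O", "O", "O", "O", "O"]

precision, recall, f1 = span_f1([gold], [pred])
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

In this sketch a predicted span counts as correct only if its boundaries and label both match a gold span, so a system that finds "fake news" but misses "streaming" is fully precise on what it predicts while recovering only half of the gold borrowings.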