LegalEc: Un nuevo corpus para la investigación de la identificación de palabras complejas en los estudios de Derecho en español ecuatoriano

  • Authors: César Espin Riofrio, Arturo Montejo Ráez, Jenny Alexandra Ortiz Zambrano
  • Published in: Procesamiento del Lenguaje Natural, ISSN 1135-5948, No. 71, 2023, pp. 247-259
  • Language: multiple languages
  • Parallel titles:
    • LegalEc: A New Corpus for Complex Word Identification Research in Law Studies in Ecuatorian Spanish
  • Abstract
    • Spanish (translated)

      In this work, we present LegalEc, a new corpus annotated with complex vocabulary, built from texts with legal content in Ecuadorian Spanish. We detail its compilation and annotation process. To provide baselines for the scientific community, we ran several complex word prediction experiments on this corpus. We extracted 23 linguistic features, which we combined with the encodings generated by models such as XLM-RoBERTa and RoBERTa-BNE (from the MarIA project). The evaluation shows that combining these features notably improves the prediction of lexical complexity.

    • English

      In this paper, we present LegalEc, a new annotated corpus of complex lexis constructed from legal texts in Ecuadorian Spanish. We detail its compilation and annotation process. In order to provide a resource for the scientific community to continue research in the area of Lexical Simplification in the Spanish language, several complex word prediction experiments have been carried out on this corpus. We extracted 23 linguistic features which we combined with the encodings generated by models such as XLM-RoBERTa and RoBERTa-BNE (from the MarIA project). The evaluation shows that the combination of these features improves the prediction of lexical complexity.
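
      The feature combination described in the abstract can be illustrated with a minimal sketch. It assumes that "combining" means concatenating the hand-crafted feature vector with the model encoding, which the abstract does not confirm. The three features below are hypothetical examples, not the paper's 23, and `placeholder_encoding` is a deterministic stand-in for an XLM-RoBERTa or RoBERTa-BNE encoding so that the example runs without downloading a model.

```python
from typing import List

VOWELS = set("aeiou")


def handcrafted_features(word: str, sentence: str) -> List[float]:
    """Return three illustrative features (hypothetical, not the paper's 23)."""
    tokens = sentence.lower().split()
    w = word.lower()
    # Relative position of the word in the sentence, 0.0 if it is not found.
    if w in tokens:
        position = tokens.index(w) / max(len(tokens) - 1, 1)
    else:
        position = 0.0
    return [
        float(len(w)),                         # word length in characters
        float(sum(ch in VOWELS for ch in w)),  # vowel count, a rough syllable proxy
        position,
    ]


def placeholder_encoding(word: str, dim: int = 8) -> List[float]:
    """Stand-in for a transformer encoding (assumption: a real system would
    take this vector from a pretrained encoder instead)."""
    w = word.lower()
    return [(sum(ord(ch) * (i + 1) for ch in w) % 97) / 97.0 for i in range(dim)]


def combined_vector(word: str, sentence: str, dim: int = 8) -> List[float]:
    """Concatenate hand-crafted features with the (placeholder) encoding."""
    return handcrafted_features(word, sentence) + placeholder_encoding(word, dim)


vec = combined_vector("sentencia", "la sentencia fue apelada")
print(len(vec))  # 3 hand-crafted features + 8 encoding dimensions
```

      In a setup like this, the concatenated vector would be the input to a classifier or regressor that predicts lexical complexity. The choice of downstream model is left open here because the abstract does not specify it.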

