Documat


Ajuste de modelos BART para simplificación de textos sobre salud en español

  • Authors: Paloma Martínez Fernández, Lourdes Moreno, Rodrigo Alarcon
  • Published in: Procesamiento del lenguaje natural, ISSN 1135-5948, No. 70, 2023, pp. 111-122
  • Language: Spanish
  • Parallel titles:
    • Tuning BART models to simplify Spanish health-related content
  • Abstract

      Health literacy has become an increasingly important skill for citizens making health-related decisions in modern societies. Technology that supports text accessibility is needed to help people understand information about their health conditions. This paper presents a transfer learning approach implemented with BART (Bidirectional and Auto-Regressive Transformers), a sequence-to-sequence technique trained as a denoising autoencoder. To accomplish this task, pre-trained models were fine-tuned to simplify Spanish texts. Since fine-tuning a language model requires sample data to adapt it to a new task, the paper also introduces the process of creating a synthetic parallel dataset of Spanish health-related texts. On the test set, the fine-tuned models reached SARI values of 59.7 for a multilingual BART (mBART) model and 29.74 for an mBART model pre-trained for Spanish summary generation. Both models also improved the readability of the original texts according to the Inflesz scale.

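The Inflesz scale used to assess readability grades the Szigriszt-Pazos perspicuity index into difficulty bands (Barrio-Cantalejo et al. 2008). A minimal sketch, assuming the standard formula and using a crude vowel-group heuristic rather than a proper Spanish syllabifier:

```python
import re

def count_syllables(word):
    # Crude approximation: one syllable per vowel group (ignores hiatus rules)
    return max(1, len(re.findall(r"[aeiouáéíóúü]+", word.lower())))

def szigriszt(text):
    """Szigriszt-Pazos perspicuity: 206.835 - 62.3*(syllables/words) - words/sentences."""
    words = re.findall(r"[A-Za-zÁÉÍÓÚÜÑáéíóúüñ]+", text)
    if not words:
        return 0.0
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 62.3 * (syllables / len(words)) - len(words) / sentences

def inflesz_band(score):
    # Inflesz difficulty bands (Barrio-Cantalejo et al. 2008)
    if score <= 40: return "muy difícil"
    if score <= 55: return "algo difícil"
    if score <= 65: return "normal"
    if score <= 80: return "bastante fácil"
    return "muy fácil"
```

A higher index means easier text, so a simplification system succeeds on this scale when its output lands in an easier band than the original.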
  • References
    • Al-Thanyyan, S. S. and A. M. Azmi. 2021. Automated text simplification: A survey. ACM Computing Surveys (CSUR), 54(2):1–36.
    • Alarcon, R. 2021. Dataset of sentences annotated with complex words and their synonyms to support lexical simplification, March.
    • Alarcon, R., L. Moreno, and P. Martínez. 2021. Lexical simplification system to improve web accessibility. IEEE Access, 9:58755–58767.
    • Alarcón García, R. 2022. Lexical simplification for the systematic support of cognitive accessibility guidelines. https://doi.org/10.1145/3471391.3471400.
    • Barbu, E., M. T. Martín-Valdivia, E. Martínez-Cámara, and L. A. Ureña-López. 2015. Language technologies applied to document simplification...
    • Barrio-Cantalejo, I. M., P. Simón-Lorda, M. Melguizo, I. Escalona, M. I. Marijuán, and P. Hernando. 2008. Validación de la escala Inflesz...
    • Campillos Llanos, L., A. R. Terroba Reinares, S. Zakhir Puig, A. Valverde, and A. Capllonch-Carrion. 2022. Building a comparable corpus and a...
    • Chamovitz, E. and O. Abend. 2022. Cognitive simplification operations improve text simplification.
    • Cumbicus-Pineda, O. M., I. Gutiérrez-Fandiño, I. González-Dios, and A. Soroa. 2022. Noisy channel for automatic text simplification.
    • Devlin, J., M.-W. Chang, K. Lee, and K. Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding.
    • Ermakova, L., I. Ovchinnikov, J. Kamps, D. Nurbakova, S. Araujo, and R. Hannachi. 2022a. Overview of the clef 2022 simpletext task 2: Complexity...
    • Ermakova, L., I. Ovchinnikov, J. Kamps, D. Nurbakova, S. Araujo, and R. Hannachi. 2022b. Overview of the clef 2022 simpletext task 3: Query...
    • Ferres, D. and H. Saggion. 2022a. Alexsis: a dataset for lexical simplification in spanish. In Proceedings of the Thirteenth Language Resources...
    • Ferres, D. and H. Saggion. 2022b. ALEXSIS: A dataset for lexical simplification in Spanish. In Proceedings of the Thirteenth Language Resources...
    • Huang, J. and J. Mao. 2022. Assembly models for simpletext task 2: Results from wuhan university research group.
    • Kauchak, D., J. Apricio, and G. Leroy. 2022. Improving the quality of suggestions for medical text simplification tools. In AMIA Annual Symposium...
    • Lewis, M., Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, and L. Zettlemoyer. 2019. Bart: Denoising sequence-to-sequence...
    • Lin, C.-Y. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74– 81, Barcelona, Spain,...
    • Marimon, M., J. Vivaldi, and N. Bel Rafecas. 2017. Annotation of negation in the IULA Spanish clinical record corpus. In Blanco E, Morante R,... Computational Semantics Beyond Events and Roles; 2017 Apr 4; Valencia, Spain. Stroudsburg (PA): ACL. p. 43-52.
    • Martin, L., A. Fan, E. de la Clergerie, A. Bordes, and B. Sagot. 2020. Muss: multilingual unsupervised sentence simplification by mining paraphrases....
    • McCarthy, D. and R. Navigli. 2007. Semeval-2007 task 10: English lexical substitution task. In Proceedings of the fourth international workshop...
    • Menta, A. and A. Garcia-Serrano. 2022. Controllable sentence simplification using transfer learning. Proceedings of the Working Notes of CLEF.
    • Monteiro, J., M. Aguiar, and S. Araujo. 2022. Using a pre-trained simplet5 model for text simplification in a limited corpus. Proceedings...
    • Moreno, L., R. Alarcon, and P. Martínez. 2020. EASIER system. Language resources for cognitive accessibility. In The 22nd International ACM...
    • Mostert, F., A. Sampatsing, M. Spronk, and J. Kamps. 2022. University of amsterdam at the clef 2022 simpletext track. Proceedings of the Working...
    • Paetzold, G. and L. Specia. 2016a. Semeval 2016 task 11: Complex word identification. In Proceedings of the 10th International Workshop on...
    • Paetzold, G. and L. Specia. 2016b. Unsupervised lexical simplification for non-native speakers. In Proceedings of the AAAI Conference on Artificial...
    • Plainlanguage. 2017a. Plain english- free guides (co.uk).
    • Plainlanguage. 2017b. Plain language action and information network (plain).
    • Radford, A., K. Narasimhan, T. Salimans, I. Sutskever, et al. 2018. Improving language understanding by generative pretraining.
    • Rubio, A. and P. Martínez. 2022. HULAT-UC3M at SimpleText@CLEF-2022: scientific text simplification using BART. Proceedings of the Working...
    • Saggion, H., S. Stajner, S. Bott, S. Mille, L. Rello, and B. Drndarevic. 2015. Making it simplext: Implementation and evaluation of a text...
    • Saggion, H., S. Stajner, D. Ferres, K. C. Sheang, M. Shardlow, K. North, and M. Zampieri. 2023. Findings of the tsar-2022 shared task on multilingual lexical...
    • Segura-Bedmar, I. and P. Martínez. 2017. Simplifying drug package leaflets written in Spanish by using word embedding. Journal of biomedical...
    • Shardlow, M., R. Evans, G. H. Paetzold, and M. Zampieri. 2021. SemEval-2021 task 1: Lexical complexity prediction. In Proceedings of the 15th...
    • Smith, L. N. 2018. A disciplined approach to neural network hyper-parameters: Part 1–learning rate, batch size, momentum, and weight decay....
    • Talec-Bernard, T. 2022. Is using an AI to simplify a scientific text really worth it? Proceedings of the Working Notes of CLEF.
    • Tang, Y., C. Tran, X. Li, P.-J. Chen, N. Goyal, V. Chaudhary, J. Gu, and A. Fan. 2020. Multilingual translation with extensible multilingual...
    • Truica, C.-O., A.-I. Stan, and E.-S. Apostol. 2022. Simplex: a lexical text simplification architecture. Neural Computing and Applications,...
    • UNE. 2018. Une 153101:2018 ex easy to read. guidelines and recommendations for the elaboration of documents.
    • Wilkens, R., B. Oberle, and A. Todirascu. 2020. Coreference-based text simplification. In Proceedings of the 1st Workshop on Tools and Resources...
    • Xu, W., C. Napoles, E. Pavlick, Q. Chen, and C. Callison-Burch. 2016. Optimizing statistical machine translation for text simplification. Transactions...
    • Yimam, S. M., C. Biemann, S. Malmasi, G. H. Paetzold, L. Specia, S. Stajner, A. Tack, and M. Zampieri. 2018. A report on the complex word...
    • Yimam, S. M., S. Stajner, M. Riedl, and C. Biemann. 2017. Multilingual and cross-lingual complex word identification. In RANLP, pages 813–822.
    • Zhu, Z., D. Bernhard, and I. Gurevych. 2010. A monolingual tree-based translation model for sentence simplification. In Proceedings of the...
    • Stajner, S., K. C. Sheang, and H. Saggion. 2022. Sentence simplification capabilities of transfer-based models. Proceedings of the AAAI Conference...
