Crosslingual Argument Mining in the Medical Domain

  • Authors: Anar Yeginbergenova, Rodrigo Agerri Gascón
  • Published in: Procesamiento del lenguaje natural, ISSN 1135-5948, No. 73, 2024, pp. 296-312
  • Language: English
  • Parallel titles:
    • Minería de Argumentos Crosslingüe en el Dominio Médico
  • Abstract
    • Spanish (translated into English)

      Technology based on Artificial Intelligence has great potential for developing assistants that help medical professionals make decisions, which in many cases rest on processing large amounts of unstructured text. In this context, argument mining (AM) can help structure textual data into argumentative components and the discourse relations that hold between them. However, as is still the case for many Natural Language Processing tasks, the vast majority of work on computational argumentation in the medical domain has focused only on English. In this article we investigate several strategies for performing AM on medical texts in a language such as Spanish, for which no manually labelled data exist. Our work shows that automatically translating and projecting annotations from English to a given target language such as Spanish is an effective way to generate annotated data without manual annotation. Moreover, we show experimentally that translating and projecting obtains better results than methods based on the crosslingual transfer capabilities of multilingual language models. Finally, we use the automatically generated Spanish data to improve the original English results, thus providing a fully automatic data augmentation strategy.

    • English

      The medical domain is receiving growing attention in applications involving Artificial Intelligence, as clinicians' decision-making increasingly depends on dealing with enormous amounts of unstructured textual data. In this context, Argument Mining (AM) helps to meaningfully structure textual data by identifying the argumentative components in the text and classifying the relations between them. However, as is the case for many tasks in Natural Language Processing in general and in medical text processing in particular, the large majority of the work on computational argumentation has focused only on the English language. In this paper, we investigate several strategies to perform AM in medical texts for a language such as Spanish, for which no annotated data is available. Our work shows that automatically translating and projecting annotations (data-transfer) from English to a given target language is an effective way to generate annotated data without costly manual intervention. Furthermore, and contrary to conclusions from previous work for other sequence labelling tasks, our experiments demonstrate that data-transfer outperforms methods based on the crosslingual transfer capabilities of multilingual pre-trained language models (model-transfer). Finally, we show how the automatically generated data in Spanish can also be used to improve results in the original English monolingual setting, thus providing a fully automatic data augmentation strategy. Data, code, and fine-tuned models are publicly available at https://huggingface.co/datasets/HiTZ/AbstRCT-ES.
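
      The projection half of the translate-and-project (data-transfer) strategy can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the translation and the word alignments already exist, and the function names, the BIO label names, the example sentence pair, and the "cover everything between the first and last aligned target token" heuristic are all illustrative assumptions.

```python
from typing import List, Tuple


def bio_spans(labels: List[str]) -> List[Tuple[str, int, int]]:
    """Return (type, start, end_exclusive) for each span in a BIO sequence."""
    spans, start, kind = [], None, None
    for i, lab in enumerate(labels + ["O"]):  # sentinel closes a trailing span
        inside = lab.startswith("I-") and kind == lab[2:]
        if start is not None and not inside:
            spans.append((kind, start, i))
            start, kind = None, None
        if lab.startswith("B-") or (lab.startswith("I-") and start is None):
            start, kind = i, lab[2:]
    return spans


def project_labels(
    src_labels: List[str],
    alignments: List[Tuple[int, int]],  # (source index, target index) pairs
    tgt_len: int,
) -> List[str]:
    """Project BIO spans onto target tokens (assumed min/max heuristic)."""
    tgt_labels = ["O"] * tgt_len
    for kind, start, end in bio_spans(src_labels):
        hits = [t for s, t in alignments if start <= s < end]
        if not hits:
            continue  # span is lost when none of its tokens were aligned
        lo, hi = min(hits), max(hits)
        tgt_labels[lo] = f"B-{kind}"
        for t in range(lo + 1, hi + 1):
            tgt_labels[t] = f"I-{kind}"
    return tgt_labels


# Hypothetical sentence pair and hand-written alignments, for illustration only.
src_tokens = ["Results", "show", "that", "severe", "side", "effects", "were", "rare"]
src_labels = ["O", "O", "O", "B-Claim", "I-Claim", "I-Claim", "I-Claim", "I-Claim"]
tgt_tokens = ["Los", "resultados", "muestran", "que", "los", "efectos",
              "secundarios", "graves", "fueron", "raros"]
alignments = [(0, 1), (1, 2), (2, 3), (3, 7), (4, 6), (5, 5), (6, 8), (7, 9)]

for token, label in zip(tgt_tokens, project_labels(src_labels, alignments, len(tgt_tokens))):
    print(f"{token}\t{label}")
```

      In this toy example the unaligned article "los" before "efectos" stays outside the projected span, which shows why projected annotations can be noisy at span boundaries and why alignment quality matters for this kind of approach.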

