Computational Reproducibility of Named Entity Recognition methods in the biomedical domain

  • Authors: Ana M. García Serrano, Sebastian Hennig, Andreas Nürnberger
  • Published in: Procesamiento del lenguaje natural, ISSN 1135-5948, No. 66, 2021, pp. 141-152
  • Language: English
  • Parallel titles:
    • Reproducción computacional de métodos de reconocimiento de entidades nombradas en un dominio biomédico
  • Abstract
    • Spanish (translated)

      Unsupervised named entity recognition (NER) approaches do not depend on corpora of labelled data but on a knowledge source in which promising candidates can be looked up to find the corresponding concept. In the biomedical domain, such a source exists: the Unified Medical Language System (UMLS). This article evaluates and compares three different unsupervised NER models that use UMLS, namely MetaMap, cTAKES and MetaMapLite, starting from the results published by Demner-Fushman, Rogers and Aronson (2017) and Reategui and Ratte (2018). To this end, the Unsupervised Biomedical Named Entity Recognition (UB-NER) environment is developed, and results are presented for experiments on the models, five datasets and two NER tasks.

    • English

      Unsupervised Named Entity Recognition (NER) approaches do not depend on labelled data but on a knowledge source in which promising candidates can be looked up to find the corresponding concept. In the biomedical domain, such a knowledge source already exists: the Unified Medical Language System (UMLS). In this paper, three different unsupervised NER models that use UMLS, namely MetaMap, cTAKES and MetaMapLite, are evaluated and compared, starting from the results published by Demner-Fushman, Rogers and Aronson (2017) and Reategui and Ratte (2018). To this end, the Unsupervised Biomedical Named Entity Recognition (UB-NER) framework is developed and used to present experimental results covering the three models, five datasets and two NER tasks.

  • References
    • Aronson, A.R. 2001. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. Proc. AMIA Annual Symposium, pages...
    • Benavent, J., X. Benavent, E. de Ves, R. Granados, and A. Garcia-Serrano. 2010. Experiences at ImageCLEF 2010 using CBIR and TBIR Mixing Information...
    • Bhasuran, B., G. Murugesan, S. Abdulkadhar, and J. Natarajan. 2016. Stacked ensemble combined with fuzzy matching for biomedical named entity...
    • Campos, D., S. Matos, and J. L. Oliveira. 2015. Gimli: Open source and high-performance biomedical name recognition. BMC Bioinformatics 14.1...
    • Cho, M., J. Ha, C. Park, and S. Park. 2020. Combinatorial feature embedding based on CNN and LSTM for biomedical named entity recognition....
    • Demner-Fushman, D., W. J. Rogers, and A. R. Aronson. 2017. MetaMap Lite: an evaluation of a new Java implementation of MetaMap. J. of the...
    • Devlin, J., M. Chang, K. Lee, and K. Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Technical...
    • Dogan, R.I., R. Leaman, and Z. Lu. 2014. NCBI disease corpus: A resource for disease name recognition and concept normalization. Journal of...
    • Hennig, S. 2020. An experimental survey of Named Entity Recognition methods in the biomedical domain. Master Data and Knowledge Engineering....
    • Hennig, S. and A. Garcia-Serrano. 2020. Reproducible experiments on the master thesis: An experimental survey of Named Entity Recognition...
    • Lample, G., M. Ballesteros, S. Subramanian, K. Kawakami, and C. Dyer. 2016. Neural architectures for named entity recognition. Proc. of NAACL...
    • Lara-Clares, A. and A. Garcia-Serrano. 2019. LSI2_UNED at eHealth-KD Challenge 2019: A Few-shot Learning Model for Knowledge Discovery from eHealth Documents....
    • Lastra-Díaz, J.J. and A. Garcia-Serrano. 2015a. A novel family of IC-based similarity measures with a detailed experimental survey on WordNet....
    • Lastra-Díaz, J.J. and A. Garcia-Serrano. 2015b. A new family of information content models with an experimental survey on WordNet. Knowledge-Based...
    • Lee, J., W. Yoon, S. Kim, D. Kim, S. Kim, C. Ho So, and J. Kang. 2020. BioBERT: A pre-trained biomedical language representation model for...
    • Merkel, D. 2014. Docker: lightweight Linux containers for consistent development and deployment. https://dl.acm.org/doi/10.5555/2600239.2600241.
    • Mowery, D. 2013. ShAReCLEF eHealth Evaluation Lab 2014 (Task 2): Disorder Attributes in Clinical Reports. PhysioNet https://doi.org/10.13026/0zgk-9j94.
    • Reategui, R. and S. Ratte. 2018. Comparison of MetaMap and cTAKES for entity extraction in clinical notes. BMC Medical Informatics and Decision...
    • Sagae, K. and J. Tsujii. 2007. Dependency Parsing and Domain Adaptation with LR Models and Parser Ensembles...
    • Savova, G., J. Masanz, P. V. Ogren, J. Zheng, S. Sohn, K. C. Kipper-Schuler, and C. G. Chute. 2010. Mayo clinical Text Analysis and Knowledge...
    • Segura-Bedmar, I. and P. Martínez. 2017. Simplifying drug package leaflets written in Spanish by using word embedding. Journal of Biomedical...
    • Uzuner, Ö. 2009. Recognizing Obesity and Comorbidities in Sparse Data. Journal of the American Medical Informatics Association, 16(4):561–570,...
    • Yu, G., Y. Yang, X. Wang, H. Zhen, G. He, Z. Li, Y. Zhao, Q. Shu, and L. Shu. 2020. Adversarial active learning for the identification of...
