Ensembles for clinical entity extraction

  • Authors: Rebecka Weegar, Alicia Pérez Ramírez, Hercules Dalianis, Koldobika Gojenola Galletebeitia, Arantza Casillas Rubio, Maite Oronoz Anchordoqui
  • Published in: Procesamiento del lenguaje natural, ISSN 1135-5948, No. 60, 2018, pp. 13-20
  • Language: English
  • Parallel titles:
    • Agrupaciones para la extracción de entidades clínicas
  • Abstract
    • Spanish

      Medical reports are a valuable source of clinical knowledge.

      Natural Language Processing techniques have been applied to medical reports for a variety of applications. A common first step is medical entity detection: identifying drugs, diseases, and body parts. However, most of this work has been developed for reports in English. The aim of this work is to improve medical entity recognition for languages other than English by comparing the same methods on two languages and by using ensembles of models. Models were created for medical reports in Spanish and Swedish using SVM, Perceptron, and CRF with four different feature sets, including unsupervised features. For the combined model, weighted voting based on each model's individual F-measure was applied. In conclusion, the combined model improves overall performance; for further gains, more sophisticated ensemble methods should be investigated.

    • English

      Health records are a valuable source of clinical knowledge, and Natural Language Processing techniques have previously been applied to the text in health records for a number of applications. Often, a first step in clinical text processing is clinical entity recognition: identifying, for example, drugs, disorders, and body parts in clinical text. However, most of this work has focused on records in English.

      Therefore, this work aims to improve clinical entity recognition for languages other than English by comparing the same methods on two different languages and, specifically, by employing ensemble methods. Models were created for Spanish and Swedish health records using SVM, Perceptron, and CRF with four different feature sets, including unsupervised features. Finally, the models were combined in ensembles.

      Weighted voting was applied according to the models' individual F-scores. In conclusion, the ensembles improved the overall performance for Spanish and the precision for Swedish.
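
The weighted voting mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say whether votes are cast per token or per entity, nor how ties are resolved, so the per-token voting, the alphabetical tie-break, the function name `weighted_vote`, the label names, and the F-score values below are all assumptions made for illustration, not the authors' implementation.

```python
from collections import defaultdict


def weighted_vote(predictions, f_scores):
    """Combine per-token label sequences from several models.

    predictions: dict of model name -> list of labels, one label per token
    f_scores:    dict of model name -> F-score used as that model's vote weight
    (Both structures are hypothetical; the paper's data format is not given.)
    """
    lengths = {len(labels) for labels in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all models must label the same number of tokens")
    n_tokens = lengths.pop()

    combined = []
    for i in range(n_tokens):
        # Sum the weights of the models voting for each candidate label.
        totals = defaultdict(float)
        for model, labels in predictions.items():
            totals[labels[i]] += f_scores[model]
        # Assumption: ties go to the alphabetically first label.
        combined.append(max(sorted(totals), key=totals.get))
    return combined


# Made-up predictions for a four-token sentence and made-up F-scores.
predictions = {
    "svm": ["O", "Drug", "O", "Disorder"],
    "perceptron": ["O", "Drug", "Disorder", "O"],
    "crf": ["O", "O", "Disorder", "Disorder"],
}
f_scores = {"svm": 0.70, "perceptron": 0.65, "crf": 0.75}

ensemble_labels = weighted_vote(predictions, f_scores)
print(ensemble_labels)
```

In this sketch a label wins a token when the summed F-scores of the models proposing it exceed those of any competing label, so two weaker models that agree can outvote a single stronger one.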

