Documat


Large Language Models for Biomedical Entity Recognition and Normalization in Low-Resource Languages and Settings

  • Author: Fernando Gallego Donoso
  • Thesis supervisors: Francisco Javier Veredas Navarro (supervisor), José Manuel Jerez Aragonés (tutor)
  • Defended: at the Universidad de Málaga (Spain) in 2025
  • Language: English
  • Parallel titles:
    • Modelos masivos de lenguaje para el reconocimiento y la normalización de entidades biomédicas en idiomas y entornos con pocos recursos
  • Thesis examination committee: Gloria Corpas Pastor (chair), Daniel Urda Muñoz (secretary), David Elizondo (member)
  • Abstract
    • This PhD thesis focuses on the development of advanced natural language processing (NLP) solutions in the clinical domain, addressing the challenges posed by the high linguistic and structural variability of electronic health records. The rise of artificial intelligence (AI) and greater access to computational resources have enabled the analysis of large volumes of clinical texts, allowing for more precise and efficient extraction, normalization, and linking of biomedical entities.

      Terminological complexity, the presence of synonyms, abbreviations, and typographical errors, and the heterogeneity of information sources all require robust techniques such as transfer learning and continuous model adaptation. These methodologies improve model generalization in contexts characterized by high uncertainty, by data scarcity (as with rare diseases), and by low-resource languages, including Spanish and other co-official languages. Furthermore, the integration of structured and unstructured sources demands adaptive and versatile solutions.

      This research proposes an approach based on large language models (LLMs) and generative techniques that improves the extraction, normalization, and semantic linking of biomedical entities in clinical records. The developed strategies have surpassed previous state-of-the-art performance in named entity recognition (NER) and in entity normalization, also known as medical entity linking (MEL), achieving top-25 accuracy above 75% on the main biomedical corpora. The results, supported by comparative studies and the publication of six scientific articles, demonstrate the impact of these technologies on optimizing clinical data analysis and lay the groundwork for future applications that will contribute to the improvement of healthcare and the advancement of biomedical NLP.

