Large Language Models for Biomedical Entity Recognition and Normalization in Low-Resource Languages and Settings

Fernando Gallego Donoso



Author: Fernando Gallego Donoso
Thesis supervisors: Francisco Javier Veredas Navarro (supervisor), José Manuel Jerez Aragonés (academic tutor)
Defense: University of Málaga (Spain), 2025
Language: English
Parallel titles:
- Modelos masivos de lenguaje para el reconocimiento y la normalización de entidades biomédicas en idiomas y entornos con pocos recursos (Spanish)
Thesis committee: Gloria Corpas Pastor (chair), Daniel Urda Muñoz (secretary), David Elizondo (member)
Links
- Open-access thesis available at: TESEO, RIUMA
Abstract
- This PhD thesis focuses on the development of advanced natural language processing (NLP) solutions in the clinical domain, addressing the challenges posed by the high linguistic and structural variability of electronic health records. The rise of artificial intelligence (AI) and greater access to computational resources have enabled the analysis of large volumes of clinical texts, allowing for more precise and efficient extraction, normalization, and linking of biomedical entities.
  
  Terminological complexity, the presence of synonyms, abbreviations, and typographical errors, and the heterogeneity of information sources call for robust techniques such as transfer learning and continual model adaptation. These methods improve model generalization in settings marked by high uncertainty, data scarcity (as with rare diseases), and low-resource languages, including Spanish and Spain's co-official languages. Furthermore, integrating structured and unstructured sources demands adaptive and versatile solutions.
  
  This research proposes an approach based on large language models (LLMs) and generative techniques that improves the extraction, normalization, and semantic linking of biomedical entities in clinical records. The resulting strategies surpass the previous state of the art in named entity recognition (NER) and in normalization through medical entity linking (MEL), reaching top-25 accuracy above 75% on the main biomedical corpora. These results, supported by comparative studies and six published scientific articles, show the value of these technologies for clinical data analysis and lay the groundwork for future applications that improve healthcare and advance biomedical NLP.
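The top-25 accuracy figure cited above counts a mention as correctly normalized when its gold concept appears anywhere among the first k candidates that the linker ranks. A minimal Python sketch of that metric follows, assuming each mention comes with a ranked list of candidate concept identifiers. The function name `top_k_accuracy` and the toy identifiers (`C001`, `C042`, ...) are hypothetical placeholders, not code or terminology codes from the thesis.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_candidates: Sequence[Sequence[str]],
    gold_codes: Sequence[str],
    k: int = 25,
) -> float:
    """Fraction of mentions whose gold concept ID is among the first k ranked candidates."""
    if len(ranked_candidates) != len(gold_codes):
        raise ValueError("need exactly one candidate list per gold code")
    if not gold_codes:
        return 0.0
    # A mention is a "hit" if its gold ID occurs in the top-k slice of its ranking.
    hits = sum(
        gold in candidates[:k]
        for candidates, gold in zip(ranked_candidates, gold_codes)
    )
    return hits / len(gold_codes)


# Hypothetical toy data: three mentions, each with candidates ranked by some linker.
# The IDs are placeholders, not real terminology codes.
ranked = [
    ["C001", "C042", "C007"],  # gold C042 sits at rank 2
    ["C100", "C200", "C300"],  # gold C999 was never retrieved
    ["C555", "C556", "C557"],  # gold C555 sits at rank 1
]
gold = ["C042", "C999", "C555"]

print(top_k_accuracy(ranked, gold, k=1))   # 1 hit out of 3 mentions
print(top_k_accuracy(ranked, gold, k=25))  # 2 hits out of 3 mentions
```

Because widening the cut-off can only add hits, top-k accuracy never decreases as k grows, so a top-25 score is a more lenient measure than strict top-1 accuracy on the same predictions.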