Advancing Deep Learning Solutions in the Era of Precision Medicine: From Omics Data to Clinical Narratives.

Guillermo López García

Author: Guillermo López García
Thesis supervisors: Francisco Javier Veredas Navarro (supervisor), José Manuel Jerez Aragonés (co-supervisor)
Defense: Universidad de Málaga (Spain), 2024
Language: English
Thesis examination committee: Martin Krallinger (chair), Ezequiel López Rubio (secretary), David Elizondo (member)
Links
- Open-access thesis at: RIUMA
Abstract
- The main objective of this PhD Thesis is the development of deep learning (DL)-based approaches to tackle inherently complex predictive problems in the domain of precision medicine. We have focused on two of the most important data modalities in personalized medicine nowadays: omics data and clinical narratives. For both, we have proposed and developed several strategies that have proven successful in overcoming the challenges encountered by DL methods when applied to clinical data.
  
  Transfer learning (TL) has emerged as a pivotal methodology in this Thesis. TL-based approaches have been developed to counterbalance the scarcity of annotated samples, which are crucial for training sophisticated DL algorithms on omics data and clinical notes. For instance, TL has been leveraged in proteomics to predict methionine oxidation sites using extensive phosphorylation data. Moreover, in transcriptomics, TL methodologies have enabled effective survival prediction on a specific cancer type using a large dataset of gene expression samples from thirty other tumor types. TL strategies have also played a significant role in adapting Transformer-based models to the particularities of clinical narratives. The resulting models have been assessed on multiple tasks in the clinical Natural Language Processing (NLP) domain. Across all predictive problems examined in this Thesis, the developed domain-specific transformers have established new state-of-the-art (SOTA) performance.
  
  Beyond TL, other methodologies have been developed in this Thesis to address the challenges of applying DL models to medical data. Novel approaches have been designed for generating structured representations from omics data: the proposed methodology transforms transcriptomics samples into gene expression images, which convolutional neural network (CNN) models then use effectively to predict cancer survival. Furthermore, we have developed strategies to enhance the explainability of DL models in processing clinical narratives. We have addressed the problem of explainable clinical coding using transformers by requiring the models to accurately justify their predictions. A hierarchical-task approach was developed to tackle the problem as a dual medical entity recognition and normalization task, leveraging the contextual information contained in medical documents. The proposed methodology also has the potential to be applied to other clinical tasks that require both the detection and normalization of clinical entities. Finally, work has been performed to assess the robustness of advanced DL methods in true clinical settings, validating the viability of deploying these SOTA systems in real-world medical scenarios.
  
  The research conducted in this Thesis is supported by eight high-impact publications that together form this dissertation. The Thesis has had an impact in both academic and industrial realms. It has pioneered the development of transformers adapted to the domain of Spanish real-world medical narratives and proposed innovative approaches to applying CNN models to gene expression data, attracting numerous academic citations and industrial interest. In addition, it has achieved recognition through awards and successful real-world applications, underscoring the substantial potential of the developed systems for advancing digital health.