This doctoral thesis has developed and successfully applied multiple bioinformatics techniques and algorithms to address specific challenges in the management, analysis and interpretation of complex omic data derived from human cancer samples. The work focused on two main cancer types: breast cancer (BRCA) and colorectal cancer (CRC). The omic data were integrated and analysed together with relevant clinical information, mainly survival and disease progression data, as well as other key parameters such as tumour stage, grade and response to treatment. The work also addresses the field of pharmacogenomics, exploring the correlations between antitumour drugs (as small chemical compounds) and the activation of human genes. As a final general remark, this work has demonstrated the potential of integrating advanced statistical and bioinformatics approaches with large cancer datasets. In doing so, we have improved both survival analysis and the identification of therapeutic targets, with the ultimate aim of optimising the combined use of omic and clinical data to better characterise the outcomes of cancer patients. The methods developed not only provide valuable tools for cancer research but also lay a solid foundation for future applications in the field of personalised medicine.
This Doctoral Thesis, entitled "Exploration and development of bioinformatics methods for survival analysis and drug targeting in cancer", belongs to the field of Bioinformatics and Functional Genomics applied to cancer research and follows two research lines. The first line develops methods for survival analysis and patient risk assignment that combine omic data with clinical data, and applies this methodology to different types of cancer to discover genes (mainly protein-coding genes) that are good survival biomarkers for specific patient cohorts. The second line develops a bioinformatics strategy and methodology to identify cancer-related genes that are putative targets of specific drugs (FDA-approved anticancer drugs or other drugs). This methodology for locating gene-drug targets is based on a correlation analysis of gene expression profiles and drug activity profiles, and on the generation of drug-gene bipartite networks derived from these similarity analyses. It also allows the definition of a novel drug similarity index, validated by structural comparison of the drugs that share a significant number of targets.
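The second research line can be illustrated with a minimal sketch. It is not the thesis's implementation: the thesis works in R, and Python is used here only for illustration. The sketch assumes Pearson correlation across cell lines, an arbitrary absolute-correlation threshold of 0.8 for drawing a drug-gene edge, and the Jaccard index of shared targets as a stand-in for the drug similarity index, whose actual definition is not given in this abstract. All gene names, drug names and values are hypothetical toy data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / (sd_x * sd_y)


def bipartite_edges(expression, activity, threshold=0.8):
    """Map each drug to the set of genes with |r| >= threshold.

    The threshold and the use of the absolute value are assumptions
    made for this sketch, not parameters taken from the thesis.
    """
    return {
        drug: {
            gene
            for gene, expr in expression.items()
            if abs(pearson(expr, act)) >= threshold
        }
        for drug, act in activity.items()
    }


def drug_similarity(edges, drug_a, drug_b):
    """Jaccard index of the two drugs' gene sets (assumed similarity measure)."""
    a, b = edges[drug_a], edges[drug_b]
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical toy data: one value per cell line (five cell lines).
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [5.0, 4.0, 3.0, 2.0, 1.0],
    "GENE_C": [2.0, 1.0, 2.0, 1.0, 2.0],
}
activity = {
    "drug_X": [0.1, 0.2, 0.3, 0.4, 0.5],
    "drug_Y": [0.5, 0.4, 0.3, 0.2, 0.1],
    "drug_Z": [0.2, 0.1, 0.2, 0.1, 0.2],
}

edges = bipartite_edges(expression, activity)
sim_xy = drug_similarity(edges, "drug_X", "drug_Y")
sim_xz = drug_similarity(edges, "drug_X", "drug_Z")
```

On this toy input, drug_X and drug_Y are linked to the same two genes, so their similarity is 1.0, while drug_X and drug_Z share no gene and score 0.0. A real analysis would also need multiple-testing control on the correlations, which is omitted here.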
Both lines of research were conducted mainly with open-access cancer data from online repositories, from which we generated harmonised cohorts of tumour samples for different cancer types (such as breast cancer and colorectal cancer). This was done to ensure the robustness and stability of the methods under various biological conditions. At the same time, we observed that most of the available tools and platforms for cancer survival analysis based on genome-wide expression data plus clinical data lack transparency in how they select the best biomarkers associated with the prognosis and outcome of tumour samples and patients. As a result, the conclusions of many published survival studies are difficult to interpret in terms of clinical knowledge and show no clear association with defined biomolecular markers. Moreover, studies that test patient survival based on biomolecular features rarely include large tumour sample cohorts (most include fewer than 300 patients). This is a critical problem for robust statistical analysis, especially when evaluating the survival of cancer patients over more than 5 or 10 years.
Lastly, another important observation within our research framework is that analysing large omic datasets, and especially generating highly informative outputs (such as bipartite networks that may contain information about thousands of biomolecular entities, i.e., genes and proteins, or thousands of chemical entities, i.e., drugs), is computationally intensive and requires efficient bioinformatics software. This is one of the main challenges we have addressed in this research.
Main hypothesis and specific objectives
Considering the challenges described above in the context of current research in Bioinformatics and Computational Biology applied to cancer, we propose a main hypothesis to be tested and developed in our scientific work for this Doctoral Thesis.
In short, the hypothesis is that a robust bioinformatic analysis can be achieved by integrating multiple datasets of full transcriptomic data (i.e., gene expression data) from tumour samples of cancer patients and other human samples, along with clinical information about disease progression and patient survival. These analyses involve the development of new algorithms that integrate all of these complex omic and clinical data, implemented in the R programming language with R packages that support robust statistical, computational, and machine learning methods. In addition, this integrative strategy is applied to study the correlations between human gene expression and drug activity in cancer cell lines, using large-scale genomic and transcriptomic data associated with these samples to infer novel cancer drug targets. These associations can map the gene products likely to be affected by a given drug, which are then proposed as novel druggable gene modules.
Under this hypothesis, we designed the scientific work of this Doctoral Thesis around a series of specific objectives, organised in three chapters.
Objective 1.- Development and optimisation of a survival analysis and risk prediction algorithm for cancer patients based on genome-wide expression and clinical disease progression data.
Objective 2.- Application of the developed survival analysis algorithm to various large human cancer datasets to find novel biomarkers associated with patient prognosis or to build gene signatures associated with survival and risk.
Objective 3.- Development of a robust bioinformatics methodology to find and propose new plausible drug gene-target modules for anticancer drugs based on gene expression profiles and drug activity profiles, and construction of the derived drug-to-gene bipartite networks.
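Objectives 1 and 2 rest on assigning patients to risk groups from gene expression and comparing their survival. The abstract does not specify the algorithm, so the following is only a minimal sketch under stated assumptions: patients are split at the median expression of a single gene, and survival in each group is estimated with the standard Kaplan-Meier product-limit estimator. Python is used for illustration although the thesis works in R, and all expression values, follow-up times and event flags are hypothetical toy data.

```python
from statistics import median


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(time, survival)] at each observed event time.

    times: follow-up time per patient; events: 1 = event observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored patients both leave the risk set
    return curve


def split_by_median(expression):
    """Label each patient 'high' or 'low' relative to the median (assumed cut-off)."""
    cutoff = median(expression)
    return ["high" if value > cutoff else "low" for value in expression]


# Hypothetical toy cohort of eight patients.
expression = [2.1, 8.4, 3.3, 9.0, 1.5, 7.7, 2.8, 8.9]
months = [60, 12, 48, 20, 72, 30, 55, 8]
event = [0, 1, 1, 1, 0, 1, 0, 1]

groups = split_by_median(expression)


def subset(values, label):
    """Values belonging to patients in the given risk group."""
    return [v for v, g in zip(values, groups) if g == label]


km_high = kaplan_meier(subset(months, "high"), subset(event, "high"))
km_low = kaplan_meier(subset(months, "low"), subset(event, "low"))
```

In this toy cohort the high-expression group reaches zero estimated survival by month 30, while the low-expression group has a single event at month 48. With cohorts as small as this, such a difference would not be statistically meaningful, which is the sample-size problem the abstract raises; a real analysis would add a log-rank test or Cox model and a principled choice of cut-off.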