Alberto Gil de la Fuente
Metabolomics is a subarea of systems biology devoted to the study of the small molecules (usually < 1,000 Da) produced by the metabolic processes occurring in a cell. Since the end of the last century, untargeted metabolomics has been successfully applied to domains such as biomarker discovery, therapeutic target discovery and personalized medicine, and has provided knowledge about organisms and the mechanisms of health and disease. By nature, untargeted metabolomics aims to obtain as much information as possible in order to maximize the number of detected and identified metabolites, so metabolite identification is vital to the final success of these studies.
The number of metabolites extracted and subsequently identified with a certain confidence level can be defined as “metabolite coverage”. Identification is the main bottleneck in metabolomic studies, since exploiting the acquired analytical information requires considerable work and expertise. On the one hand, separation and detection provide valuable information that software tools can exploit automatically. On the other hand, a large number of metabolomic data sources now hold information about the metabolites they store. Information from both the analyses and the data sources can be combined to raise the confidence level of metabolite identification.
The final goal of this thesis is the design, validation and implementation of a software tool that queries different metabolomic databases simultaneously, so that researchers can retrieve data from all of them in a single step. This simultaneous query gives access to more data in both depth and breadth. In depth, because researchers can access the complementary information that distinct databases store about metabolites present in more than one of them. In breadth, because a large number of compounds are present in only a single database, so researchers who query one source risk skipping metabolites during annotation and identification, potentially increasing the number of unknowns in the experiment.
Furthermore, the tool should exploit both analytical and non-analytical information to aid metabolite annotation and identification, thereby increasing the metabolite coverage of metabolomic studies and reducing the number of misidentifications that can lead to wrong biological interpretations.
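The simultaneous query described above can be illustrated with a minimal sketch. It is not the tool's implementation: the two in-memory "databases", their field names, the placeholder keys and the function `unified_query` are all hypothetical, and the sketch assumes that records from different sources can be matched on a shared structure key.

```python
from collections import defaultdict

# Hypothetical toy records; the keys are placeholders, not real identifiers,
# and real databases expose far richer (and differently named) fields.
DB_A = [
    {"key": "KEY-GLUCOSE", "name": "D-Glucose", "pathways": ["Glycolysis"]},
    {"key": "KEY-ALANINE", "name": "L-Alanine", "pathways": ["Alanine metabolism"]},
]
DB_B = [
    {"key": "KEY-GLUCOSE", "name": "D-Glucose", "biofluids": ["Blood"]},
    {"key": "KEY-CHOLINE", "name": "Choline", "biofluids": ["Urine"]},
]


def unified_query(sources):
    """Merge records from several sources into one entry per compound.

    Depth: a compound found in several sources gathers the fields of all of them.
    Breadth: a compound found in only one source is still returned.
    """
    merged = defaultdict(lambda: {"sources": []})
    for source_name, records in sources.items():
        for record in records:
            entry = merged[record["key"]]
            entry["sources"].append(source_name)
            for field, value in record.items():
                if field != "key":
                    # Keep the first value seen for a field; later sources
                    # only contribute fields that are still missing.
                    entry.setdefault(field, value)
    return dict(merged)


result = unified_query({"A": DB_A, "B": DB_B})
```

In this toy example, `result` holds three compounds: the one shared by both sources carries fields from each, while the other two appear once each and would be missed by a query against a single source.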
The first chapter reviews the resources and data sources available for metabolite identification when electrospray is used as the ionization technique. The information contained in those resources is often complementary, and the metabolite overlap between them is low. Researchers should therefore query several resources to boost the metabolite coverage of their studies. The second chapter introduces the first version of the software tool developed in this thesis: CEU Mass Mediator (CMM). The tool implements a heuristic approach for metabolite annotation based on MS1 information and on the retention time (RT) or migration time (MT) obtained in the chromatographic or electrophoretic separation. The third chapter presents the acquisition of analytical knowledge about oxidized glycerophosphocholines and the creation of a semi-automated approach for their detection and identification using the RT and the information obtained in MS1 and MS2 analyses. The fourth chapter describes the updates made to CMM. New services have been gradually incorporated, such as spectral quality assessment, ontology and taxonomy information, and support for MS2 searches. All the services in CMM are available through a REST API to facilitate automated access and communication with other software tools.
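As a rough illustration of annotation from MS1 information, the sketch below matches an observed m/z against candidate compounds under a ppm tolerance for two common positive-mode adducts. It is a generic mass-matching example, not CMM's heuristic and not its REST API: the function `annotate`, the two-compound candidate list and the 10 ppm default are assumptions, and RT or MT evidence is left out entirely.

```python
# Mass shifts added to the neutral monoisotopic mass for each adduct (Da).
# Approximate values; the electron mass is already accounted for.
ADDUCT_SHIFTS = {"[M+H]+": 1.007276, "[M+Na]+": 22.989218}

# Hypothetical candidate list with approximate monoisotopic masses (Da).
CANDIDATES = {"D-Glucose": 180.06339, "L-Alanine": 89.04768}


def annotate(observed_mz, tolerance_ppm=10.0):
    """Return (name, adduct, error_ppm) for every candidate within tolerance."""
    hits = []
    for adduct, shift in ADDUCT_SHIFTS.items():
        for name, neutral_mass in CANDIDATES.items():
            theoretical_mz = neutral_mass + shift
            # Signed relative error of the observed m/z, in parts per million.
            error_ppm = (observed_mz - theoretical_mz) / theoretical_mz * 1e6
            if abs(error_ppm) <= tolerance_ppm:
                hits.append((name, adduct, round(error_ppm, 2)))
    # Smallest absolute mass error first.
    return sorted(hits, key=lambda hit: abs(hit[2]))


protonated_hits = annotate(181.0707)
sodiated_hits = annotate(203.0526)
no_hits = annotate(150.0000)
```

With this candidate list, the first two calls each return a single annotation for D-Glucose under a different adduct, and the third returns an empty list. A mass match alone yields only a putative annotation, which is why additional evidence such as RT, MT or MS2 data is needed to raise the confidence level.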
Metabolites are end products and are responsible for the status of biological systems. Correct and complete metabolite identification yields more information for the subsequent biological interpretation. Consequently, we stress the need to combine analytical and non-analytical information to achieve a higher confidence level in metabolite identification, as well as the usefulness of software tools in helping researchers conduct their experiments successfully.