José Luis Mosquera Mayo
The advent of high-throughput technologies has generated a huge quantity of omics data. The results of these experiments are usually long lists of genes that can be used as biomarkers. A major challenge for researchers is to attribute a biological interpretation or significance to these lists of potential biomarkers, either by using biological information stored in bioinformatics resources such as the Gene Ontology (GO) or the Kyoto Encyclopedia of Genes and Genomes (KEGG), or by combining them with other types of omics data. This dissertation had two main objectives: first, to study the mathematical properties of two types of semantic similarity measures for exploring GO categories, and second, to classify GO tools for enrichment analysis and study their evolution. The first measure considered was the semantic similarity measure proposed by Lord et al., a node-based approach grounded in graph theory. The second was in fact a group of pseudo-distances proposed by Joslyn et al.; these are edge-based approaches grounded in the algebraic viewpoint of partially ordered set (POSET) theory. To reach these objectives, the main methods of graph theory and POSET theory were first reviewed and described. This review made it clear that there are two ways of mapping objects (e.g. genes) to the terms of an ontology (e.g. GO). The first formulation, called the Object-Ontology Complex (OOC), was proposed by Carey in order to perform statistical computations. The second, called POSET Ontology (POSO), was introduced by Joslyn et al. To classify the GO tools for enrichment analysis, the first 26 GO tools available on the website of the GO Consortium were surveyed. The survey yielded a list of 205 features, which were used to build a Standard Functionalities Set. Based on these functionalities, the 26 GO tools were classified according to their capabilities.
The study of the evolution of GO tools was based on monitoring these 26 tools. The statistical analysis consisted of descriptive statistics, an inferential analysis and a multivariate analysis. With regard to the first objective, we found that Lord's measure is the same as Resnik's measure, which had been published earlier. A certain level of analogy was observed between the formalizations of the OOC and the POSO for mapping objects (e.g. genes) to the terms of an ontology. A property and a corollary for calculating semantic similarity measures from node-based approaches, from a matrix point of view, have been proposed. It has been proved that Lord's measure and Joslyn's measure can be redefined in terms of a metric distance. An R package called sims has been developed for computing semantic similarity measures between terms of an arbitrary ontology and for comparing semantic similarity profiles based on the GO terms associated with two lists of genes. Based on the classification of the GO programs, a web-based tool called SerbGO was developed for selecting and comparing GO tools. The statistical analysis of the evolution of GO tools suggested that the promoters have introduced improvements over time, but clear models of GO tools have been detected. Following the results of the statistical analysis, an ontology called DeGOT was built to provide a structured vocabulary for developers who are introducing improvements to existing GO tools for enrichment analysis or designing a new program. DeGOT can be used to support queries and the comparison of results in SerbGO.
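As an illustration of the node-based approach mentioned above, the following is a minimal sketch of Resnik's measure, which the abstract identifies with Lord's measure: the similarity of two terms is the largest information content among their common ancestors, where the information content of a term is the negative logarithm of the share of genes annotated to it or to any of its descendants. The toy ontology, the gene names and the function names are hypothetical and chosen only for this example; this is not the interface of the sims package.

```python
import math

# Hypothetical toy ontology (a DAG): term -> set of its parent terms.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "C": {"A"},
    "D": {"A", "B"},
}

# Hypothetical direct annotations: term -> genes annotated to that term.
ANNOTATIONS = {
    "root": set(),
    "A": {"g1"},
    "B": {"g2"},
    "C": {"g3"},
    "D": {"g4"},
}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log p(t), with p(t) the share of genes annotated to t or a descendant."""
    genes_at = {term: set() for term in PARENTS}
    for term, genes in ANNOTATIONS.items():
        # Propagate each annotation upwards to every ancestor of the term.
        for anc in ancestors(term):
            genes_at[anc] |= genes
    total = len(genes_at["root"])
    return {t: -math.log(len(g) / total) for t, g in genes_at.items() if g}


def resnik(term1, term2, ic):
    """Resnik similarity: maximum IC over the common ancestors of both terms."""
    common = ancestors(term1) & ancestors(term2)
    return max(ic.get(t, 0.0) for t in common)


ic = information_content()
print(resnik("C", "D", ic))  # driven by the shared ancestor "A"
print(resnik("C", "B", ic))  # only the root is shared
```

In this sketch, terms that share only the root have similarity zero, because every gene is annotated to the root and its information content is therefore zero.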