
Algebraic and semi-algebraic phylogenetic reconstruction

  • Author: Marina Garrote López
  • Thesis supervisors: Marta Casanellas Rius (supervisor), Jesús Fernández Sánchez (co-supervisor)
  • Defense: Universitat Politècnica de Catalunya (UPC) (Spain), 2021
  • Language: Spanish
  • Full text not available
  • Abstract
    • Phylogenetics is the study of the evolutionary history of, and relationships among, groups of biological entities (called taxa). These evolutionary processes are modeled by phylogenetic trees, whose nodes represent different taxa and whose branches correspond to the evolutionary processes between them. The leaves usually represent contemporary taxa, and the root is their common ancestor. Phylogenetic reconstruction aims to estimate the phylogenetic tree that best explains the evolutionary relationships of present-day taxa using only information from their genomes, arranged in an alignment. We focus on reconstructing the topology of phylogenetic trees, that is, the shape of the tree together with the labels at its leaves.
To this end, one usually assumes that DNA sequences evolve according to a Markov process governed by a prescribed model of nucleotide substitution. Such a substitution model specifies transition matrices at the edges of the tree and a distribution of nucleotides at the root. Given a tree T and a substitution model, one can compute the distribution of nucleotide patterns at the leaves of T in terms of the model parameters. This joint distribution is represented by a vector whose entries are polynomials in the model parameters and satisfy certain algebraic relationships. The study of these relationships, and of the geometry of the algebraic varieties they define (called phylogenetic varieties), has provided valuable insight into the problem of phylogenetic reconstruction. From a biological perspective, however, we are not interested in the whole variety, but only in the region of points that arise from stochastic parameters (the so-called phylogenetic stochastic region). Describing these regions leads to semi-algebraic constraints, which play an important role because they characterize distributions with biological and probabilistic meaning.
One of the main motivations for this thesis is the following question: could semi-algebraic tools improve the existing algebraic tools for phylogenetic reconstruction? To answer it, we compute the Euclidean distance from data points arising from a nucleotide alignment to the phylogenetic varieties and to their stochastic regions in some scenarios of special interest in phylogenetics, such as trees with short external branches and/or trees subject to the long branch attraction phenomenon. In some cases, we compute these distances analytically and can decide which tree has the stochastic region closest to the data point. As a consequence, we prove that even if a data point is close to the phylogenetic variety of a given tree, it may be closer to the stochastic region of another tree. In particular, taking the stochastic phylogenetic region into account seems to be fundamental for the phylogenetic reconstruction problem in the presence of long branch attraction.
However, incorporating semi-algebraic tools into phylogenetic reconstruction methods can be extremely difficult, and it is far from obvious how to do so. In this thesis, we present two phylogenetic reconstruction methods that combine algebraic and semi-algebraic conditions for the general Markov model. The first is SAQ, which stands for Semi-Algebraic Quartet reconstruction method. The second, ASAQ (Algebraic and Semi-Algebraic Quartet reconstruction method), is more versatile and combines SAQ with the method Erik+2, which is based on certain algebraic constraints. Both are phylogenetic reconstruction methods for DNA alignments on four taxa, and both have been proven to be statistically consistent.
We test the proposed methods on simulated and real data to assess their actual performance in several scenarios.
Our simulation studies show that both SAQ and ASAQ are highly successful, even when applied to short alignments or to data that violate their assumptions.
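The abstract states that, given a tree and a substitution model, the distribution of nucleotide patterns at the leaves is a vector whose entries are polynomials in the model parameters. The following is a minimal sketch in Python with NumPy, assuming the general Markov model on the quartet tree 12|34 rooted at the internal node joined to leaves 1 and 2, nucleotides indexed 0 to 3 (A, C, G, T), and randomly generated parameters. The helper names `random_markov` and `quartet_distribution` are hypothetical and are not taken from the thesis.

```python
import numpy as np


def random_markov(rng, n=4, diag_weight=4.0):
    """Random row-stochastic n x n matrix. The extra diagonal weight mimics a
    short branch (a hypothetical choice, for illustration only)."""
    m = rng.random((n, n)) + diag_weight * np.eye(n)
    return m / m.sum(axis=1, keepdims=True)


def quartet_distribution(pi, M1, M2, Me, M3, M4):
    """Joint distribution p[x1, x2, x3, x4] of nucleotides at the leaves of the
    quartet tree 12|34 under a general Markov model.

    The root a is the internal node joined to leaves 1 and 2; the interior
    edge a -> b carries Me, and b is joined to leaves 3 and 4:

        p[i, j, k, l] = sum_{a, b} pi[a] M1[a, i] M2[a, j] Me[a, b] M3[b, k] M4[b, l]

    so every entry is a polynomial in the model parameters.
    """
    return np.einsum("a,ai,aj,ab,bk,bl->ijkl", pi, M1, M2, Me, M3, M4)


rng = np.random.default_rng(0)
pi = rng.dirichlet(5.0 * np.ones(4))  # root distribution
M1, M2, Me, M3, M4 = (random_markov(rng) for _ in range(5))
p = quartet_distribution(pi, M1, M2, Me, M3, M4)  # shape (4, 4, 4, 4)
```

Each entry sums, over the two hidden internal states, the product of one root probability and five transition-matrix entries, so it is a homogeneous polynomial of degree 6 in the parameters. Summing out leaves recovers the expected marginals; for example, the distribution at leaf 1 is `pi @ M1`.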
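A classical instance of the algebraic relationships mentioned in the abstract is the following. For the quartet tree 12|34 under the general Markov model, the 16 × 16 matrix obtained by grouping the states of leaves 1 and 2 against those of leaves 3 and 4 (a flattening) has rank at most 4. Flattenings along the other two splits generically have higher rank. The sketch below scores each of the three quartet topologies by the Frobenius distance from its flattening to the set of matrices of rank at most 4, computed with the Eckart-Young theorem. All helper names are hypothetical. This is only an illustration of a rank-based algebraic score. It is not SAQ, ASAQ, or Erik+2, and it deliberately ignores the stochastic (semi-algebraic) constraints that the thesis argues are important.

```python
import numpy as np


# Hypothetical helpers repeated here so the block runs on its own.
def random_markov(rng, n=4, diag_weight=4.0):
    m = rng.random((n, n)) + diag_weight * np.eye(n)
    return m / m.sum(axis=1, keepdims=True)


def quartet_distribution(pi, M1, M2, Me, M3, M4):
    # Leaf distribution of the quartet 12|34 under a general Markov model.
    return np.einsum("a,ai,aj,ab,bk,bl->ijkl", pi, M1, M2, Me, M3, M4)


# The three unrooted topologies on four leaves (0-based leaf indices).
SPLITS = {
    "12|34": ((0, 1), (2, 3)),
    "13|24": ((0, 2), (1, 3)),
    "14|23": ((0, 3), (1, 2)),
}


def flattening(p, split):
    """16 x 16 matrix: rows indexed by the states of the two leaves on one side
    of the split, columns by the states of the two leaves on the other side."""
    rows, cols = split
    return np.transpose(p, rows + cols).reshape(16, 16)


def rank4_distance(p, split):
    """Frobenius distance from the flattening to the matrices of rank <= 4
    (Eckart-Young: root of the sum of squared singular values beyond the 4th)."""
    s = np.linalg.svd(flattening(p, split), compute_uv=False)
    return float(np.sqrt(np.sum(s[4:] ** 2)))


def best_split(p):
    """Return the topology with the smallest rank-4 distance, plus all scores."""
    scores = {name: rank4_distance(p, split) for name, split in SPLITS.items()}
    return min(scores, key=scores.get), scores


rng = np.random.default_rng(1)
pi = rng.dirichlet(5.0 * np.ones(4))
M1, M2, Me, M3, M4 = (random_markov(rng) for _ in range(5))
p = quartet_distribution(pi, M1, M2, Me, M3, M4)

# Empirical pattern frequencies of a simulated alignment with 1000 sites.
counts = rng.multinomial(1000, p.ravel())
p_hat = counts.reshape(4, 4, 4, 4) / 1000
topology, scores = best_split(p_hat)
```

With exact model probabilities, the distance for 12|34 is zero up to rounding. With empirical frequencies from a finite alignment (`p_hat`), all three scores are positive, and the smallest one gives the estimated topology. Because the score ignores stochasticity, it cannot distinguish a point near the variety from a point near its stochastic region. That distinction is the one the thesis identifies as decisive under long branch attraction.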

