Statistical Inference for the Evolutionary History of Cancer Genomes

Khanh N. Dinh; Roman Jaksik; Marek Kimmel; Amaury Lambert; Simon Tavaré

Khanh N. Dinh [1]; Roman Jaksik [2]; Marek Kimmel [3]; Amaury Lambert [4]; Simon Tavaré [1]
[1] Columbia University, United States
[2] Silesian University of Technology, Gliwice, Poland
[3] Rice University, United States
[4] Pierre and Marie Curie University, Paris, France
Source: Statistical Science, ISSN 0883-4237, Vol. 35, No. 1, 2020 (Issue dedicated to: Statistics and Science), pp. 129-144
Language: English
DOI: 10.1214/19-sts7561
Abstract
- Recent years have seen considerable work on inference about cancer evolution from mutations identified in cancer samples. Much of the modeling work has been based on classical models of population genetics, generalized to accommodate time-varying cell population size. Reverse-time, genealogical views of such models, commonly known as coalescents, have been used to infer aspects of the past of growing populations. Another approach is to use branching processes, the simplest scenario being the classical linear birth-death process. Inference from evolutionary models of DNA often exploits summary statistics of the sequence data, a common one being the so-called Site Frequency Spectrum (SFS). In a bulk tumor sequencing experiment, we can estimate for each site at which a novel somatic point mutation has arisen, the proportion of cells that carry that mutation. These numbers are then grouped into collections of sites which have similar mutant fractions.
  
  We examine how the SFS based on birth-death processes differs from those based on the coalescent model. This may stem from the different sampling mechanisms in the two approaches. However, we also show that despite this, they are quantitatively comparable for the range of parameters typical for tumor cell populations. We also present a model of tumor evolution with selective sweeps, and demonstrate how it may help in understanding the history of a tumor as well as the influence of data pre-processing. We illustrate the theory with applications to several examples from The Cancer Genome Atlas tumors.
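
The abstract describes the Site Frequency Spectrum as a grouping of sites by similar mutant fraction. As an illustration only, the following minimal sketch bins per-site mutant fractions into equal-width classes and counts the sites in each. The function name `site_frequency_spectrum`, the equal-width binning, and the choice of `n_bins` are assumptions made here for the example; they are not taken from the paper, which may define the classes differently.

```python
def site_frequency_spectrum(mutant_fractions, n_bins=10):
    """Count sites per mutant-fraction class.

    Hypothetical helper: class k covers [k/n_bins, (k+1)/n_bins),
    and a fraction of exactly 1.0 is placed in the last class.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    counts = [0] * n_bins
    for f in mutant_fractions:
        # A site enters the spectrum only if some sampled cells carry the mutation.
        if not 0.0 < f <= 1.0:
            raise ValueError(f"mutant fraction out of range (0, 1]: {f}")
        # Map the fraction to its class index, clamping 1.0 into the last class.
        k = min(int(f * n_bins), n_bins - 1)
        counts[k] += 1
    return counts


# Made-up mutant fractions for six sites, grouped into four classes.
example_fractions = [0.05, 0.07, 0.12, 0.48, 0.52, 1.0]
example_sfs = site_frequency_spectrum(example_fractions, n_bins=4)
print(example_sfs)
```

With real bulk sequencing data the fractions are themselves estimates, so the class width would normally be chosen with the sequencing depth in mind.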