Statistical methods for the analysis of copy number alterations in the genome

Oscar Manuel Rueda Palacio

Authors: Oscar Manuel Rueda Palacio
Thesis supervisors: Cristina Rueda Sabater (supervisor), Ramón Díaz Uriarte (supervisor)
Defense: At the Universidad de Valladolid (Spain) in 2008
Language: English
Thesis examination committee: Bonifacio Salvador González (chair), Eustasio del Barrio (secretary), Juan Francisco Poyatos (member), Virgilio Gomez Ruiz (member), Ana María Rojas Mendoza (member)
Links
- Open-access thesis at: TESEO
Abstract
- Genomic DNA copy number alterations (CNAs) are associated with complex diseases, including cancer: CNAs are related to tumoral grade, metastasis, and patient survival. CNAs discovered from array-based Comparative Genomic Hybridization (aCGH) data have been instrumental for identifying disease-related genes and potential therapeutic targets. To be immediately useful in both clinical and basic research scenarios, aCGH data analysis requires accurate methods that do not impose unrealistic biological assumptions and that provide direct answers to the key question "What is the probability that this gene/region has CNAs?". Recent studies have shown that copy number changes are also common in the population, leading to the term "copy number variation". Thus a second problem is to distinguish between individual copy number variation and copy number changes related to disease.
  
  We have developed a statistical model and algorithms based on biological principles to approach these problems. The model is a non-homogeneous Hidden Markov Model with an unknown number of hidden states, fitted via Reversible Jump Markov Chain Monte Carlo. With this formulation we can explicitly incorporate the distance between genes/probes and employ Bayesian Model Averaging, thus incorporating model uncertainty and not conditioning our inferences on the selection of a particular model. The model can be extended to include random effects to incorporate heterogeneity among different individuals.
  
  We also present two algorithms to find common regions of alteration. The first detects regions common to a set of samples with an overall probability of copy number alteration as high as a given threshold; the second identifies subsets of individuals that share regions with a probability of alteration as high as a given threshold.
  
  We show, using simulated and real data sets, that our method outperforms alternative ones, and compare the results of our algorithms to others found in the literature on well-known data sets with very satisfactory results.
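
The abstract does not spell out the algorithms, so the following Python sketch is only an illustration of two ideas it names: Bayesian Model Averaging of per-probe alteration probabilities over models with different numbers of hidden states, and a search for contiguous regions whose probability of alteration reaches a threshold in every sample. The function names, the input layout, and the use of the per-probe minimum across samples as the "overall" probability are assumptions made for this example, not the thesis's definitions.

```python
def bma_alteration_prob(model_post, prob_given_model):
    """Bayesian Model Averaging of per-probe alteration probabilities.

    model_post: posterior probability of each candidate model (length K).
    prob_given_model: K lists, each holding P(probe altered | model k, data).
    Returns sum_k P(model k | data) * P(probe altered | model k, data) per probe.
    """
    total = sum(model_post)
    weights = [w / total for w in model_post]  # normalise, in case weights are raw
    n_probes = len(prob_given_model[0])
    return [
        sum(w * probs[j] for w, probs in zip(weights, prob_given_model))
        for j in range(n_probes)
    ]


def common_regions(sample_probs, threshold):
    """Find contiguous probe runs altered in every sample.

    sample_probs: one list of per-probe alteration probabilities per sample.
    Assumption: a probe counts as commonly altered when the minimum
    probability across samples is >= threshold.
    Returns (start, end) probe index pairs, with end inclusive.
    """
    n_probes = len(sample_probs[0])
    ok = [min(s[j] for s in sample_probs) >= threshold for j in range(n_probes)]
    regions, start = [], None
    for j, flag in enumerate(ok):
        if flag and start is None:
            start = j                        # a region opens here
        elif not flag and start is not None:
            regions.append((start, j - 1))   # the region closed at the previous probe
            start = None
    if start is not None:                    # a region runs to the last probe
        regions.append((start, n_probes - 1))
    return regions


# Hypothetical numbers, for illustration only.
averaged = bma_alteration_prob(
    [0.5, 0.3, 0.2],
    [[0.9, 0.1], [0.8, 0.2], [0.5, 0.5]],
)
regions = common_regions(
    [[0.90, 0.95, 0.20, 0.80, 0.85],
     [0.85, 0.90, 0.90, 0.10, 0.90]],
    threshold=0.8,
)
```

Averaging over models, rather than picking the single most probable number of states, is what lets the reported probability reflect model uncertainty. The region search here ignores the physical distance between probes, which the thesis's model takes into account.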