Summary of Statistical Methods for the Modelling of Label-Free Shotgun Proteomic Data in Cell Line Biomarker Discovery

Josep Gregori Font

This work develops and implements a data analysis pipeline for biomarker discovery by high-throughput shotgun proteomics. Specifically, the solution is optimized for the analysis of tumor cell line secretomes by label-free LC-MS/MS, with proteins quantified by peptide spectral counts. The development also demonstrated the incidence and relevance of batch effects in the comparative analysis of label-free proteomics by LC-MS/MS, and identified the features that make potential biomarkers reproducible. The model was developed on empirical data from a series of spiked experiments, with simulations used to evaluate its performance. The pipeline comprises an exploratory data analysis (EDA) R/Bioconductor package, msmsEDA, based on multidimensional analysis tools, and an R/Bioconductor inference package, msmsTests, based on generalized linear models (GLM) with Poisson or negative binomial distributions, or the quasi-likelihood GLM extension. Two graphical interfaces have also been produced to ease the use of the solution by non-experts in an MS lab; both are freely available on GitHub. The model is designed to discover differentially expressed proteins in tumor cell line secretomes, using the cell as the unit of interest. It allows blocking factors as a means of correcting batch effects. Normalization to cell units is embedded in the model through the use of offsets, so no prior data treatment is required. The two packages developed, msmsEDA and msmsTests, allow for:
• Dataset quality assessment.
• The identification of outliers.
• The identification of confounding factors or batch effects.
• The discovery of potential biomarkers by using the distribution that best fits the available data.
• The improvement of reproducibility by a post-test filter based on effect size and signal levels.
Several papers published in peer-reviewed proteomics journals develop each data treatment step and demonstrate its use and value in biological experiments carried out in the Tumor Biomarker lab at VHIO.
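As an illustration of how an offset and a blocking factor can enter a Poisson GLM for spectral counts, the following is a minimal sketch in Python with NumPy. It is not the msmsTests implementation or its API. The function name `fit_poisson_glm`, the variable names, and the toy data are all assumptions made for this example, and the counts are synthetic, constructed so that the true treatment fold change is 2 and the batch fold change is 3.

```python
import numpy as np


def fit_poisson_glm(X, y, offset, n_iter=100, tol=1e-10):
    """Fit log(mu) = offset + X @ beta by iteratively reweighted least squares.

    Hypothetical helper written for this sketch; it is not part of msmsTests.
    """
    mu = y + 0.1                     # starting values; the shift avoids log(0)
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # Working response for the log link, with the offset subtracted
        z = (eta - offset) + (y - mu) / mu
        XtW = X.T * mu               # Poisson working weights equal mu
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = offset + X @ beta_new
        mu = np.exp(eta)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


# Synthetic toy design (assumed, not data from the thesis):
# 8 samples, 2 conditions, 2 batches, unequal cell amounts per sample.
treatment = np.array([0, 0, 1, 1, 0, 0, 1, 1], dtype=float)
batch = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
cells = np.array([1, 2, 1, 2, 1, 2, 1, 2], dtype=float)   # cell units per sample

# Spectral counts for one protein: 10 per cell unit, x2 if treated, x3 in batch 1
counts = cells * 10.0 * 2.0 ** treatment * 3.0 ** batch

# Columns: intercept, treatment effect, batch (blocking factor)
X = np.column_stack([np.ones_like(treatment), treatment, batch])

# log(cells) as offset normalizes the counts to cell units inside the model
beta = fit_poisson_glm(X, counts, offset=np.log(cells))
fold_change = np.exp(beta[1])
print("treatment fold change per cell unit:", fold_change)
```

Because the cell amounts enter as `log(cells)` in the offset, the raw counts need no prior normalization, and including the batch column keeps the batch effect from being absorbed into the treatment coefficient. A real analysis would also need a test statistic (for example a likelihood ratio test) and, where counts are overdispersed, a negative binomial or quasi-likelihood model instead of the plain Poisson fit used here.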
