


Abstract of Statistical Analysis of Microarray Data

Geoffrey J. McLachlan

  • DNA microarray technology has made it possible to examine the expression of thousands of genes over multiple developmental stages or different experimental conditions. Microarrays are now the most widely used tool in genomics to efficiently measure an organism's gene expression levels. For example, clinical heterogeneity and progression of many diseases, such as cancer, reflect the activity of different genes that command biological processes at the cellular level. The analysis of microarray gene expression data is thus a very important topic in many areas, including medical, veterinary, and agricultural fields. Traditional statistical methodology has to be re-evaluated and modified to carry out the main analyses required for microarray data. Such data consist of several thousands of genes measured over, say, tens or hundreds of tissues.

    In our first lecture, we consider an important and common question in microarray experiments, namely the detection of genes that are differentially expressed in tissue samples across a number of specified classes. These classes may correspond to tissues (cells) that are at different stages in some process, in distinct pathological states, or under different experimental conditions. As this problem concerns the selection of significant genes from a large pool of candidate genes, it is usually carried out within the framework of multiple hypothesis testing. A plethora of methods to detect differential gene expression have been proposed. In our presentation, we will concentrate on a rather simple approach based on mixture models, which allow the estimation of the implied false discovery rate and other rates such as the sensitivity.
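    As a rough illustration of how a mixture model can yield these rates, one common two-component formulation is sketched below. The notation is an assumption made here and is not taken from the lectures: z_j is a test statistic for gene j, \pi_0 is the prior probability that a gene is not differentially expressed, f_0 and f_1 are the null and non-null densities, c is a chosen threshold, and N_r is the number of genes declared differentially expressed.

```latex
% Two-component mixture density of the gene-level statistic z_j
f(z_j) = \pi_0 f_0(z_j) + (1 - \pi_0) f_1(z_j)

% Posterior probability that gene j is NOT differentially expressed
\tau_0(z_j) = \frac{\pi_0 f_0(z_j)}{f(z_j)}

% Declare gene j differentially expressed when \hat{\tau}_0(z_j) \le c.
% Estimated false discovery rate over the N_r selected genes:
\widehat{\mathrm{FDR}} = \frac{1}{N_r} \sum_{j:\, \hat{\tau}_0(z_j) \le c} \hat{\tau}_0(z_j)

% Estimated sensitivity (share of non-null genes that are selected):
\widehat{\mathrm{Sens}} =
  \frac{\sum_{j:\, \hat{\tau}_0(z_j) \le c} \bigl(1 - \hat{\tau}_0(z_j)\bigr)}
       {\sum_{j} \bigl(1 - \hat{\tau}_0(z_j)\bigr)}
```

    Under this formulation, lowering c selects fewer genes and reduces the estimated false discovery rate, at the cost of lower estimated sensitivity.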

    In the second lecture, we consider another important problem with the analysis of microarray data. It concerns the grouping of the genes over several tissues in multiple development stages or different experimental conditions such as in time-course experiments. A general goal common to many of these experiments is to characterize temporal patterns of gene expression within various biological conditions and to group genes by these patterns. These groups have the potential to provide insight into the biological function of genes.

    In the third lecture, attention is centred on the classification of the tissue samples on the basis of the genes in both an unsupervised and a supervised context. In the former context, an aim might be to cluster the tissues on the basis of the genes in order to investigate the existence of classes and subclasses of various forms of cancer. In the latter context, an aim might be to develop a prediction rule to be used as a guide to therapeutic interventions for treating patients on the basis of their gene-expression signature. Unfortunately, most standard cluster and discriminant techniques cannot be applied directly, as the number of observations (tissues) is small relative to the number of available variables (genes). We present a mixture model-based approach to the unsupervised problem based on extensions of the EMMIX program. For the supervised problem, we adopt a support vector machine as our classifier and we demonstrate the care that needs to be taken to avoid misleading biases in estimating the associated error rates for a classifier based on a reduced set of genes.
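    The bias mentioned above can be illustrated with a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the data are pure noise (no gene carries class information), genes are ranked by a two-sample t statistic, and a simple nearest-centroid rule stands in for the support vector machine used in the lectures. Selecting genes once on all tissues and then cross-validating typically gives an optimistic error estimate, whereas repeating the selection inside each leave-one-out fold does not.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def t_score(values, labels):
    """Absolute two-sample t statistic (unequal variances) for one gene."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    se = (var(a) / len(a) + var(b) / len(b)) ** 0.5
    return abs(mean(a) - mean(b)) / se

def top_genes(data, labels, rows, k):
    """Indices of the k genes with largest |t|, using only the tissues in rows."""
    sub_labels = [labels[i] for i in rows]
    scores = [(t_score([data[i][g] for i in rows], sub_labels), g)
              for g in range(len(data[0]))]
    return [g for _, g in sorted(scores, reverse=True)[:k]]

def predict(data, labels, train, test, genes):
    """Nearest-centroid prediction for one held-out tissue."""
    dist = []
    for c in (0, 1):
        members = [i for i in train if labels[i] == c]
        d = 0.0
        for g in genes:
            centroid = mean([data[i][g] for i in members])
            d += (data[test][g] - centroid) ** 2
        dist.append(d)
    return 0 if dist[0] <= dist[1] else 1

def loo_error(data, labels, k, select_inside):
    """Leave-one-out error; gene selection inside or outside the folds."""
    rows = list(range(len(data)))
    # Outside the folds: genes are chosen once, using every tissue (biased).
    fixed = None if select_inside else top_genes(data, labels, rows, k)
    errors = 0
    for test in rows:
        train = [i for i in rows if i != test]
        genes = top_genes(data, labels, train, k) if select_inside else fixed
        errors += predict(data, labels, train, test, genes) != labels[test]
    return errors / len(rows)

# Simulated noise: 20 tissues in two classes, 1000 uninformative genes.
random.seed(0)
labels = [0] * 10 + [1] * 10
data = [[random.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(20)]

biased = loo_error(data, labels, k=10, select_inside=False)
honest = loo_error(data, labels, k=10, select_inside=True)
print("selection outside cross-validation:", biased)
print("selection inside cross-validation: ", honest)
```

    Because the simulated genes carry no class information, an error estimate well below 50% from the first procedure reflects selection bias rather than genuine predictive ability.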

