Description
DNA microarray data are a rich source of information for molecular biologists, and a fascinating source of data for statisticians. Slightly simplified, microarrays are solid surfaces made of glass or silicon which carry an arrayed series of thousands of microscopic DNA spots. DNA microarrays are used to measure changes in gene expression levels, where "expression" is the translation of the information encoded in a gene into proteins. The analysis of such data has gained enormous importance in understanding various biological processes, including cancer. Typically, one simultaneously screens the expression of all genes in a cell exposed to some specific conditions, yielding a "samples versus genes" table of dimension, say, n x p. One of the basic features of microarray data is that p >> n: the number of observations, n, is usually in the region of tens, while the number of genes, p, is in the region of thousands. This makes any attempt at direct statistical analysis (for instance, classification of genes) almost impossible, and one needs efficient preprocessing steps which extract low-dimensional "features" from the data matrix.
This project will look at ways of extracting information or "features" from high-dimensional data, initially using familiar statistical techniques such as principal component analysis, but proceeding to more advanced methods later on. The project will be run in collaboration with Dr Adetayo Kasim from the Wolfson Research Institute, who has arranged for microarray data provided by Janssen Pharmaceutical, Beerse, Belgium. There are two directions of research in which to turn in the course of the project:
Prerequisites
Statistical Methods III
Resources and Examples
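As a small illustration of the feature extraction step mentioned in the description, the following sketch applies principal component analysis to an n x p matrix with p >> n. It is not the project's own code: the data are simulated (the project's microarray data are not reproduced here), and the function name `pca_scores`, the dimensions n = 20 and p = 1000, and the artificial group effect are all assumptions made purely for illustration. The sketch uses NumPy and computes the components through a thin singular value decomposition of the centred matrix, which avoids forming the p x p covariance matrix.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project the rows of X (samples x genes) onto the leading principal components.

    Hypothetical helper for illustration only.
    """
    Xc = X - X.mean(axis=0)  # centre each gene (column)
    # Thin SVD of the centred n x p matrix; no p x p covariance matrix is formed.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # n x k matrix of "features"
    explained = s**2 / np.sum(s**2)  # proportion of variance per component
    return scores, explained[:n_components]


# Simulated stand-in for a microarray table (assumption: not real data).
rng = np.random.default_rng(0)
n, p = 20, 1000
X = rng.normal(size=(n, p))
X[:10, :50] += 2.0  # artificial group effect: first 10 samples, first 50 genes

scores, explained = pca_scores(X, n_components=2)
print(scores.shape)  # (20, 2)
```

Because the centred matrix has rank at most n - 1, no more than n - 1 components carry any variance, however large p is. The resulting low-dimensional scores could then serve as input to a subsequent analysis in place of the original p columns.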
email: jochen.einbeck "at" durham.ac.uk