Project IV: Graphical Models in Multivariate Statistics - Jonathan Cumming

Description

A graphical model is a particular type of multivariate statistical model that uses a graph (the mathematical object comprising nodes and edges) to represent the variables and the relationships within the model. The variables are the nodes of the graph, and the edges between pairs of variables represent statements of conditional dependence. Underlying this graph structure is a joint multivariate distribution, which provides the basis for statistical inference. Realistic problems in statistics are characterised by multiple variables and a complex web of relationships between them, which makes graphical models potentially very useful tools in the statistician's toolbox.

Graphical models come in a variety of flavours depending on whether we assign a direction to the edges of the graph. Directed graphical models lend themselves to a Bayesian treatment and so are often known as Bayesian networks, whereas undirected graphical models, also known as Markov random fields, are often treated with a frequentist approach based on likelihood and parameter estimation. Continuous and discrete-valued data must also be handled differently, using methods based on the multivariate Normal distribution and log-linear models respectively.
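As a brief illustration of how the graph relates to the underlying joint distribution, the standard factorisations for the two flavours can be written as follows. The notation is an assumption introduced here and is not part of the project description: pa(i) denotes the parents of node i in a directed acyclic graph, the product in the second line runs over the cliques C of an undirected graph, each psi_C is a non-negative potential function, and Z is the normalising constant.

```latex
% Directed graphical model (Bayesian network): one factor per node given its parents
p(x_1, \dots, x_p) = \prod_{i=1}^{p} p\bigl(x_i \mid x_{\mathrm{pa}(i)}\bigr)

% Undirected graphical model (Markov random field): one potential per clique
p(x_1, \dots, x_p) = \frac{1}{Z} \prod_{C} \psi_C(x_C)
```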



Figure: A simplified Bayesian network model for the diagnosis of scurvy and plague.
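To give a feel for how a diagnostic Bayesian network of this kind is used, here is a minimal sketch in Python. It is not a reconstruction of the figure: the structure (two diseases that are both parents of a single symptom) and every probability are hypothetical values chosen purely for illustration. The posterior is found by brute-force enumeration of the joint distribution, which is only practical for very small networks.

```python
from itertools import product

# Hypothetical prior probabilities of each disease (illustrative only).
P_SCURVY = 0.10
P_PLAGUE = 0.05

# Hypothetical P(symptom present | scurvy, plague).
P_SYMPTOM = {
    (True, True): 0.95,
    (True, False): 0.80,
    (False, True): 0.90,
    (False, False): 0.10,
}


def joint(scurvy, plague, symptom):
    """Joint probability p(scurvy) p(plague) p(symptom | scurvy, plague)."""
    p = (P_SCURVY if scurvy else 1 - P_SCURVY) * (P_PLAGUE if plague else 1 - P_PLAGUE)
    p_sym = P_SYMPTOM[(scurvy, plague)]
    return p * (p_sym if symptom else 1 - p_sym)


def posterior_scurvy(symptom, plague=None):
    """P(scurvy | symptom [, plague]), summing the joint over unobserved variables."""
    plague_values = [True, False] if plague is None else [plague]
    numerator = sum(joint(True, pl, symptom) for pl in plague_values)
    evidence = sum(
        joint(s, pl, symptom) for s, pl in product([True, False], plague_values)
    )
    return numerator / evidence


p_given_symptom = posterior_scurvy(symptom=True)
p_given_symptom_and_plague = posterior_scurvy(symptom=True, plague=True)
print(round(p_given_symptom, 3), round(p_given_symptom_and_plague, 3))
```

With these made-up numbers, observing the symptom raises the probability of scurvy from 0.10 to about 0.39, while additionally learning that the patient has plague brings it back down to about 0.10. This "explaining away" effect is typical of networks in which two causes share a common effect.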

Graphical modelling is a versatile technique which has been used in a wide range of application domains, including: decision support, medical and fault diagnosis, web search, image analysis, speech recognition, natural language processing, decoding of messages sent over a noisy communication channel, and robot navigation.

In this project, we will begin by studying how multivariate statistical problems can be expressed as graphical models, investigate Graphical Gaussian models for multivariate Normally-distributed data, and apply this knowledge to the analysis of real data sets.
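As a small taste of Graphical Gaussian models: for multivariate Normal data, a zero entry in the inverse covariance (precision) matrix corresponds to conditional independence of the two variables given all the others, and hence to a missing edge in the undirected graph. The sketch below illustrates this with a made-up 3 x 3 covariance matrix for a chain X1 - X2 - X3. It is written in Python using only the standard library so that it is self-contained. The matrix, the function names and the choice of language are assumptions for illustration; the project itself uses R.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I] is reduced to [I | A^-1].
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(cov):
    """Partial correlation of each pair given all other variables.

    Uses rho_ij = -k_ij / sqrt(k_ii * k_jj), where K is the precision matrix.
    """
    k = invert(cov)
    n = len(k)
    return [
        [1.0 if i == j else -k[i][j] / (k[i][i] * k[j][j]) ** 0.5 for j in range(n)]
        for i in range(n)
    ]


# Made-up covariance matrix for a chain X1 - X2 - X3 (illustrative only).
sigma = [
    [0.75, 0.50, 0.25],
    [0.50, 1.00, 0.50],
    [0.25, 0.50, 0.75],
]

pcor = partial_correlations(sigma)
for row in pcor:
    print([round(v, 3) for v in row])
```

Here X1 and X3 are marginally correlated (their covariance is 0.25), yet their partial correlation given X2 is zero, so the graph has no edge between them. With real data the precision matrix has to be estimated, and deciding which entries to treat as zero is a model selection problem.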

There are then many further topics which you could pursue depending on your interests. Some examples would be:

This project has a focus on statistical methodology and data analysis. Familiarity with the statistical package R, general statistical concepts, and data analysis is essential.

Prerequisites

Statistical Concepts II, Statistical Methods III

Would be nice/might help, but not essential

Bayesian Statistics III/IV

Web Resources

Books

Many books contain suitable introductory material:

Email

Jonathan Cumming