Project IV (MATH 4072) 2014-15


Bayesian Emulation and Inference for an Auto-regulatory Gene Network model

Ian Vernon

Description

Systems Biology is a rapidly expanding area which involves the investigation of chemical reaction networks found (usually) within cells. The vast majority of these networks have been studied using standard coupled differential equations. While these differential equation models are suitable if the numbers of molecules of all chemicals are large, such deterministic models break down for low numbers of molecules, a case which often occurs for processes that involve gene translation.

In these situations, the stochastic or random nature of the system becomes evident. Fortunately, stochastic versions of the chemical reaction network models can be successfully used to represent and understand the system. These models contain several rate parameters, one for each proposed reaction in the network. If observed data is available, the main problem of interest is how to learn about the rate parameters using both the model and the observations.
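To make the idea of a stochastic reaction network model concrete, here is a minimal sketch of Gillespie's direct method applied to a toy immigration-death network. It is written in Python purely for illustration (the project itself uses R). The network, the function name `gillespie_birth_death` and the rate values are assumptions made for this example and are not the auto-regulatory gene network studied in the project.

```python
import random


def gillespie_birth_death(k_birth, k_death, x0, t_max, seed=0):
    """Exact simulation of a toy network with two reactions:
    0 -> X with hazard k_birth, and X -> 0 with hazard k_death * x."""
    rng = random.Random(seed)
    t, x = 0.0, x0
    times, counts = [t], [x]
    while True:
        hazards = [k_birth, k_death * x]
        total = sum(hazards)
        if total == 0.0:
            break  # no reaction can fire any more
        # Waiting time to the next reaction is exponential with rate `total`.
        t += rng.expovariate(total)
        if t > t_max:
            break
        # Choose which reaction fires, with probability proportional to its hazard.
        if rng.random() * total < hazards[0]:
            x += 1
        else:
            x -= 1
        times.append(t)
        counts.append(x)
    return times, counts


# Hypothetical rate parameters, chosen only for illustration.
times, counts = gillespie_birth_death(k_birth=10.0, k_death=0.1, x0=0, t_max=50.0)
print(f"{len(times) - 1} reaction events, final molecule count {counts[-1]}")
```

Each run with a different seed gives a different trajectory, which is exactly the randomness that a deterministic differential equation model averages away.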

For networks of realistic size, which may possess large numbers of rate or input parameters, we immediately encounter the following problem. As the networks are not that fast to simulate, we cannot comprehensively explore the high dimensional input parameter space, as this would require far too many simulations and would simply take too long (in some cases many thousands of years!). We solve this problem by introducing the incredibly powerful concept of a "statistical emulator": a statistical construct that models the relationship between the inputs (the rate parameters) and the outputs (the number of molecules) of the stochastic network model. Having represented the model in this way, we can now answer any questions we may have using the emulator instead of the model. For example, we can use the emulator to perform inference to learn about the rate parameters given suitable observational data. Emulators are powerful tools that are currently being employed by members of the Mathematics department in the areas of Cosmology, Climate Change, Geology and Biology, to name but a few.
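The emulation idea can be sketched in a few lines. The example below builds a simple Gaussian process emulator for a one-input function, written in Python with NumPy purely for illustration (the project itself uses R). Everything named here is an assumption for the sketch: `slow_model` is a cheap deterministic stand-in for an expensive simulator, and the squared-exponential covariance, its variance and length-scale, and the constant prior mean are arbitrary choices. An emulator for a genuinely stochastic model would also need to represent the simulator's own run-to-run variability, which this sketch omits.

```python
import numpy as np


def slow_model(x):
    # Hypothetical stand-in for an expensive simulator (one input, one output).
    return np.sin(3.0 * x) + x


def sq_exp(a, b, variance=1.0, length=0.3):
    # Squared-exponential covariance between two sets of 1-d inputs.
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length) ** 2)


# A small design of simulator runs over the input range [0, 1].
design = np.linspace(0.0, 1.0, 8)
runs = slow_model(design)

mean0 = runs.mean()  # constant prior mean (an assumption of this sketch)
K = sq_exp(design, design) + 1e-8 * np.eye(len(design))  # small jitter for stability
weights = np.linalg.solve(K, runs - mean0)


def emulate(x_new):
    """Return the emulator mean and variance at new inputs."""
    k_star = sq_exp(x_new, design)
    mean = mean0 + k_star @ weights
    reduction = np.sum(k_star * np.linalg.solve(K, k_star.T).T, axis=1)
    var = sq_exp(x_new, x_new).diagonal() - reduction
    return mean, np.maximum(var, 0.0)


x_test = np.array([0.33, 0.71])
mean, var = emulate(x_test)
print("emulator mean:", mean)
print("emulator sd:  ", np.sqrt(var))
print("true output:  ", slow_model(x_test))
```

Once the eight runs have been made, each call to `emulate` costs only a small matrix-vector product, and the returned variance indicates how far the prediction should be trusted away from the design points.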

In this project the student will learn about a class of stochastic systems biology models and how to simulate from them using R. The student will then examine the prokaryotic auto-regulatory gene network in detail, and learn how to perform Bayesian inference given perfect data. The student will analyse the inference process and investigate the problems that arise when the observed data is not perfect. The student will then attempt to construct an emulator for certain outputs of the model, choosing a suitable subset of outputs. Emulator diagnostics will be performed and then the emulator will be used to learn about a subset of the input space, given observational data. This project has both theoretical and computational aspects, the latter occurring through use of the statistical package R. While the balance between theory and computation can be somewhat weighted toward the student's preferred direction, some familiarity with R and with the methods described in 2H Statistical Concepts is essential, as are the concepts developed in 3H Statistical Methods.
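To give a flavour of inference given perfect data, the sketch below uses a toy immigration-death network (0 -> X with hazard c1, X -> 0 with hazard c2 * x) as a hypothetical stand-in for the gene network, and is written in Python purely for illustration (the project itself uses R). If every reaction event is observed on [0, T], the likelihood for each rate c_j is proportional to c_j^(r_j) * exp(-c_j * I_j), where r_j is the number of times reaction j fired and I_j is the time integral of its hazard with c_j set to 1. A Gamma(a, b) prior is then conjugate, giving a Gamma(a + r_j, b + I_j) posterior. The function name, the rate values and the Gamma(1, 1) prior are all assumptions of this example.

```python
import random


def simulate_complete_data(c1, c2, x0, t_max, seed=1):
    """Simulate the toy network and return the complete-data summaries:
    event counts r_j and integrated unit hazards I_j for each reaction."""
    rng = random.Random(seed)
    t, x = 0.0, x0
    n_events = [0, 0]
    integrals = [0.0, 0.0]  # integrals of g_1 = 1 and g_2 = x over [0, t_max]
    while True:
        total = c1 + c2 * x
        dt = rng.expovariate(total)
        if t + dt > t_max:
            # Account for the final stretch with no further events.
            dt = t_max - t
            integrals[0] += dt
            integrals[1] += x * dt
            break
        integrals[0] += dt
        integrals[1] += x * dt
        t += dt
        if rng.random() * total < c1:
            x += 1
            n_events[0] += 1
        else:
            x -= 1
            n_events[1] += 1
    return n_events, integrals


true_rates = (10.0, 0.1)  # hypothetical "true" rate parameters
n_events, integrals = simulate_complete_data(*true_rates, x0=0, t_max=200.0)

a, b = 1.0, 1.0  # Gamma(a, b) prior on each rate (illustrative choice)
post_shape = [a + r for r in n_events]
post_rate = [b + i for i in integrals]
post_mean = [s / r for s, r in zip(post_shape, post_rate)]
print("posterior means:", post_mean)
```

With imperfect data, for example molecule counts observed only at discrete times, the event counts and integrals are no longer known and this simple conjugate update is not available, which is one reason the inference problem becomes much harder.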

Prerequisites

Statistical Concepts II and Statistical Methods III

Resources

The systems biology networks that feature in this project, along with many relevant statistical techniques, are clearly described in Prof Darren Wilkinson's excellent introductory text:

Stochastic Modelling for Systems Biology, Darren Wilkinson, published by Chapman & Hall/CRC, 2006.

For more details see the book's website, which gives a flavour of the subject. Also to be found on this site are several examples of R functions that can be used to simulate such stochastic models (along with a lot of useful R code).

An excellent web-site describing (in sometimes overwhelming detail!) the types of analyses that this project introduces is:

The MUCM Web-site

This is the web-site for the Managing Uncertainty in Complex Models (MUCM) project, a consortium in which we are involved (with the Universities of Sheffield, Aston, LSE and Southampton). There are an enormous number of links to follow at this site. One in particular, which gives an introduction to emulation, is:

O'Hagan, A. (2006). Bayesian analysis of computer code outputs: a tutorial. Reliability Engineering and System Safety 91, 1290–1300.

See also the MUCM toolkit for a detailed list of emulation related techniques and tools.

email: Ian Vernon

