Description
Data arising from several subpopulations often have a clustered appearance. The clusters are sometimes clearly distinct, but often they are poorly separated or even overlapping. Data of this type pose considerable challenges to the data analyst. A possible way forward is to describe the data by a mixture model. For instance, if one believes that two latent subpopulations are present, one may describe the data as a mixture of two normal distributions, which in the univariate case takes the form

Y ~ p N(m₁, σ₁²) + (1 − p) N(m₂, σ₂²)

where p is the mixing proportion, and m₁, m₂ and σ₁², σ₂² are the means and variances of the two components. The parameters of this model can be estimated elegantly through the EM (Expectation-Maximization) algorithm, which over the past decades has become an extraordinarily important statistical device (with significance far beyond the mixture modelling problem).

This model class is very versatile. The univariate Gaussian distributions in the model above could be replaced by bivariate normal distributions (the image accompanying this description shows an example of a fitted bivariate Gaussian mixture), by any other distribution, or even by entire regression models. Mixture models are of major relevance in a wide range of fields, including the environmental, social, and medical sciences as well as finance.

In this project, you will gain some insight into the methodology used for mixture modelling, and then focus on a particular field of application of such models which suits your interests.
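To give a flavour of how the EM algorithm works for the two-component univariate Gaussian mixture above, here is a minimal sketch in Python using NumPy. It is an illustration under stated assumptions, not the method prescribed for the project: the function names `normal_pdf` and `fit_two_component_mixture`, the median-split starting values, the stopping rule, and the simulated data are all hypothetical choices made for this example.

```python
import numpy as np


def normal_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y."""
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def fit_two_component_mixture(y, n_iter=200, tol=1e-8):
    """EM for Y ~ p N(m1, v1) + (1 - p) N(m2, v2); v1, v2 are variances."""
    y = np.asarray(y, dtype=float)

    # Starting values (an assumption of this sketch): split at the median.
    p = 0.5
    med = np.median(y)
    m1, m2 = y[y <= med].mean(), y[y > med].mean()
    v1 = v2 = y.var()

    prev_loglik = -np.inf
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to component 1.
        d1 = p * normal_pdf(y, m1, v1)
        d2 = (1.0 - p) * normal_pdf(y, m2, v2)
        total = d1 + d2
        w = d1 / total

        # Log-likelihood at the current parameters; stop once it stalls.
        loglik = np.log(total).sum()
        if loglik - prev_loglik < tol:
            break
        prev_loglik = loglik

        # M-step: weighted proportion, means and variances.
        p = w.mean()
        m1 = np.sum(w * y) / np.sum(w)
        m2 = np.sum((1.0 - w) * y) / np.sum(1.0 - w)
        v1 = np.sum(w * (y - m1) ** 2) / np.sum(w)
        v2 = np.sum((1.0 - w) * (y - m2) ** 2) / np.sum(1.0 - w)

    return p, m1, m2, v1, v2


# Simulated data (hypothetical): 30% from N(0, 1), 70% from N(5, 1).
rng = np.random.default_rng(0)
n = 2000
from_first = rng.random(n) < 0.3
y = np.where(from_first, rng.normal(0.0, 1.0, n), rng.normal(5.0, 1.0, n))

p_hat, m1_hat, m2_hat, v1_hat, v2_hat = fit_two_component_mixture(y)
print(p_hat, m1_hat, m2_hat, v1_hat, v2_hat)
```

The E-step computes, for each observation, the probability of belonging to the first component given the current parameters. The M-step then re-estimates p, the means and the variances as weighted averages using those probabilities. In practice the result can depend on the starting values, so several starts are often tried.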
Prerequisites
Resources
email: jochen.einbeck "at" durham.ac.uk