Description
In public opinion, statisticians are often regarded as "mean-lovers" (Friedman, 2002). This perception is certainly simplistic, but it contains some truth: in the context of, say, regression, focusing attention on the mean has several advantages, such as computational simplicity and the simple interpretation of estimated regression parameters. However, the mean is only one of several measures of location, others being the median and the mode. For data that are highly skewed or contain extreme outliers (such as income data), the mean can be a misleading measure of location. The median, by contrast, is quite robust to such features, and regression based on the median (L1 regression for short) is nowadays a well-established technique. Median regression has also been extended to quantile regression, which allows characterization of the full response distribution.
In contrast, until very recently the mode has been used only reluctantly for inferential purposes. One reason is that, for continuous data, there is no 'natural' estimator of the mode (unlike the sample mean or the sample median), so auxiliary algorithms are required to find it. Recently, however, interest in the mode has started to gain traction, partly because the mode is even more robust to outliers than the mean or the median. This is relevant, for instance, for regression applications with highly skewed data, or with multiple 'regimes', where different local modes can describe different regimes. The mode also turns out to play a useful role in topics related to machine learning; for instance, modal clustering has gained considerable attention in the computer vision literature.
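To illustrate why the mode needs an auxiliary algorithm, here is a minimal sketch using one common choice, the mean-shift algorithm, which climbs a Gaussian kernel density estimate to a local mode. Running it from every observation and grouping points by the mode they reach gives a simple form of modal clustering. The data, the bandwidth of 0.3, and the function names `mean_shift_1d` and `modal_clusters` are hypothetical and chosen for illustration only; the project does not prescribe this particular algorithm.

```python
import math
import statistics

# Illustrative sketch only: data, bandwidth and function names are hypothetical.


def gaussian_kernel(u: float) -> float:
    """Unnormalised Gaussian kernel; normalising constants cancel in mean shift."""
    return math.exp(-0.5 * u * u)


def mean_shift_1d(data, start, bandwidth, tol=1e-8, max_iter=500):
    """Climb a Gaussian kernel density estimate from `start` to a local mode.

    Each step replaces x by the kernel-weighted average of the data around x
    (the mean-shift fixed-point iteration).
    """
    x = start
    for _ in range(max_iter):
        weights = [gaussian_kernel((x - xi) / bandwidth) for xi in data]
        total = sum(weights)
        if total == 0.0:  # x is far from every observation: nowhere to move
            return x
        new_x = sum(w * xi for w, xi in zip(weights, data)) / total
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x


def modal_clusters(data, bandwidth):
    """Modal clustering: label each observation by the local mode it climbs to."""
    merge_tol = 0.1 * bandwidth  # endpoints closer than this count as one mode
    modes, labels = [], []
    for xi in data:
        m = mean_shift_1d(data, xi, bandwidth)
        for k, existing in enumerate(modes):
            if abs(m - existing) < merge_tol:
                labels.append(k)
                break
        else:
            modes.append(m)
            labels.append(len(modes) - 1)
    return modes, labels


# Hypothetical right-skewed sample with two extreme outliers (illustration only).
data = [1.0, 1.1, 1.2, 1.25, 1.3, 1.35, 1.4, 1.5, 1.7, 2.0, 2.5, 3.5, 50.0, 80.0]
bandwidth = 0.3  # hypothetical smoothing parameter

mean = statistics.mean(data)      # dragged upwards by 50.0 and 80.0
median = statistics.median(data)  # robust, but still pulled to the right
mode = mean_shift_1d(data, start=median, bandwidth=bandwidth)
print(f"mean={mean:.3f}  median={median:.3f}  mode={mode:.3f}")

modes, labels = modal_clusters(data, bandwidth)
print("local modes:", [round(m, 3) for m in modes])
print("cluster labels:", labels)
```

The bandwidth controls how many local modes appear: a larger value smooths the density estimate and merges nearby modes, while a smaller one splits them. Choosing it well is a central practical issue in mode-based methods.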
The mode is also of importance in Bayesian statistics. For instance, the frequentist Lasso estimate (least-squares regression with an L1 penalty that enforces sparse solutions) turns out to be the posterior mode of a Bayesian linear regression under appropriate prior specifications, namely independent Laplace priors on the regression coefficients. Based on similar considerations, Chacón (2018) recently proclaimed the 'Modal age of Statistics'. This project will not be able to answer whether this conjecture is right; only history will tell! However, it will look at modern statistical developments that use the mode as their basic inferential ingredient. As indicated above, modal approaches can be pursued in multiple directions, including regression, clustering, density estimation, segmentation, and aspects of Bayesian methodology. You will study in detail the concepts behind one or more of these directions, produce your own implementations where possible and appropriate, and apply them to practical problems of your choice.
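The Lasso-as-posterior-mode remark can be made concrete with the standard derivation. As assumptions for this sketch, take a Gaussian likelihood with known noise variance σ² and independent Laplace priors with scale b on the p regression coefficients; maximizing the log-posterior then yields the Lasso criterion with penalty λ = 2σ²/b:

```latex
\begin{align*}
p(\beta \mid y) &\propto \exp\!\Big(-\frac{1}{2\sigma^2}\lVert y - X\beta\rVert_2^2\Big)
  \prod_{j=1}^{p}\exp\!\Big(-\frac{|\beta_j|}{b}\Big),\\
\hat\beta_{\mathrm{MAP}} &= \arg\max_{\beta}\,\log p(\beta \mid y)
  = \arg\min_{\beta}\Big\{\frac{1}{2\sigma^2}\lVert y - X\beta\rVert_2^2 + \frac{1}{b}\lVert\beta\rVert_1\Big\}\\
&= \arg\min_{\beta}\Big\{\lVert y - X\beta\rVert_2^2 + \lambda\lVert\beta\rVert_1\Big\},
  \qquad \lambda = \frac{2\sigma^2}{b}.
\end{align*}
```

The last step multiplies the objective by the positive constant 2σ², which does not change the minimizer.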
Prerequisites
Resources
email: jochen.einbeck "at" durham.ac.uk