Project IV (MATH4072) 2021-22


Beyond mean regression

Dr J. Einbeck

Description

In public opinion, statisticians are often regarded as "mean-lovers" (Friedman, 2002). This perception is certainly simplistic, but it has some truth: in the context of, say, regression, focusing attention on the mean has several advantages, such as computational simplicity and straightforward interpretation of the estimated regression parameters.

However, the mean is only one of several measures of location, among them the median and the mode. For data which are highly skewed or which possess extreme outliers (such as income data), the mean can be a misleading measure of location. In contrast, the median is quite robust to such features, and regression based on the median, known as L1 regression for short, is nowadays a well-established technique. The same cannot yet be said of modal regression, which, until very recently, has been used only reluctantly for inferential purposes. One reason for this is that, for continuous data, there is no 'natural' estimator of the mode (unlike the sample mean or the sample median), so auxiliary algorithms are needed to find it. Recently, the mode has started to gain some traction, for instance in regression applications with highly skewed data, or with multiple 'regimes', where different local modes can describe different regimes.
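
The contrast between the three location measures can be illustrated with a small simulation. The following Python sketch is only an illustration under stated assumptions, not part of the project material: the log-normal sample, the Gaussian kernel, the rule-of-thumb bandwidth and the helper name kde_mode are all choices made here for the example. It estimates the mode as the maximiser of a kernel density estimate, which is one possible "auxiliary algorithm" among several.

```python
import numpy as np


def kde_mode(x, grid_size=1000):
    """Estimate the mode as the maximiser of a Gaussian kernel density estimate.

    Hypothetical helper for illustration; the bandwidth is a common
    rule-of-thumb choice and is an assumption, not a recommendation.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    q75, q25 = np.percentile(x, [75, 25])
    h = 0.9 * min(x.std(ddof=1), (q75 - q25) / 1.34) * n ** (-0.2)
    # Evaluate the density estimate on an equally spaced grid over the data range.
    grid = np.linspace(x.min(), x.max(), grid_size)
    z = (grid[:, None] - x[None, :]) / h
    density = np.exp(-0.5 * z**2).sum(axis=1) / (n * h * np.sqrt(2.0 * np.pi))
    return grid[np.argmax(density)]


# Simulated right-skewed data (log-normal), standing in for e.g. income data.
rng = np.random.default_rng(1)
y = rng.lognormal(mean=0.0, sigma=1.0, size=2000)

mean_hat = y.mean()
median_hat = np.median(y)
mode_hat = kde_mode(y)

print(f"mean   = {mean_hat:.3f}")
print(f"median = {median_hat:.3f}")
print(f"mode   = {mode_hat:.3f}")
```

For right-skewed data of this kind one would expect the estimated mode to lie below the median, and the median below the mean; the mode estimate also depends on the chosen bandwidth, which is part of what makes modal methods less straightforward.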

One can go further still: one may be interested in the full response distribution, conditional on given predictor values. This leads to approaches which can be summarized as "distributional regression", with special cases being "generalized additive models for location, scale and shape" (gamlss), quantile regression, and "expectile regression" (expectiles being defined as the least squares analogue of quantiles). Such approaches, which are relatively new additions to the statistical toolbox, allow for rich analyses and powerful conclusions.
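
To make the phrase "least squares analogue of quantiles" concrete: a sample quantile at level tau minimises an asymmetrically weighted sum of absolute deviations, whereas the sample expectile at level tau minimises the same asymmetric weights applied to squared deviations. The Python sketch below is a minimal illustration without predictors; the function name expectile, the iteratively reweighted mean used to compute it, and the simulated data are assumptions made for this example.

```python
import numpy as np


def expectile(y, tau, tol=1e-10, max_iter=200):
    """Sample expectile at level tau, computed by an iteratively reweighted mean.

    Minimises sum_i w_i * (y_i - m)^2 with w_i = tau if y_i > m, else 1 - tau.
    Hypothetical helper for illustration only.
    """
    y = np.asarray(y, dtype=float)
    m = y.mean()
    for _ in range(max_iter):
        w = np.where(y > m, tau, 1.0 - tau)
        m_new = np.sum(w * y) / np.sum(w)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m


# Simulated right-skewed data (assumption made for the example).
rng = np.random.default_rng(2)
y = rng.gamma(shape=2.0, scale=1.0, size=1000)

for tau in (0.1, 0.5, 0.9):
    print(f"tau = {tau}: quantile = {np.quantile(y, tau):.3f}, "
          f"expectile = {expectile(y, tau):.3f}")
```

At tau = 0.5 the weights are symmetric, so the expectile coincides with the sample mean, just as the 0.5 quantile is the median. Regression versions replace the constant m by a function of the predictors.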

In this project, you will study methodologies of this type, investigate computational and theoretical aspects, and apply them to meaningful data examples, the choice of which can be guided by your personal field of interest (for instance, economics, geography, biosciences).

Prerequisites

  • Statistical Methods III

Resources

email: jochen.einbeck "at" durham.ac.uk