Description
Modern regression applications in many fields are often characterised by one or more of the following: (1) highly correlated predictors (the multicollinearity problem), (2) a large number of predictors (the large-\(p\) problem), and (3) a sample size smaller than the number of predictors (the \(n<p\) problem). In these cases least-squares (LS) estimation is either inefficient or simply infeasible; for instance, given a predictor matrix \(\mathbf{X}\), the inversion of \(\mathbf{X}^T\mathbf{X}\) (which is required for the LS solution) cannot be performed when \(n<p\).
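As a quick numerical illustration of this point (a minimal sketch in Python/NumPy; the dimensions and simulated data below are arbitrary, not part of the project materials), the Gram matrix \(\mathbf{X}^T\mathbf{X}\) is rank-deficient whenever \(n<p\), whereas adding a ridge-type term \(\lambda\mathbf{I}\) restores invertibility:

    import numpy as np

    rng = np.random.default_rng(1)
    n, p = 20, 50                       # fewer observations than predictors (n < p)
    X = rng.standard_normal((n, p))

    XtX = X.T @ X                       # p x p Gram matrix
    print(np.linalg.matrix_rank(XtX))   # at most n = 20 < p = 50, so XtX is singular

    lam = 1.0
    ridge_gram = XtX + lam * np.eye(p)  # adding lambda*I makes the matrix positive definite
    print(np.linalg.matrix_rank(ridge_gram))  # full rank: p = 50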
In such settings we require regularisation of the problem at hand. This is achieved through the imposition of constraints on the regression coefficient vector \(\boldsymbol{\beta}\), which effectively shrink certain coefficients towards zero. This approach is also known as penalisation and essentially entails restricting a certain \(L_\gamma\)-norm (\(\Vert\cdot\Vert_\gamma\)) of the vector \(\boldsymbol{\beta}\) not to exceed a specific threshold. Thus, in linear regression penalised solutions generally take the following form
\[
\hat{\boldsymbol{\beta}} = \underset{\boldsymbol{\beta}}{\arg\min}\; \Vert \mathbf{y} - \mathbf{X}\boldsymbol{\beta} \Vert_2^2 \quad \text{subject to} \quad \Vert \boldsymbol{\beta} \Vert_\gamma^\gamma \le t,
\]
or, equivalently,
\[
\hat{\boldsymbol{\beta}} = \underset{\boldsymbol{\beta}}{\arg\min}\; \left\{ \Vert \mathbf{y} - \mathbf{X}\boldsymbol{\beta} \Vert_2^2 + \lambda \Vert \boldsymbol{\beta} \Vert_\gamma^\gamma \right\},
\]
where the threshold \(t\) has a one-to-one correspondence to the penalty parameter \(\lambda\), and typically \(\gamma\in(0,2]\).
Two prominent examples of penalised methods are based on the \(L_2\) and \(L_1\) norms. The first, ridge regression, utilises the \(L_2\)-norm (\(\gamma = 2\)) and is amongst the oldest penalised methods. The second, lasso regression, is based on the \(L_1\)-norm (\(\gamma = 1\)). One important advantage of the lasso is that it can set non-influential coefficients exactly equal to zero, thereby also performing variable selection. Ever since the development of these two methods, numerous extensions and variations have emerged, and nowadays penalised regression is widely used in Statistics and Machine Learning.
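To see this behaviour concretely (a minimal sketch using Python's scikit-learn on simulated data; the settings below are illustrative assumptions, not prescribed by the project), one can fit both methods to data in which only a few predictors are truly influential and count how many coefficients are estimated as exactly zero:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Ridge, Lasso

    # Simulated data: 100 observations, 30 predictors, only 5 truly influential
    X, y = make_regression(n_samples=100, n_features=30, n_informative=5,
                           noise=5.0, random_state=0)

    ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty: shrinks coefficients, rarely zeroes them
    lasso = Lasso(alpha=1.0).fit(X, y)  # L1 penalty: can set coefficients exactly to zero

    print("Ridge coefficients equal to zero:", np.sum(ridge.coef_ == 0.0))
    print("Lasso coefficients equal to zero:", np.sum(lasso.coef_ == 0.0))

Typically the ridge fit retains all 30 coefficients (shrunken but non-zero), while the lasso fit zeroes out many of the non-influential ones.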
The goal in this project will initially be to learn the basics of these methods, to understand why they work well for statistically ill-posed problems, and to see how the penalty parameter is set in practice. Individual projects can then follow several directions; for example, comparing the methods in simulation studies, focusing more deeply on the theory of a specific method, working on Bayesian versions of penalised regression, and/or applying the methods to interesting real-world applications.
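Regarding the last point, the penalty parameter is most commonly chosen by cross-validation; a minimal sketch of this standard approach (again in Python/scikit-learn, with the same illustrative simulated data as above) is:

    from sklearn.datasets import make_regression
    from sklearn.linear_model import LassoCV

    X, y = make_regression(n_samples=100, n_features=30, n_informative=5,
                           noise=5.0, random_state=0)

    # 5-fold cross-validation over an automatically generated grid of penalty values
    cv_lasso = LassoCV(cv=5, random_state=0).fit(X, y)
    print("Selected penalty parameter:", cv_lasso.alpha_)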
Prerequisites
Statistical Inference II and Statistical Modelling II.
Resources
Feel free to email me at konstantinos.perrakis@durham.ac.uk if you have any questions.