Machine Learning

Konstantinos Perrakis

Shrinkage Methods: Lasso Regression

One disadvantage of ridge regression

Ridge shrinks all coefficient estimates towards zero, but (unless \(\lambda = \infty\)) it never sets any of them exactly equal to zero. The final model therefore always contains all \(p\) predictors, which can make interpretation difficult when \(p\) is large.
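To see this concretely, here is a minimal sketch (assuming scikit-learn and a synthetic sparse-regression problem, not the Credit data; the `alpha` values are arbitrary): ridge leaves every coefficient non-zero, while lasso drives most of the irrelevant ones exactly to zero.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

# Synthetic data: 200 observations, 20 predictors, only 3 truly non-zero coefficients
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
beta_true = np.zeros(20)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + rng.normal(scale=1.0, size=200)

# In scikit-learn the penalty parameter lambda is called alpha
ridge = Ridge(alpha=10.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

# Ridge shrinks but keeps all 20 predictors; lasso zeroes most of them out
print("coefficients exactly zero (ridge):", int(np.sum(ridge.coef_ == 0.0)))
print("coefficients exactly zero (lasso):", int(np.sum(lasso.coef_ == 0.0)))
```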

Lasso Regression (Least Absolute Shrinkage and Selection Operator)

Lasso offers the same benefits as ridge: shrinking the coefficient estimates reduces their variance and can improve prediction accuracy.

At the same time: Lasso can set coefficients exactly equal to zero \(\implies\) feature selection.
Lasso therefore delivers sparse models \(\rightarrow\) models involving only a subset of the predictors.
As with ridge, proper tuning of the penalty parameter \(\lambda\) is critical.
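For reference, a sketch of the two penalised criteria, with \(\mathrm{RSS}(\beta)=\sum_{i=1}^{n}\big(y_i-\beta_0-\sum_{j=1}^{p}\beta_j x_{ij}\big)^2\); the only change from ridge is that the squared \(\ell_2\) penalty is replaced by an \(\ell_1\) penalty:

\[
\hat{\beta}^R_{\lambda}=\arg\min_{\beta}\Big\{\mathrm{RSS}(\beta)+\lambda\sum_{j=1}^{p}\beta_j^2\Big\},
\qquad
\hat{\beta}^L_{\lambda}=\arg\min_{\beta}\Big\{\mathrm{RSS}(\beta)+\lambda\sum_{j=1}^{p}|\beta_j|\Big\}.
\]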

Lasso paths for Credit data based on \(\lambda\)


Lasso paths for Credit data based on \(\lVert\hat{\beta}^L_{\lambda}\rVert_1/\lVert\hat{\beta}\rVert_1\)


Similarly to ridge, the “flipped” regularisation plot shows the same paths against the ratio of \(\ell_1\) norms: a value near 0 corresponds to heavy shrinkage (large \(\lambda\)), while a value of 1 corresponds to the ordinary least-squares fit.
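As an illustration, here is a minimal sketch of how such paths could be computed (assuming scikit-learn and stand-in simulated data, since the actual Credit data are not loaded here):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lasso_path
from sklearn.preprocessing import StandardScaler

# Stand-in data; the lecture figures use the Credit data instead
X, y = make_regression(n_samples=300, n_features=10, n_informative=4,
                       noise=10.0, random_state=0)
X = StandardScaler().fit_transform(X)  # put predictors on a common scale

# Coefficient paths over a grid of lambda values ("alphas" in scikit-learn);
# coefs has shape (n_features, n_alphas)
alphas, coefs, _ = lasso_path(X, y, n_alphas=100)

# x-axis for the "flipped" plot: ||beta_lambda||_1 divided by its largest value,
# which is reached at the smallest lambda (close to the least-squares fit)
l1_norms = np.abs(coefs).sum(axis=0)
ratio = l1_norms / l1_norms.max()

for j in range(coefs.shape[0]):
    n_zero = int(np.sum(coefs[j] == 0.0))
    print(f"predictor {j}: exactly zero at {n_zero} of {len(alphas)} lambda values")
```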

How does this “magic” happen?

Mathematically: the \(\ell_1\) penalty \(\lambda\sum_{j=1}^{p}|\beta_j|\) is not differentiable at zero, so the penalised criterion can attain its minimum exactly at \(\beta_j=0\); the smooth \(\ell_2\) penalty of ridge only shrinks coefficients towards zero without ever reaching it.
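One illustrative special case (not from the original slides): with an orthonormal design matrix, the penalised criteria above have closed-form solutions in terms of the least-squares estimates \(\hat{\beta}_j\):

\[
\hat{\beta}^R_{j,\lambda}=\frac{\hat{\beta}_j}{1+\lambda},
\qquad
\hat{\beta}^L_{j,\lambda}=\operatorname{sign}(\hat{\beta}_j)\,\big(|\hat{\beta}_j|-\lambda/2\big)_{+}.
\]

Ridge rescales every coefficient by the same factor, whereas lasso translates each one towards zero and sets it exactly to zero once \(|\hat{\beta}_j|\le\lambda/2\) (soft thresholding).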

We will see a geometric interpretation of ridge and lasso that makes this behaviour easier to understand.

Next topic

We will discuss ridge and lasso from the perspective of constrained optimisation, which provides useful insight into the geometry of their solutions.
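As a preview, for a suitable budget \(s\) corresponding to each \(\lambda\), the penalised criteria above are equivalent to the constrained problems

\[
\min_{\beta}\ \mathrm{RSS}(\beta)\ \text{subject to}\ \sum_{j=1}^{p}\beta_j^2\le s
\quad\text{(ridge)},
\qquad
\min_{\beta}\ \mathrm{RSS}(\beta)\ \text{subject to}\ \sum_{j=1}^{p}|\beta_j|\le s
\quad\text{(lasso)}.
\]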