Understanding the difference between Ridge and LASSO

I think it's clearer to first consider the optimization problem \begin{align} \text{minimize} &\quad \| w \|_1 \\ \text{subject to } &\quad \| y - X^T w \| \leq \delta. \end{align} The red curves in your picture are level curves of the function $\|y-X^T w\|$, and the outermost red curve is $\|y-X^Tw\|=\delta$. Imagine shrinking the $\ell_1$-norm ball (shown in blue) until it is as small as possible while still intersecting the constraint set. In the 2D picture, the point where the shrunken ball last touches the constraint set is very likely to be one of the corners of the $\ell_1$-norm ball, and at a corner one of the coordinates of $w$ is exactly zero. So the solution to this optimization problem is likely to be sparse.
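To make the picture concrete, here is a minimal numerical sketch. It is only an illustration under made-up assumptions: the constraint set $\{w : \|y - X^T w\| \leq \delta\}$ is taken to be a disc (which is what you get when the rows of $X$ are orthogonal with equal norm), and the names `center` and `radius` and their values are hypothetical. The code scans the boundary of the disc and picks the point of smallest $\ell_1$ norm, then does the same for the $\ell_2$ norm for comparison.

```python
import math

# Hypothetical toy setup: the constraint set {w : ||y - X^T w|| <= delta}
# is assumed to be a disc in the (w1, w2) plane.
center = (2.0, 0.1)  # unconstrained least-squares solution (made up)
radius = 0.5         # plays the role of delta (made up)


def boundary_point(theta):
    """Point on the boundary of the constraint disc at angle theta."""
    return (center[0] + radius * math.cos(theta),
            center[1] + radius * math.sin(theta))


def l1_norm(w):
    return abs(w[0]) + abs(w[1])


def l2_norm(w):
    return math.hypot(w[0], w[1])


# The origin lies outside the disc, so the minimizer of either norm over the
# disc sits on its boundary. Scan the boundary on a fine grid of angles.
n_steps = 100_000
points = [boundary_point(2 * math.pi * k / n_steps) for k in range(n_steps)]

w_l1 = min(points, key=l1_norm)  # where the shrinking l1 ball last touches
w_l2 = min(points, key=l2_norm)  # where a shrinking l2 ball last touches

print("min l1-norm point:", w_l1)
print("min l2-norm point:", w_l2)
```

With these numbers, the $\ell_1$ minimizer has its second coordinate at zero up to the grid resolution, i.e. it sits at a corner of the $\ell_1$ ball, while the $\ell_2$ minimizer keeps both coordinates away from zero.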

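We can use Lagrange multipliers to replace the constraint $\|y- X^Tw\| \leq \delta$ with a penalty term in the objective function. There exists a Lagrange multiplier $\lambda \geq 0$ such that any minimizer for the problem I wrote above is also a minimizer for the problem $$ \text{minimize} \quad \lambda (\| y-X^Tw\|-\delta) + \|w\|_1. $$ The term $-\lambda\delta$ is a constant and can be dropped, and when $\lambda > 0$ we can divide by $\lambda$ to get $$ \text{minimize} \quad \| y-X^Tw\| + \tfrac{1}{\lambda}\|w\|_1, $$ which is a residual term plus an $\ell_1$ penalty, i.e. a Lasso-type problem. (The standard Lasso uses the squared residual $\|y-X^Tw\|^2$; writing the constraint as $\|y-X^Tw\|^2 \leq \delta^2$ describes the same constraint set and the same argument then gives exactly that form.) So, solutions to a Lasso problem are likely to be sparse.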
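The sparsity of the penalized form can also be checked numerically. Below is a minimal sketch, not a reference implementation: it uses plain cyclic coordinate descent on the common scaling $\tfrac12\|y - Aw\|^2 + \alpha\|w\|_1$ with $A = X^T$ (rows of `A` are samples), and runs the same loop with a squared $\ell_2$ penalty for a Ridge comparison. The data, the value of `alpha`, and the function names are all made up for illustration.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; exactly 0.0 when |rho| <= alpha."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def coordinate_descent(A, y, alpha, penalty, sweeps=100):
    """Minimize 0.5*||y - A w||^2 plus a penalty by cyclic coordinate descent.

    penalty == "l1": alpha * ||w||_1          (Lasso)
    penalty == "l2": 0.5 * alpha * ||w||_2^2  (Ridge)
    """
    n, d = len(A), len(A[0])
    w = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            # Correlation of feature j with the residual that leaves out feature j.
            rho = 0.0
            for i in range(n):
                partial = sum(A[i][k] * w[k] for k in range(d) if k != j)
                rho += A[i][j] * (y[i] - partial)
            z = sum(A[i][j] ** 2 for i in range(n))
            if penalty == "l1":
                # The l1 penalty thresholds: small correlations give exactly zero.
                w[j] = soft_threshold(rho, alpha) / z
            else:
                # The squared l2 penalty only rescales: the weight shrinks but stays nonzero.
                w[j] = rho / (z + alpha)
    return w


# Made-up data: y depends strongly on feature 0 and only weakly on feature 1.
A = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.1, 1.9, -1.9, -2.1]  # equals 2.0 * feature0 + 0.1 * feature1

lasso_w = coordinate_descent(A, y, alpha=1.0, penalty="l1")
ridge_w = coordinate_descent(A, y, alpha=1.0, penalty="l2")

print("lasso:", lasso_w)
print("ridge:", ridge_w)
```

On this toy data the Lasso weight for the weak feature comes out exactly zero, while the Ridge weight for it is small but nonzero. The only difference between the two updates is the soft-thresholding step, which is the algebraic counterpart of the corners of the $\ell_1$ ball.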
