A short note on regularization

Naveen Mathew Nathan S.
2 min read · Jul 6, 2018


In the deep learning era, regularization is one of the most commonly used methods for reducing overfitting. In this article I'd like to introduce regularization by building a step-by-step intuition. I assume that readers are familiar with the basic terms used in machine learning (e.g. training examples, predictors, training set, test set, overfitting).

Let us concentrate on building a linear model on a training set. This learning problem is generally ill-posed. In other words, if y is the vector of dependent-variable values and A is the matrix of independent variables, the system Ax = y, solved for the coefficient vector x, does not have a unique solution (almost all practical problems fall into this category).

There are 2 possible cases that are encountered practically:

  1. Number of equations >> number of variables, which can be solved approximately using ordinary least squares (OLS) regression. The fit is expected to generalize (work well on the test set).
  2. Number of variables >> number of equations, which leads to infinitely many exact solutions. The OLS fit will be perfect on the training set, but will yield miserable results on the test set.
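The second case is easy to reproduce. A minimal NumPy sketch (not from the article; the data here are synthetic random draws) showing that with 20 predictors and only 5 training examples, least squares reproduces the training targets exactly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Case 2: far more predictors (20) than training examples (5).
X = rng.normal(size=(5, 20))
y = rng.normal(size=5)

# lstsq returns one of the infinitely many coefficient vectors
# that reproduce the training data exactly (the minimum-norm one).
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

train_error = np.abs(X @ beta - y).max()
print(train_error)   # essentially zero: a 'perfect' training fit
```

The zero training error is exactly the warning sign: the system had more unknowns than constraints, so fitting the training set perfectly tells us nothing about the test set.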

The second case, which defines an under-determined system, can be considered a primitive example of overfitting (there are many other kinds of overfitting): imagine adding a new row (X, y) (for testing only) which does not fit the under-determined system of equations. Each feasible solution of the system will give a different predicted value for y, some of which will differ drastically from the actual value of y. This defeats the purpose of predictive modeling.
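To make the "each feasible solution predicts differently" point concrete, here is a small sketch (again with synthetic data, not from the article): two coefficient vectors that fit the same under-determined training set exactly, yet disagree on a new example. The second solution is built by adding a null-space direction of X, which leaves the training fit untouched.

```python
import numpy as np

rng = np.random.default_rng(1)

# Under-determined system: 5 equations, 20 unknown coefficients.
X = rng.normal(size=(5, 20))
y = rng.normal(size=5)

# One exact solution: the minimum-norm solution from lstsq.
beta1, *_ = np.linalg.lstsq(X, y, rcond=None)

# Any null-space direction of X can be added without changing
# the fit on the training rows.
_, _, Vt = np.linalg.svd(X)
v = Vt[-1]                     # unit vector with X @ v ~ 0
beta2 = beta1 + 10.0 * v       # a second, equally feasible solution

print(np.abs(X @ beta1 - y).max())   # ~0: perfect on the training set
print(np.abs(X @ beta2 - y).max())   # ~0 as well

# A new test example that happens to lie along the null-space direction:
x_new = v
print(x_new @ beta1, x_new @ beta2)  # predictions differ by 10.0
```

Both solutions are indistinguishable on the training data, so nothing in the training set tells us which prediction for the new example to trust.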

There are a few ways of overcoming this issue (described in layman's terms):

  1. Decrease the number of predictors
  2. Decrease the importance given to the variables in the predictive model (regularization)

A special note on option 2: decreasing the importance given to the variables pulls the model towards the most basic model, the null model Y = average(Y), which discards all predictors. A more rigorous treatment of regularization involves Bayesian statistics; I will discuss it in another article.
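The pull towards the null model can be seen directly with ridge regression, the classic example of option 2. Below is a sketch using the closed-form ridge solution on synthetic data (my illustration, not the article's): as the penalty lambda grows, the coefficient norm shrinks towards zero, and with a centred design the intercept is average(y), so the prediction collapses to the null model.

```python
import numpy as np

rng = np.random.default_rng(2)

# Same under-determined setting: 5 examples, 20 predictors.
X = rng.normal(size=(5, 20))
y = rng.normal(size=5)
Xc = X - X.mean(axis=0)    # centre X so the intercept is average(y)
yc = y - y.mean()

def ridge(lam):
    # Closed-form ridge solution: (Xc'Xc + lam*I)^{-1} Xc'yc
    p = Xc.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)

lams = [0.01, 1.0, 100.0, 1e6]
norms = [np.linalg.norm(ridge(lam)) for lam in lams]
for lam, n in zip(lams, norms):
    print(f"lambda={lam:>8}: ||beta|| = {n:.6f}")
# The coefficient norm shrinks as lambda grows; in the limit the
# prediction collapses to the null model y_hat = average(y).
```

Unlike the unpenalized fit, ridge picks one solution out of the infinitely many feasible ones, trading a little training error for a model that does not swing wildly on new examples.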


Written by Naveen Mathew Nathan S.

Data Scientist, interested in theory and practice of machine learning.
