Linear Regression :: Normalization vs. Standardization

The first question is: why do we need normalization/standardization at all?

=> Take the example of a dataset with a salary variable and an age variable. Age can range from 0 to 90, whereas salary can range from 25 thousand to 2.5 lakh (250,000).

If we compare two people, the age difference will be below 100, whereas the salary difference will be in the thousands.

So if we don't want one variable to dominate the other, we use either normalization or standardization. Both age and salary will then be on the same scale, but once we standardize or normalize we lose the original values; they are transformed into new ones. That costs some interpretability, but scaling is extremely important when we want to draw inferences from our data.
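To make the dominance problem concrete, here is a minimal numpy sketch with made-up age/salary values (everything in it is hypothetical): the raw Euclidean distance between two people is driven almost entirely by salary until both features are put on a comparable scale.

```python
import numpy as np

# Two hypothetical people: [age, salary]
p1 = np.array([25.0, 30_000.0])
p2 = np.array([60.0, 200_000.0])

# Raw Euclidean distance: the salary gap swamps the 35-year age gap
print(np.linalg.norm(p1 - p2))                      # ~170000, age barely matters

# Min-max scale each feature over an assumed range (age 0-90, salary 25k-250k)
def min_max(x, lo, hi):
    return (x - lo) / (hi - lo)

p1_s = np.array([min_max(p1[0], 0, 90), min_max(p1[1], 25_000, 250_000)])
p2_s = np.array([min_max(p2[0], 0, 90), min_max(p2[1], 25_000, 250_000)])

# Now both features contribute to the distance on a comparable footing
print(np.linalg.norm(p1_s - p2_s))
```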

Normalization rescales the values into a range of [0,1]. It is also called min-max scaling.

Standardization rescales data to have a mean (μ) of 0 and a standard deviation (σ) of 1 (unit variance). Note that this does not make the data normally distributed; it only shifts and rescales the existing distribution.

[Image: normalization (min-max) and standardization (z-score) formulas]
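As a minimal sketch of the two transformations (assuming scikit-learn is available; the data is made up):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy data: columns are [age, salary]
X = np.array([[22,  30_000],
              [35,  60_000],
              [58, 120_000],
              [45, 250_000]], dtype=float)

# Normalization (min-max): x' = (x - x_min) / (x_max - x_min), values land in [0, 1]
X_norm = MinMaxScaler().fit_transform(X)
print(X_norm.min(axis=0), X_norm.max(axis=0))            # [0. 0.] [1. 1.]

# Standardization (z-score): x' = (x - mean) / std, per-column mean 0 and std 1
X_std = StandardScaler().fit_transform(X)
print(X_std.mean(axis=0).round(6), X_std.std(axis=0))    # ~[0. 0.] [1. 1.]
```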

Example below:

[Image: worked example of normalization and standardization]

Another example:

[Image: plot of the original data (green), standardized data (red), and normalized data (blue)]

In the image above, you can see that the actual data (in green) is spread between 1 and 6, the standardized data (in red) is spread from around -1 to 3, whereas the normalized data (in blue) is spread from around 0 to 1.

Many algorithms require you to standardize/normalize the data before passing it in. For example, in PCA, where we reduce dimensionality by projecting, say, 3D data onto 1D, standardization is required.
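A rough scikit-learn sketch of why PCA cares about this (synthetic data, so the exact numbers are meaningless): without scaling, the first principal component is dominated by whichever feature happens to have the largest variance.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))                       # shared underlying signal
X = np.hstack([
    t + 0.1 * rng.normal(size=(200, 1)),            # carries the signal, small scale
    1000 * t + 100 * rng.normal(size=(200, 1)),     # same signal, huge scale
    rng.normal(size=(200, 1)),                      # unrelated noise
])

# Without scaling, PC1 is almost entirely the large-scale second column
print(PCA(n_components=1).fit(X).components_)

# After standardization, PC1 spreads its weight over the two correlated columns
pipe = make_pipeline(StandardScaler(), PCA(n_components=1)).fit(X)
print(pipe.named_steps["pca"].components_)
```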

In image processing, on the other hand, pixel values usually need to be normalized before processing. A slight disadvantage of normalization is that it loses information about outliers (extreme data points, either too low or too high), which get squashed together with the rest of the data.
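For the image-processing case, normalization is often just a division by the maximum possible pixel value; a short sketch for an 8-bit image:

```python
import numpy as np

# A hypothetical 8-bit grayscale image: pixel intensities in [0, 255]
image = np.random.default_rng(0).integers(0, 256, size=(28, 28), dtype=np.uint8)

# Scale pixels to [0, 1] before further processing
image_norm = image.astype(np.float32) / 255.0
print(image_norm.min(), image_norm.max())   # both within [0.0, 1.0]
```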

So which one we choose depends on the use case, but standardization is the more commonly recommended default, since it puts features on a comparable scale while being less sensitive to outliers.


Note that the results might not necessarily be so different. You might simply need different hyperparameters for the two options to give similar results.

The ideal thing is to test what works best for your problem. If you can't afford this for some reason, most algorithms will probably benefit from standardization more so than from normalization.
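One way to "test what works best" is to drop each scaler into the same pipeline and cross-validate; a scikit-learn sketch on synthetic data (the scores themselves are meaningless, the point is the comparison pattern):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

for scaler in (StandardScaler(), MinMaxScaler()):
    model = make_pipeline(scaler, Ridge())
    score = cross_val_score(model, X, y, cv=5).mean()
    print(type(scaler).__name__, round(score, 4))
```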

See here for some examples of when one should be preferred over the other:

For example, in clustering analyses, standardization may be especially crucial in order to compare similarities between features based on certain distance measures. Another prominent example is the Principal Component Analysis, where we usually prefer standardization over Min-Max scaling, since we are interested in the components that maximize the variance (depending on the question and if the PCA computes the components via the correlation matrix instead of the covariance matrix; but more about PCA in my previous article).
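To illustrate the clustering point, here is a rough KMeans sketch on made-up data where the informative feature (age-like) has a tiny scale next to a noisy salary-like feature; without standardization, the Euclidean distance is dominated by the noise:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two groups that differ only in the small-scale feature (age)
age = np.concatenate([rng.normal(25, 3, 100), rng.normal(60, 3, 100)])
salary = rng.normal(100_000, 20_000, 200)           # pure noise, huge scale
X = np.column_stack([age, salary])
true_labels = np.array([0] * 100 + [1] * 100)

for name, data in [("raw", X), ("standardized", StandardScaler().fit_transform(X))]:
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(data)
    # Agreement with the true grouping (accounting for label swapping)
    agreement = max((labels == true_labels).mean(), (labels != true_labels).mean())
    print(name, round(agreement, 2))
```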

However, this doesn’t mean that Min-Max scaling is not useful at all! A popular application is image processing, where pixel intensities have to be normalized to fit within a certain range (i.e., 0 to 255 for the RGB color range). Also, typical neural network algorithms require data on a 0-1 scale.

One disadvantage of normalization over standardization is that it loses some information in the data, especially about outliers.
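A quick sketch of that outlier effect (made-up values): a single extreme point forces min-max scaling to squash everything else into a tiny slice of [0, 1], whereas standardization does not pin the data to a fixed range.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [1000.0]])   # one extreme outlier

print(MinMaxScaler().fit_transform(x).ravel())
# the first five values end up crammed into roughly [0, 0.004]

print(StandardScaler().fit_transform(x).ravel())
# still affected by the outlier, but not forced into a fixed [0, 1] band
```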

Also on the linked page, there is this picture:

[Image: plots of a standardized and a normalized data set]

As you can see, normalization (min-max scaling) clusters all the data very close together, which may not be what you want. It might cause algorithms such as gradient descent to take longer to converge to the same solution they would reach on a standardized data set, or it might even make convergence impossible.
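One mechanism behind the convergence claim can be sketched numerically (numpy, with a contrived heavy-tailed second feature): the condition number of X^T X controls how quickly plain gradient descent converges on a least-squares objective, and here it comes out much larger for the min-max scaled data than for the standardized data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

rng = np.random.default_rng(0)
x1 = rng.uniform(0, 1, 1000)          # a well-spread feature
x2 = rng.normal(0, 1, 1000)
x2[:5] = 100.0                        # a handful of extreme values
X = np.column_stack([x1, x2])

for name, scaler in [("min-max", MinMaxScaler()), ("standardized", StandardScaler())]:
    Z = scaler.fit_transform(X)
    # The larger the condition number of Z^T Z, the slower (or less stable)
    # plain gradient descent is on the corresponding least-squares problem.
    print(name, round(float(np.linalg.cond(Z.T @ Z)), 1))
```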

"Normalizing variables" doesn't really make sense. The correct terminology is "normalizing / scaling the features". If you're going to normalize or scale one feature, you should do the same for the rest.


That makes sense because normalization and standardization do different things.

Normalization transforms your data into a range between 0 and 1

Standardization transforms your data such that the resulting distribution has a mean of 0 and a standard deviation of 1

Normalization and standardization are designed to achieve a similar goal: to create features that have similar ranges to each other. We want that so we can be sure we are capturing the true information in a feature, and that we don't overweight a particular feature just because its values are much larger than those of other features.

If all of your features are within a similar range of each other, then there's no real need to standardize/normalize. If, however, some features naturally take on values that are much larger or smaller than others, then normalization/standardization is called for.
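A quick sanity check along those lines, sketched with numpy on a made-up feature matrix: compare the per-feature ranges, and if they differ by orders of magnitude, scaling is probably called for.

```python
import numpy as np

# Hypothetical feature matrix: [age, salary]
X = np.array([[25,  30_000],
              [40,  85_000],
              [60, 150_000],
              [33,  45_000]], dtype=float)

spread = X.max(axis=0) - X.min(axis=0)
print(spread)                         # [35. 120000.] -- very different ranges
print(spread.max() / spread.min())    # a large ratio suggests scaling is needed
```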

If you're going to normalize at least one variable/feature, I would do the same thing to all of the others as well.