What does preprocessing.scale() do? How does it work?

Scaling the data brings all your values onto one common scale: preprocessing.scale() standardizes each column to zero mean and unit variance, following the same concept as standardization (z-scores). To see the effect, you can call describe on the dataframe before and after processing:

df.describe()

# X is df after it has been scaled with preprocessing.scale()
df2 = pandas.DataFrame(X)
df2.describe()
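A self-contained sketch of that before/after check (the toy dataframe here is hypothetical, standing in for the question's df):

```python
import pandas as pd
from sklearn import preprocessing

# hypothetical toy dataframe standing in for the question's df
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0],
                   "b": [10.0, 400.0, 10000.0, 100000.0]})
print(df.describe())  # original means/stds, very different per column

X = preprocessing.scale(df)  # returns a NumPy array
df2 = pd.DataFrame(X, columns=df.columns)
print(df2.describe())  # each column now centered on 0
```

Note one small caveat: describe() reports the sample standard deviation (ddof=1), so for a small dataframe it will show a value slightly above 1 even though the population standard deviation of each scaled column is exactly 1.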

You will see that df2 has a mean of 0 and a standard deviation of 1 in each column.


The preprocessing.scale() function puts your features on one common scale. This is helpful when your data is vastly spread out across different magnitudes. For example, the values of X may look like this:

X = [1, 4, 400, 10000, 100000]

The issue with values spread across such different magnitudes is that the distribution is skewed, in statistical terms, and many algorithms end up biased toward the large-magnitude values. Scaling brings all your values onto one comparable scale. As for how it works in mathematical detail, it follows the same concept as normalization and standardization: each value x is replaced by (x - mean) / std. You can research those topics for the full detail, but to make life simpler, sklearn does everything for you!
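The formula above can be verified directly: a minimal sketch showing that preprocessing.scale() on the example X is the same as subtracting the mean and dividing by the (population) standard deviation:

```python
import numpy as np
from sklearn import preprocessing

X = np.array([1, 4, 400, 10000, 100000], dtype=float)

# scale() standardizes: subtract the mean, divide by the std (ddof=0)
scaled = preprocessing.scale(X)
manual = (X - X.mean()) / X.std()

print(np.allclose(scaled, manual))  # the two give the same result
print(scaled.mean(), scaled.std())  # mean ~0, std exactly 1
```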