Normalization VS. numpy way to normalize?

There are different types of normalization. You are using min-max normalization. The scikit-learn equivalent, minmax_scale, is shown next to your function below.

import numpy as np
from sklearn.preprocessing import minmax_scale

# your function (note: it overwrites list_normal in place)
def normalize_list(list_normal):
    max_value = max(list_normal)
    min_value = min(list_normal)
    for i in range(len(list_normal)):
        list_normal[i] = (list_normal[i] - min_value) / (max_value - min_value)
    return list_normal

# scikit-learn version
def normalize_list_numpy(list_numpy):
    normalized_list = minmax_scale(list_numpy)
    return normalized_list

test_array = [1, 2, 3, 4, 5, 6, 7, 8, 9]
test_array_numpy = np.array(test_array)

print(normalize_list(test_array))
print(normalize_list_numpy(test_array_numpy))

Output:

[0.0, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.0]    
[0.0, 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.0]

minmax_scale (like the MinMaxScaler class) uses exactly your formula for normalization/scaling: http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.minmax_scale.html
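
One caveat with the hand-written formula is that it divides by zero when every value is the same (max equals min). A minimal sketch of a guarded version follows; the name safe_min_max and the choice to return all zeros for constant input are assumptions made for illustration, not something the question prescribes.

```python
import numpy as np

def safe_min_max(values):
    """Min-max scale to [0, 1]; hypothetical helper for illustration."""
    x = np.asarray(values, dtype=float)
    value_range = x.max() - x.min()
    if value_range == 0:
        # Assumption: map constant input to zeros instead of dividing by zero.
        return np.zeros_like(x)
    return (x - x.min()) / value_range

print(safe_min_max([1, 2, 3, 4, 5]))  # evenly spaced values from 0 to 1
print(safe_min_max([7, 7, 7]))        # all zeros
```

Unlike the list version in the answer, this sketch returns a new array and leaves its input untouched.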

@OuuGiii: NOTE: It is not a good idea to use Python built-in names as variable names. list is a Python built-in, so it should not be used as a variable name.

The question/answer that you reference doesn't explicitly relate your own formula to the np.linalg.norm(list) version that you use here.

One NumPy solution would be this:

import numpy as np
def normalize(x):
    x = np.asarray(x)
    return (x - x.min()) / (np.ptp(x))

print(normalize(test_array))    
# [ 0.     0.125  0.25   0.375  0.5    0.625  0.75   0.875  1.   ]

Here np.ptp is peak-to-peak, i.e.:

Range of values (maximum - minimum) along an axis.

This approach scales the values to the interval [0, 1] as pointed out by @phg.
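
Because np.ptp accepts an axis argument, the same idea extends to 2-D data where each column should be scaled independently. The following is a small sketch under that assumption; normalize_columns and the sample data are made up for illustration.

```python
import numpy as np

def normalize_columns(x):
    """Min-max scale every column of a 2-D array to [0, 1] (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    col_min = x.min(axis=0)          # per-column minimum
    col_range = np.ptp(x, axis=0)    # per-column (max - min)
    # Broadcasting applies the per-column values to every row.
    return (x - col_min) / col_range

data = np.array([[1, 10],
                 [2, 20],
                 [3, 40]])
scaled = normalize_columns(data)
print(scaled)
```

Each column of scaled now runs from 0 to 1 regardless of its original range. A column whose values are all equal would still divide by zero here.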

The more traditional definition of normalization (often called standardization, or z-scoring) is to scale to zero mean and unit variance:

x = np.asarray(test_array)
res = (x - x.mean()) / x.std()
print(res.mean(), res.std())
# 0.0 1.0

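Or use sklearn.preprocessing.scale as a pre-canned function for this. Note that sklearn.preprocessing.normalize does something different: it rescales samples to unit norm.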

Using test_array / np.linalg.norm(test_array) creates a result that is of unit length; you'll see that np.linalg.norm(test_array / np.linalg.norm(test_array)) equals 1. So you're talking about two different fields here, one being statistics and the other being linear algebra.
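
To make the distinction concrete, here is a short sketch that applies the three rescalings discussed so far to the same data and checks the property each one guarantees. The variable names are chosen for illustration only.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)

# Linear algebra sense: divide by the Euclidean (L2) norm -> unit length.
unit_length = x / np.linalg.norm(x)

# Statistics sense 1: min-max scaling -> values lie in [0, 1].
min_max = (x - x.min()) / (x.max() - x.min())

# Statistics sense 2: standardization -> zero mean, unit standard deviation.
z_score = (x - x.mean()) / x.std()

print(np.isclose(np.linalg.norm(unit_length), 1.0))   # True
print(min_max.min(), min_max.max())                   # 0.0 1.0
print(np.isclose(z_score.mean(), 0.0), np.isclose(z_score.std(), 1.0))
```

The three results differ from one another, so which one counts as "normalized" depends on the property you need.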

The power of NumPy is its broadcasting, which lets you write vectorized array operations without explicit looping. So you do not need to write a function with an explicit for loop, which is slow, especially if your dataset is large.

A vectorized way of doing min-max normalization is

test_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
# use the array's own min/max methods; the built-in min()/max() iterate element by element
normalized_test_array = (test_array - test_array.min()) / (test_array.max() - test_array.min())

output >> [ 0., 0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1. ]
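
To check the speed claim on your own machine, a rough timing sketch like the following can be used. The function names, the data size, and the repeat count are arbitrary choices for illustration, and the measured times will vary between systems.

```python
import timeit
import numpy as np

def min_max_loop(values):
    """Pure-Python min-max scaling, one element at a time."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def min_max_vectorized(x):
    """NumPy min-max scaling using array operations."""
    return (x - x.min()) / (x.max() - x.min())

data = list(range(10_000))
array = np.array(data, dtype=float)

# Both versions should agree before their timings are compared.
assert np.allclose(min_max_loop(data), min_max_vectorized(array))

loop_time = timeit.timeit(lambda: min_max_loop(data), number=20)
vec_time = timeit.timeit(lambda: min_max_vectorized(array), number=20)
print(f"loop: {loop_time:.4f}s  vectorized: {vec_time:.4f}s")
```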
