Python: Scaling numbers column by column with pandas

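You could subtract each column's min, then divide by each column's max (beware of 0/0, which happens when a column is constant). Note that after subtracting the min, the new max is the original max minus the original min, so the two steps together compute (x - min) / (max - min).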

In [11]: df
Out[11]:
    a    b
A  14  103
B  90  107
C  90  110
D  96  114
E  91  114

In [12]: df -= df.min()  # equivalent to df = df - df.min()

In [13]: df /= df.max()  # equivalent to df = df / df.max()

In [14]: df
Out[14]:
          a         b
A  0.000000  0.000000
B  0.926829  0.363636
C  0.926829  0.636364
D  1.000000  1.000000
E  0.939024  1.000000
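
The 0/0 caveat can be handled explicitly. The following is a minimal sketch, not part of the original answer: the helper name `min_max_scale` and the constant column `c` are made up for illustration, and mapping a constant column to 0.0 is just one possible convention.

```python
import pandas as pd

def min_max_scale(frame: pd.DataFrame) -> pd.DataFrame:
    """Scale each column to [0, 1]; constant columns become 0.0 instead of NaN."""
    col_min = frame.min()
    col_range = frame.max() - col_min
    # Replace a zero range with 1 so a constant column yields 0/1 = 0, not 0/0.
    safe_range = col_range.replace(0, 1)
    return (frame - col_min) / safe_range

df = pd.DataFrame(
    {
        "a": [14, 90, 90, 96, 91],
        "b": [103, 107, 110, 114, 114],
        "c": [5, 5, 5, 5, 5],  # hypothetical constant column
    },
    index=list("ABCDE"),
)
scaled = min_max_scale(df)
print(scaled)
```

Unlike the in-place `-=` and `/=` version, this returns a new DataFrame and leaves `df` untouched.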

To reverse the direction of a scaled column (so it runs from 1 to 0 rather than 0 to 1):

In [15]: df['b'] = 1 - df['b']

An alternative method is to negate the b column before scaling (df['b'] = -df['b']).
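
As a quick check that both ways of reversing agree, here is a small sketch using the `b` values from the example above. The helper `scale` is a made-up name for the subtract-min, divide-by-range step.

```python
import numpy as np
import pandas as pd

b = pd.Series([103, 107, 110, 114, 114], index=list("ABCDE"), name="b")

def scale(s: pd.Series) -> pd.Series:
    """Min-max scale a single Series to [0, 1] (assumes it is not constant)."""
    return (s - s.min()) / (s.max() - s.min())

flipped_after = 1 - scale(b)   # scale first, then subtract from 1
flipped_before = scale(-b)     # negate first, then scale

# The two results agree up to floating-point rounding.
print(np.allclose(flipped_after, flipped_before))
```

Either way, the smallest original value (row A) ends up at 1 and the largest (rows D and E) at 0.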


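If you want to scale only one column of the DataFrame, you can do the following:

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
# fit_transform expects 2-D input, so select the column as a one-column DataFrame;
# ravel() flattens the (n, 1) result back to 1-D before assigning it.
df['Col1_scaled'] = scaler.fit_transform(df[['Col1']]).ravel()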
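
If scikit-learn is not available, the same single-column scaling can be sketched in plain pandas. The data below is made up for illustration; only the column names `Col1` and `Col1_scaled` come from the snippet above, and `Col2` is a hypothetical second column that is left alone.

```python
import pandas as pd

df = pd.DataFrame({
    "Col1": [14, 90, 90, 96, 91],
    "Col2": [103, 107, 110, 114, 114],  # hypothetical column, not scaled
})

col = df["Col1"]
# (x - min) / (max - min); assumes Col1 is not constant, otherwise this is 0/0.
df["Col1_scaled"] = (col - col.min()) / (col.max() - col.min())
print(df)
```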

Here is how to do it with scikit-learn's preprocessing module, which provides many functions for scaling and centering data.

In [0]: import pandas as pd
   ...: from sklearn.preprocessing import MinMaxScaler

In [1]: df = pd.DataFrame({'A':[14,90,90,96,91],
                           'B':[103,107,110,114,114]}).astype(float)

In [2]: df
Out[2]:
      A      B
0  14.0  103.0
1  90.0  107.0
2  90.0  110.0
3  96.0  114.0
4  91.0  114.0

In [3]: scaler = MinMaxScaler()

In [4]: df_scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

In [5]: df_scaled
Out[5]:
          A         B
0  0.000000  0.000000
1  0.926829  0.363636
2  0.926829  0.636364
3  1.000000  1.000000
4  0.939024  1.000000
