Change pandas DataFrame column order in place

There is no easy way to do this without making a copy. In theory it is possible if you ONLY have a single dtype (or are only reordering columns without the labels changing dtypes). But it is fairly complicated, and hence is not implemented.

That said, if you are careful you can do this. You should ONLY do this with a single-dtyped frame (you are forewarned).

In [22]: df = DataFrame(np.random.randn(5,3),columns=list('ABC'))

In [23]: df
Out[23]: 
          A         B         C
0 -0.696593 -0.459067  1.935033
1  1.783658  0.612771  1.553773
2 -0.572515  0.634174  0.113974
3 -0.908203  1.454289  0.509968
4  0.776575  1.629816  1.630023

If df is multi-dtyped then df.values WILL NOT BE A VIEW (of course, you can subselect a single-dtyped frame, which is itself a view). Also note that it is NOT ALWAYS POSSIBLE for this to come out as a view; it depends on what you are doing, so YMMV.

For example, df.values.take([2,0,1],axis=1) gives you the same result BUT IS A COPY.
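The multi-dtype caveat can be illustrated with a small sketch. The frame and the names `mixed` and `numeric_only` are made up for this example; the point is only that `.values` on a mixed frame has to combine the columns into one common dtype, which cannot be a view of the separately stored columns.

```python
import numpy as np
import pandas as pd

# Hypothetical two-dtype frame: one float column, one string column
mixed = pd.DataFrame({'A': [1.0, 2.0], 'B': ['x', 'y']})

# The columns are combined into a single object array
print(mixed.values.dtype)  # object

# Subselecting the numeric columns gives a single-dtyped frame again
numeric_only = mixed.select_dtypes(include='number')
print(numeric_only.values.dtype)  # float64
```

Whether the single-dtyped `.values` is actually a view still depends on the pandas version and settings, so it is worth checking rather than assuming.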

In [24]: df2 = DataFrame(df.values[:,[2,0,1]],columns=list('ABC'))

In [25]: df2
Out[25]: 
          A         B         C
0  1.935033 -0.696593 -0.459067
1  1.553773  1.783658  0.612771
2  0.113974 -0.572515  0.634174
3  0.509968 -0.908203  1.454289
4  1.630023  0.776575  1.629816

Inspecting .base shows that df2.values does not own its data, i.e. it is a view:

In [26]: df2.values.base
Out[26]: 
array([[ 1.93503267,  1.55377291,  0.1139739 ,  0.5099681 ,  1.63002264],
       [-0.69659276,  1.78365777, -0.5725148 , -0.90820288,  0.7765751 ],
       [-0.45906706,  0.61277136,  0.63417392,  1.45428912,  1.62981613]])
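A non-None `.base` only says that an array is a view of some buffer, not which one. A more direct check is `np.shares_memory`. The sketch below uses plain NumPy arrays with made-up names (`arr`, `fancy`, `sliced`). In NumPy itself, indexing with a list (advanced indexing) returns a copy, while basic slicing returns a view, so it may be worth running the same check on `df.values` and `df2.values` in your own pandas version.

```python
import numpy as np

arr = np.random.randn(5, 3)

fancy = arr[:, [2, 0, 1]]   # advanced (list) indexing
sliced = arr[:, ::-1]       # basic slicing with a step

# Does each result share its buffer with the original array?
print(np.shares_memory(arr, fancy))   # False
print(np.shares_memory(arr, sliced))  # True
```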

Note that if you then assign to df2 (adding another float column, for instance), you will trigger a copy, so you have to be extremely careful with this.

That said, creating a frame from a view of another frame's values takes almost no extra memory, because it only holds a pointer to the existing data, so it is very fast.


Hmm... no one proposed drop and insert:

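import pandas as pd

df = pd.DataFrame([['a','b','c']],columns=list('ABC'))

print('Before', id(df))

# Move each column to its target position by dropping and re-inserting it
for i,col in enumerate(['C','B', 'A']):
    tmp = df[col]                              # keep the column data
    df.drop(labels=[col],axis=1,inplace=True)  # remove it from the frame
    df.insert(i,col,tmp)                       # put it back at position i

print('After ', id(df))
df.head()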

The result is still the original DataFrame object, as the identical ids show:

Before 140441780394360
After  140441780394360

   C    B   A
   ----------
0  c    b   a
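The same drop-and-insert idea can be wrapped in a small function. This is only a sketch: `reorder_columns_inplace` is a hypothetical helper, not a pandas API, and it assumes unique column labels and that `order` names every column exactly once. It uses `DataFrame.pop` in place of the separate lookup and drop.

```python
import pandas as pd


def reorder_columns_inplace(df, order):
    """Reorder df's columns to `order` by popping and re-inserting each one.

    Hypothetical helper; assumes unique labels and a full permutation.
    """
    if len(order) != len(df.columns) or set(order) != set(df.columns):
        raise ValueError("order must be a permutation of df.columns")
    for position, label in enumerate(order):
        column = df.pop(label)              # removes the column, returns it as a Series
        df.insert(position, label, column)  # puts it back at the target position


df = pd.DataFrame([['a', 'b', 'c']], columns=list('ABC'))
before = id(df)
reorder_columns_inplace(df, ['C', 'B', 'A'])
print(list(df.columns), id(df) == before)  # ['C', 'B', 'A'] True
```

This keeps the identity of the DataFrame object, which is what the id check shows. It makes no claim that the underlying column data is never copied internally.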
