Pandas: Multilevel column names

Try this:

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
columns = [('c', 'a'), ('c', 'b')]
df.columns = pd.MultiIndex.from_tuples(columns)
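To check that the assignment did what was intended, a minimal sketch along these lines may help. It repeats the snippet above and then inspects the result; the variable name `inner` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df.columns = pd.MultiIndex.from_tuples([('c', 'a'), ('c', 'b')])

# The columns now have two levels.
print(df.columns.nlevels)

# Selecting the top-level label drops that level and returns
# a frame with the original single-level column names.
inner = df['c']
print(list(inner.columns))

# A single column is addressed with a tuple, one entry per level.
print(df[('c', 'a')].tolist())
```

Selecting with a tuple such as `('c', 'a')` is the usual way to reach one column once the columns are a MultiIndex.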

A lot of these solutions seem just a bit more complex than they need to be.

I prefer to make things look as simple and intuitive as possible when speed isn't absolutely necessary. I think this solution accomplishes that. Tested in versions of pandas as early as 0.22.0.

Simply create a DataFrame (ignoring column names in the first step) and then set columns equal to a nested list of column names, with one inner list per level.

In [1]: import pandas as pd

In [2]: df = pd.DataFrame([[1, 1, 1, 1], [2, 2, 2, 2]])

In [3]: df
Out[3]: 
   0  1  2  3
0  1  1  1  1
1  2  2  2  2

In [4]: df.columns = [['a', 'c', 'e', 'g'], ['b', 'd', 'f', 'h']]

In [5]: df
Out[5]: 
   a  c  e  g
   b  d  f  h
0  1  1  1  1
1  2  2  2  2
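Assigning a nested list works because pandas builds a MultiIndex from it, one inner list per level. If you would rather make that explicit, a sketch using `pd.MultiIndex.from_arrays` might look like the following; the level names `upper` and `lower` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 1, 1, 1], [2, 2, 2, 2]])

# One inner list per level; position i across the lists forms column i.
arrays = [['a', 'c', 'e', 'g'], ['b', 'd', 'f', 'h']]

# 'upper' and 'lower' are hypothetical level names, not from the answer above.
df.columns = pd.MultiIndex.from_arrays(arrays, names=['upper', 'lower'])

# Each column label is now a tuple (upper, lower).
tuples = list(df.columns)
print(tuples)
```

Naming the levels is optional, but it makes later calls that take a `level` argument easier to read.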

You can use pd.concat. Give it a dictionary of DataFrames, where each key is the label you want in the new top column level.

In [46]: d = {}

In [47]: d['first_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
                                         data=[[10, 0.89, 0.98, 0.31],
                                               [20, 0.34, 0.78, 0.34]]).set_index('idx')

In [48]: pd.concat(d, axis=1)
Out[48]:
    first_level
              a     b     c
idx
10         0.89  0.98  0.31
20         0.34  0.78  0.34

You can use the same technique to add more groups under the new top level: each additional dictionary key becomes another top-level label.

In [49]: d['second_level'] = pd.DataFrame(columns=['idx', 'a', 'b', 'c'],
                                          data=[[10, 0.29, 0.63, 0.99],
                                                [20, 0.23, 0.26, 0.98]]).set_index('idx')

In [50]: pd.concat(d, axis=1)
Out[50]:
    first_level             second_level
              a     b     c            a     b     c
idx
10         0.89  0.98  0.31         0.29  0.63  0.99
20         0.34  0.78  0.34         0.23  0.26  0.98
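If the frames are already in a list rather than a dictionary, the `keys` argument of `pd.concat` should give the same result. The following is a sketch with a trimmed-down version of the data above; the names `first`, `second`, and `combined` are only illustrative.

```python
import pandas as pd

first = pd.DataFrame({'a': [0.89, 0.34], 'b': [0.98, 0.78]}, index=[10, 20])
second = pd.DataFrame({'a': [0.29, 0.23], 'b': [0.63, 0.26]}, index=[10, 20])

# keys supplies the new top-level labels, one per frame in the list.
combined = pd.concat([first, second], axis=1,
                     keys=['first_level', 'second_level'])

# The dictionary form builds the same columns from its keys.
from_dict = pd.concat({'first_level': first, 'second_level': second}, axis=1)

print(combined.columns.tolist())
```

The list-plus-`keys` form is convenient when the frames come from a loop and the labels are computed separately.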

There is no need to create a list of tuples.

Use pd.MultiIndex.from_product(iterables):

import pandas as pd
import numpy as np

df = pd.Series(np.random.rand(3), index=["a","b","c"]).to_frame().T
df.columns = pd.MultiIndex.from_product([["new_label"], df.columns])

Resultant DataFrame:

  new_label                    
          a         b         c
0   0.25999  0.337535  0.333568
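Note that `from_product` builds every combination of the iterables it is given, so it only fits when each top-level label pairs with each inner label. A small sketch with fixed data (the labels `x` and `y` are made up for illustration) shows the ordering it produces:

```python
import pandas as pd

# Every outer label is paired with every inner label, outer label varying slowest.
cols = pd.MultiIndex.from_product([['x', 'y'], ['a', 'b', 'c']])

# One row with six values, matching the six generated columns.
df = pd.DataFrame([[1, 2, 3, 4, 5, 6]], columns=cols)

print(list(cols))
print(df['y'].iloc[0].tolist())
```

When the column pairs are irregular (for example `x` has `a` and `b` but `y` has only `c`), `from_tuples` or `from_arrays` is the better fit.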

Pull request from Jan 25, 2014
