Sorting Multi-Index to full depth (Pandas)

Pandas provides:

d = d.sort_index()
print(d.index.is_lexsorted())  # Sometimes True

which will do what you want in most cases. However, this always sorts the index but may leave it not 'lexsorted' (for example, if you have NaNs in the index), which generates a PerformanceWarning.

To avoid this:

d = d.sort_index(level=d.index.names)
print(d.index.is_lexsorted())  # True

... though why there's a difference doesn't seem to be documented.
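As a rough illustration of the two calls above, here is a minimal sketch on a small frame. The frame, its level names ("outer", "inner") and the "value" column are made up for the example, and `is_monotonic_increasing` is used for the check instead of `is_lexsorted()`, on the assumption that you are on a recent pandas release. It does not reproduce the NaN case; it only shows that on a NaN-free index both calls end up in the same order.

```python
import pandas as pd

# Hypothetical data: a two-level index in deliberately scrambled order.
index = pd.MultiIndex.from_tuples(
    [("b", 2), ("a", 2), ("b", 1), ("a", 1)], names=["outer", "inner"]
)
d = pd.DataFrame({"value": [10, 20, 30, 40]}, index=index)

# Plain sort versus an explicit sort over every level name.
plain = d.sort_index()
by_level = d.sort_index(level=d.index.names)

# A fully sorted MultiIndex is monotonic increasing.
print(plain.index.is_monotonic_increasing)
print(by_level.index.is_monotonic_increasing)
print(plain.equals(by_level))
```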


I realize some time has passed, but I had the same problem as @idoda did: the accepted answer does not work on DataFrames that have a MultiIndex on both the columns and the index. The trick, not currently shown here, is that there is an "axis" option, which defaults to 0 but can also be set to 1.

For example if you try:

df.sortlevel(inplace=True,sort_remaining=True)

and are still getting lexsort errors, it may be relevant to know that there is a default "axis=0" kwarg in there. Thus you can also try adding

df.sortlevel(axis=1,inplace=True,sort_remaining=True)

which should sort along the other axis. If you don't want to think about it, you can just brute-force it with:

df.sortlevel(axis=0,inplace=True,sort_remaining=True)
df.sortlevel(axis=1,inplace=True,sort_remaining=True)

That should fully sort both the column and row indexes at all levels. I had the same problem here and couldn't get a full lexsort with the suggested answer, but a bit of research showed that even with "sort_remaining" set to True, sortlevel applies to only a single axis. These snippets work around that and appear to be the current native, Pythonic answer. Hope somebody finds it helpful!
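For anyone on a recent pandas release, where `sortlevel` is no longer available on DataFrames, the same two-axis idea can be sketched with `sort_index`. The frame below is invented purely for illustration, and this is one possible equivalent rather than a drop-in replacement for the snippets above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with an unsorted MultiIndex on both rows and columns.
rows = pd.MultiIndex.from_tuples([("b", 2), ("a", 1), ("b", 1), ("a", 2)])
cols = pd.MultiIndex.from_tuples([("y", "q"), ("x", "q"), ("y", "p"), ("x", "p")])
df = pd.DataFrame(np.arange(16).reshape(4, 4), index=rows, columns=cols)

# Sort the row index first (axis=0), then the column index (axis=1).
df = df.sort_index(axis=0).sort_index(axis=1)

print(df.index.is_monotonic_increasing)
print(df.columns.is_monotonic_increasing)
```

Each call only touches the axis it is given, so both are needed when both axes carry a MultiIndex.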


It's not really clear what you are asking. Multi-index docs are here

The OP needs to set the index, then sort in place

df.set_index(['fileName','phrase'],inplace=True)
df.sortlevel(inplace=True)

Then access these levels via a tuple to get a specific result

df.ix[('somePath','somePhrase')]
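A self-contained sketch of the same three steps, written for newer pandas where `.ix` and `sortlevel` have been removed, so `.loc` and `sort_index` are used instead. The rows and the "count" column are placeholders I made up; only the `fileName` and `phrase` column names come from the question.

```python
import pandas as pd

# Hypothetical data; only the 'fileName' and 'phrase' names are from the question.
df = pd.DataFrame(
    {
        "fileName": ["somePath", "otherPath", "somePath"],
        "phrase": ["somePhrase", "somePhrase", "otherPhrase"],
        "count": [3, 5, 7],
    }
)

# Move both columns into the index, then sort it to full depth.
df = df.set_index(["fileName", "phrase"]).sort_index()

# A full-depth tuple selects a single row (returned as a Series here,
# because the index entries are unique).
row = df.loc[("somePath", "somePhrase")]
print(row)
```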

Maybe just give a toy example like this and show the specific result you want to get.

In [1]: arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
   ...:           np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]

In [2]: df = DataFrame(randn(8, 4), index=arrays)

In [3]: df
Out[3]: 
                0         1         2         3
bar one  1.654436  0.184326 -2.337694  0.625120
    two  0.308995  1.219156 -0.906315  1.555925
baz one -0.180826 -1.951569  1.617950 -1.401658
    two  0.399151 -1.305852  1.530370 -0.132802
foo one  1.097562  0.097126  0.387418  0.106769
    two  0.465681  0.270120 -0.387639 -0.142705
qux one -0.656487 -0.154881  0.495044 -1.380583
    two  0.274045 -0.070566  1.274355  1.172247

In [4]: df.index.lexsort_depth
Out[4]: 2

In [5]: df.ix[('foo','one')]
Out[5]: 
0    1.097562
1    0.097126
2    0.387418
3    0.106769
Name: (foo, one), dtype: float64

In [6]: df.ix['foo']
Out[6]: 
            0         1         2         3
one  1.097562  0.097126  0.387418  0.106769
two  0.465681  0.270120 -0.387639 -0.142705

In [7]: df.ix[['foo']]
Out[7]: 
                0         1         2         3
foo one  1.097562  0.097126  0.387418  0.106769
    two  0.465681  0.270120 -0.387639 -0.142705

In [8]: df.sortlevel(level=1)
Out[8]: 
                0         1         2         3
bar one  1.654436  0.184326 -2.337694  0.625120
baz one -0.180826 -1.951569  1.617950 -1.401658
foo one  1.097562  0.097126  0.387418  0.106769
qux one -0.656487 -0.154881  0.495044 -1.380583
bar two  0.308995  1.219156 -0.906315  1.555925
baz two  0.399151 -1.305852  1.530370 -0.132802
foo two  0.465681  0.270120 -0.387639 -0.142705
qux two  0.274045 -0.070566  1.274355  1.172247

In [10]: df.sortlevel(level=1).index.lexsort_depth
Out[10]: 0
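The session above can be approximated as a plain script on newer pandas. This is a sketch under a few assumptions: `.loc` stands in for `.ix`, `sort_index(level=1)` stands in for `sortlevel(level=1)`, and `is_monotonic_increasing` is used as the sortedness check rather than `lexsort_depth`. The random values will differ from the output shown.

```python
import numpy as np
import pandas as pd

arrays = [
    np.array(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]),
    np.array(["one", "two", "one", "two", "one", "two", "one", "two"]),
]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays)

# The index starts out fully sorted.
print(df.index.is_monotonic_increasing)

# Full-depth tuple lookup and partial (first-level) lookup.
foo_one = df.loc[("foo", "one")]
foo_block = df.loc["foo"]

# Sorting on the inner level first breaks the overall ordering ...
by_inner = df.sort_index(level=1)
print(by_inner.index.is_monotonic_increasing)

# ... and a plain sort_index() restores it.
restored = by_inner.sort_index()
print(restored.index.is_monotonic_increasing)
```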
