Efficiently joining two dataframes based on multiple levels of a multiindex

This is not implemented internally ATM, but your soln is the recommended one, see here as well the issue

You can simply wrap this in a function if you want to make it look nicer. reset_index/set_index do copy (though you can pass an inplace=True argument if you want); it IS truly inplace as these are just changing the index attribute.

You could patch in a nice function like:

def merge_multi(self, df, on):
    return self.reset_index().join(df,on=on).set_index(self.index.names)
DataFrame.merge_multi = merge_multi

df1.merge_multi(df2,on=['Body','Season'])

However, merging by definition creates new data, so not sure how much this will actually save you.

A better method is to build up smaller frames, then do a larger merge. You also might want to do something like this


join now allows merging of MultiIndex DataFrames with partially matching indices.

Following your example:

df1 = df1.join(df2, on=['Body','Season'])

make sure the on columns are specified in exactly the order that match the Index of the other DataFrame as the on argument matches the order you specify the labels of the calling DataFrame with the Index as it is in the other DataFrame you join to.

or just join without using on and by default it will use the common index levels between the two DataFrames:

df1 = df1.join(df2)

Resulting df1:

                          A         B      Mood
Body Season Item                               
sun  summer one   -0.483779  0.981052      Good
     winter one   -0.309939  0.803862       Bad
            two   -0.413732  0.025331       Bad
moon summer one   -0.926068 -1.316808      Ugly
            two    0.221627 -0.226154      Ugly
            three  1.064856  0.402827      Ugly
     winter one    0.526461 -0.932231  Confused
            two   -0.296415 -0.812374  Confused