Number of rows changes even after `pandas.merge` with `left` option

This sounds like having more than one rows in right under 'name2' that match the key you have set for the left. Using option 'how='left' with pandas.DataFrame.merge() only means that:

  • left: use only keys from left frame

However, the actual number of rows in the result object is not necessarily going to be the same as the number of rows in the left object.

Example:

In [359]: df_1
Out[359]: 
   A    B
0  a  AAA
1  b  BBA
2  c  CCF

and then another DF that looks like this (notice that there are more than one entry for your desired key on the left):

In [360]: df_3
Out[360]: 
  key  value
0   a      1
1   a      2
2   b      3
3   a      4

If I merge these two on left.A, here's what happens:

In [361]: df_1.merge(df_3, how='left', left_on='A', right_on='key')
Out[361]: 
   A    B  key  value
0  a  AAA    a    1.0
1  a  AAA    a    2.0
2  a  AAA    a    4.0
3  b  BBA    b    3.0
4  c  CCF  NaN    NaN

This happened even though I merged with how='left' as you can see above, there were simply more than one rows to merge and as shown here the result pd.DataFrame has in fact more rows than the pd.DataFrame on the left.

I hope this helps!


The problem of doubling of rows after each merge() (of any type, 'both' or 'left') is usually caused by duplicates in any of the keys, so we need to drop them first:

left_df.drop_duplicates(subset=left_key, inplace=True)
right_df.drop_duplicates(subset=right_key, inplace=True)

Tags:

Python

Pandas