python pandas select rows where two columns are (not) equal

Use the complement operator ~

hsp.loc[~(hsp['Type_old'] == hsp['Type_new'])]

which gives:

   id Type_old Type_new  Len_old  Len_new
1   2      Num     Char       12       12
2   3     Char      Num       10        8

When dealing with Boolean operations, the complement operator is a handy way to invert True with False


Ways to be confused by == versus != when comparing pd.Series

As expected

df[['Len_old', 'Len_new']].assign(NE=df.Len_old != df.Len_new)

   Len_old  Len_new     NE
0       15       15  False
1       12       12  False
2       10        8   True
3        4        5   True
4        9       10   True

But if one of the column's values were strings!

df[['Len_old', 'Len_new']].assign(NE=df.Len_old.astype(str) != df.Len_new)

   Len_old  Len_new    NE
0       15       15  True
1       12       12  True
2       10        8  True
3        4        5  True
4        9       10  True

Make sure both are the same types.


Your code, as piRSquared said, had an issue with types.

Besides that, you could use comparing methods, in this case pd.Series.ne

Using your data:

hsp.loc[hsp['Type_old'].ne(hsp['Type_new'])]

But again, as piRSquared mentioned, because of dtypes it didn't work. Just in case, you have to take care about NaN/None values at your data... such:

hsp.loc[ ( hsp['Type_old'].ne(hsp['Type_new']) ) && (hsp['Type_old'].notna())]

In this case, .ne has another argument, fill_value, which fill missing data.


In addition, you could use "compare" method to show difference between two series (or DataFrames)

hsp.Len_old.compare(hsp.Len_new)

And it might return (if columns were of the same dtype):

   self  other
2  10.0    8.0
3   4.0    5.0
4   9.0   10.0

But just force to have another dtype:

hsp.Len_old.compare(hsp.Len_new.astype('str')) # string type new column

It will return all rows:

   self other
0   15  15
1   12  12
2   10  8
3   4   5
4   9   10

Tags:

Python

Pandas