Comparing two pandas dataframes for differences

Not sure if this existed at the time the question was posted, but pandas now has a built-in function to test equality between two dataframes: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.equals.html.


Not sure if this is helpful or not, but I whipped together this quick python method for returning just the differences between two dataframes that both have the same columns and shape.

def get_different_rows(source_df, new_df):
    """Returns just the rows from the new dataframe that differ from the source dataframe"""
    merged_df = source_df.merge(new_df, indicator=True, how='outer')
    changed_rows_df = merged_df[merged_df['_merge'] == 'right_only']
    return changed_rows_df.drop('_merge', axis=1)

You also need to be careful to create a copy of the DataFrame, otherwise the csvdata_old will be updated with csvdata (since it points to the same object):

csvdata_old = csvdata.copy()

To check whether they are equal, you can use assert_frame_equal as in this answer:

from pandas.util.testing import assert_frame_equal
assert_frame_equal(csvdata, csvdata_old)

You can wrap this in a function with something like:

try:
    assert_frame_equal(csvdata, csvdata_old)
    return True
except:  # appeantly AssertionError doesn't catch all
    return False

There was discussion of a better way...