How to generate a hash or checksum value for a pandas DataFrame (created from a fixed-width file)?

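You can now use pd.util.hash_pandas_object, which returns one uint64 hash per row. Feed those row hashes into hashlib to get a single checksum for the whole DataFrame:

import hashlib
import pandas as pd

hashlib.sha1(pd.util.hash_pandas_object(df).values).hexdigest()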

For a dataframe with 50 million rows, this method took me 10 seconds versus over a minute for the to_json() method.
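
In case it helps, here is a minimal sketch of that approach wrapped in a function. The name dataframe_checksum, the include_index parameter, and the sample frames are my own for illustration, and I swapped sha1 for sha256. Note that hash_pandas_object hashes the values (and the index by default), so two frames that differ only in column names would get the same checksum.

```python
import hashlib

import pandas as pd


def dataframe_checksum(df: pd.DataFrame, include_index: bool = True) -> str:
    """Return a SHA-256 hex digest computed over the per-row hashes of df.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # One uint64 hash per row; index=True also folds the index into each hash.
    row_hashes = pd.util.hash_pandas_object(df, index=include_index)
    # Hash the raw bytes of the uint64 array to get a single digest.
    return hashlib.sha256(row_hashes.values.tobytes()).hexdigest()


# Made-up sample data, only to show the comparison.
df_a = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
df_b = df_a.copy()
df_c = df_a.assign(name=["a", "b", "x"])

same = dataframe_checksum(df_a) == dataframe_checksum(df_b)
changed = dataframe_checksum(df_a) != dataframe_checksum(df_c)
print(same, changed)
```

The digest depends on row order, so sort both frames the same way first if order should not matter.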


Hash a string representation of the DataFrame.

import hashlib

print(hashlib.sha256(df1.to_json().encode()).hexdigest())
print(hashlib.sha256(df2.to_json().encode()).hexdigest())

or

print(hashlib.sha256(df1.to_csv().encode()).hexdigest())
print(hashlib.sha256(df2.to_csv().encode()).hexdigest())
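
Since the question mentions a fixed-width file, here is a rough end-to-end sketch using the to_csv() variant. The sample text, column widths, column names, and the helpers load_fixed_width and csv_checksum are all made up for illustration; adjust them to your actual layout.

```python
import hashlib
import io

import pandas as pd

# Hypothetical fixed-width content: 4-char id, 7-char name, 4-char amount.
RAW_1 = "0001alice  0100\n0002bob    0250\n"
RAW_2 = "0001alice  0100\n0002bob    0999\n"


def load_fixed_width(text: str) -> pd.DataFrame:
    """Parse the illustrative fixed-width layout into a DataFrame."""
    return pd.read_fwf(
        io.StringIO(text),
        widths=[4, 7, 4],                 # assumed column widths
        names=["id", "name", "amount"],   # assumed column names
        header=None,
        dtype=str,                        # keep leading zeros intact
    )


def csv_checksum(df: pd.DataFrame) -> str:
    """SHA-256 of the CSV text of df (index left out of the output)."""
    return hashlib.sha256(df.to_csv(index=False).encode("utf-8")).hexdigest()


df1 = load_fixed_width(RAW_1)
df1_again = load_fixed_width(RAW_1)
df2 = load_fixed_width(RAW_2)

print(csv_checksum(df1) == csv_checksum(df1_again))
print(csv_checksum(df1) == csv_checksum(df2))
```

Unlike the row-hash approach, the CSV text includes the header row, so a renamed column changes the checksum here.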