Hash Code and Checksum - what's the difference?

I would say that a checksum is necessarily a hashcode. However, not all hashcodes make good checksums.

A checksum has a special purpose --- it verifies or checks the integrity of data (some can go beyond that by allowing for error-correction). "Good" checksums are easy to compute, and can detect many types of data corruptions (e.g. one, two, three erroneous bits).

A hashcode simply describes a mathematical function that maps data to some value. When used as a means of indexing in data structures (e.g. a hash table), a low collision probability is desirable.


There is a different purpose behind each of them:

  • Hash code - designed to be random across its domain (to minimize collisions in hash tables and such). Cryptographic hash codes are also designed to be computationally infeasible to reverse.
  • Check sum - designed to detect the most common errors in the data and often to be fast to compute (for effective checksumming fast streams of data).

In practice, the same functions are often good for both purposes. In particular, a cryptographically strong hash code is a good checksum (it is almost impossible that a random error will break a strong hash function), if you can afford the computational cost.


There are indeed some differences:

  • Checksums just need to be different when the input is different (as often as possible), but it's almost as important that they're fast to compute.
  • Hash codes (for use in hashtables) have the same requirements, and additionally they should be evenly distributed across the code space, especially for inputs that are similar.
  • Cryptographic hashes have the much more stringent requirement that given a hash, you cannot construct an input that produces this hash. Computation times comes second, and depending on the applicatin it may even be desirable for the hash to be very slow to compute (in order to combat brute force attacks).