Checksum vs. Hash: Differences and Similarities?

A checksum (such as CRC32) is meant to detect accidental changes. If a single byte changes, the checksum changes. A checksum does not protect against malicious changes: it is fairly easy to create a file with a particular checksum.

A hash function maps some data to other data, usually of a fixed size. It is often used to speed up comparisons or to build a hash table. Not all hash functions are secure, and the hash does not necessarily change when the data changes.
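To illustrate the hash table case, here is a minimal sketch with a deliberately simple toy hash. The function name `bucket_index` and the bucket count of 8 are made up for this example. It shows that different inputs can produce the same hash value, which is acceptable for a hash table but useless for security.

```python
def bucket_index(key, num_buckets=8):
    """Toy hash: sum of the UTF-8 bytes of the key, reduced to a bucket number."""
    return sum(key.encode("utf-8")) % num_buckets


# A tiny hash table: a list of buckets, each holding the keys that map to it.
buckets = [[] for _ in range(8)]
for word in ["listen", "silent", "enlist", "google"]:
    buckets[bucket_index(word)].append(word)

# Anagrams contain the same bytes in a different order, so their byte sums
# are equal and they always land in the same bucket.
print(bucket_index("listen"), bucket_index("silent"), bucket_index("enlist"))
```

A lookup only has to search one bucket instead of every key, which is where the speed-up comes from. Keys that share a bucket are then told apart by a normal comparison.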

A cryptographic hash function (such as SHA-256; the older SHA-1 is no longer considered collision-resistant) is a checksum that is also secure against malicious changes. It is very hard to create a file with a specific cryptographic hash.
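As a rough illustration, the sketch below uses SHA-256 from Python's standard `hashlib` module (my choice of algorithm for the example; `hashlib.sha1` is called the same way). The two messages are invented and differ in a single character, yet their digests are completely different.

```python
import hashlib

# Two hypothetical messages that differ in exactly one character.
original = b"Pay Alice 10 dollars"
tampered = b"Pay Alice 90 dollars"

digest_original = hashlib.sha256(original).hexdigest()
digest_tampered = hashlib.sha256(tampered).hexdigest()

print(digest_original)
print(digest_tampered)

# Comparing the digests reveals the change without comparing the full data.
print("unchanged" if digest_original == digest_tampered else "modified")
```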

To make things more complicated, cryptographic hash functions are sometimes simply referred to as hash functions.

What are similarities and differences between a "checksum" algorithm and a "hash" function?

A checksum is used to determine if something is the same.

If you have downloaded a file, you can never be sure whether it got corrupted on the way to your machine. You can use cksum to calculate a checksum (based on CRC-32) of the copy you now have and then compare it to the checksum the file should have. This is how you check file integrity.
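The same check can be sketched in Python with `zlib.crc32` from the standard library. Note the assumption: `zlib.crc32` is the CRC-32 variant used by zip and gzip, so its numbers will not match the output of the `cksum` tool, which uses a different CRC variant. The function names below are made up for this example.

```python
import zlib


def crc32_of_chunks(chunks):
    """Feed an iterable of byte chunks through a running CRC-32."""
    crc = 0
    for chunk in chunks:
        # The second argument carries the checksum state from chunk to chunk.
        crc = zlib.crc32(chunk, crc)
    return crc


def crc32_of_file(path, chunk_size=65536):
    """CRC-32 of a file, read in chunks so large downloads fit in memory."""
    with open(path, "rb") as f:
        return crc32_of_chunks(iter(lambda: f.read(chunk_size), b""))


def is_intact(path, expected_crc):
    """Compare the local copy's CRC-32 with the value the sender published."""
    return crc32_of_file(path) == expected_crc
```

If `is_intact` returns False, the copy differs from what was sent. If it returns True, accidental corruption is very unlikely, but nothing is proven about deliberate modification.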

A hash function is used to map data to other data of a fixed size. A perfect hash function is injective on its set of inputs, so there are no collisions. Every input has one fixed output.

A cryptographic hash function is used for verification. With a cryptographic hash function you should not be able to compute the original input from the hash.

A very common use case is password hashing. This allows the verification of a password without having to save the password itself. A service provider only saves a hash of a password and is not able to compute the original password. If the database of password hashes gets compromised, an attacker should not be able to compute these passwords either. In practice this is not always the case, because there are strong and weak algorithms for password hashing. You can find more on that on this very site.

TL;DR:

Checksums are used to compare two pieces of information to check if two parties have exactly the same thing.

Hashes are used (in cryptography) to verify something, but this time, deliberately only one party has access to the data that has to be verified, while the other party only has access to the hash.

They are basically the same thing, but checksums tend to be smaller (a few bytes).

Integrity

Both hash functions and checksums are used to verify the integrity of data. Cryptographic hash functions are hash functions for which no collision is known and for which finding one should be computationally infeasible. This is why cryptographic hash functions are used to construct things like a MAC (see below).

Information loss

Another property of hash functions and checksums is that information gets lost during computation. This must be true whenever you convert some data to a checksum/hash with fewer bits than the input. This is also why you can't go back to the original data with just a checksum or a hash.
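The counting argument can be shown with a toy 8-bit checksum (the function `checksum8` is invented for this sketch and is not a real-world algorithm). There are only 256 possible outputs, so any 257 distinct inputs must contain at least two with the same checksum, and the output alone cannot tell you which input was used.

```python
def checksum8(data):
    """Toy 8-bit checksum: sum of all bytes modulo 256 (256 possible outputs)."""
    return sum(data) % 256


def find_collision(inputs):
    """Return the first pair of inputs that share a checksum, or None."""
    seen = {}
    for data in inputs:
        value = checksum8(data)
        if value in seen:
            return seen[value], data
        seen[value] = data
    return None


# 257 distinct two-byte inputs: one more than there are possible checksums.
inputs = [i.to_bytes(2, "big") for i in range(257)]
first, second = find_collision(inputs)
print(first, second, checksum8(first), checksum8(second))
```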

HMAC

What I think you are looking for is a MAC (Message Authentication Code). Such a code is used to detect tampering with data. Most of the time it's just a combination of a hash function and some secret value, like a password or key (this construction is called an HMAC). See also:
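A minimal sketch of that idea, using the `hmac` and `hashlib` modules from Python's standard library with SHA-256 as the underlying hash (my choice for the example). The key and message are hypothetical placeholders, and the helper names `make_tag` and `verify_tag` are made up.

```python
import hashlib
import hmac


def make_tag(key, message):
    """Compute an HMAC-SHA256 tag over the message with the shared secret key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_tag(key, message, tag):
    """Recompute the tag and compare in constant time to avoid timing leaks."""
    return hmac.compare_digest(make_tag(key, message), tag)


key = b"shared-secret-key"          # hypothetical secret known to both parties
message = b"transfer 100 to bob"    # hypothetical message
tag = make_tag(key, message)

print(verify_tag(key, message, tag))                  # genuine message
print(verify_tag(key, b"transfer 900 to bob", tag))   # tampered message
```

Unlike a plain hash, an attacker who changes the message cannot recompute a matching tag without knowing the key.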

https://en.wikipedia.org/wiki/Message_authentication_code

Passwords

Passwords are sometimes stored as a hash. To verify the password, a hash is calculated of the password you enter, and it is compared to the stored password hash. Checksums are not used for such things because they are generally shorter and more prone to collisions, meaning that you can try random passwords and have a chance that your input has the same checksum as the original password.

But note that using normal (digest) hash functions is not the right way to store passwords. Because they are created for quickly digesting data, attackers can crack those hashes at high speeds. Programmers should use a hash function designed for storing passwords, like bcrypt or Argon2.
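bcrypt and Argon2 need third-party packages in Python, so the sketch below swaps in PBKDF2-HMAC-SHA256 from the standard library (`hashlib.pbkdf2_hmac`) purely to show the pattern: a random salt per password plus a deliberately slow derivation. The iteration count and the helper names are assumptions for this example, not a recommendation taken from the answer above.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor; tune to your hardware and policy


def hash_password(password, salt=None, iterations=ITERATIONS):
    """Derive a slow, salted hash. Store salt, iterations, and hash together."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for every password
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return salt, iterations, derived


def verify_password(password, salt, iterations, expected):
    """Re-derive with the stored salt and compare in constant time."""
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return hmac.compare_digest(derived, expected)
```

The salt means two users with the same password get different stored hashes, and the iteration count makes each guess expensive for an attacker who has stolen the database.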

Edit: examples of algorithms

To answer your final question about specific algorithms: Please have a look at the Wikipedia page that lists hash functions. Like I mentioned above, they are basically the same. On Wikipedia, checksums are listed as a subset of hash functions.

https://en.wikipedia.org/wiki/List_of_hash_functions
