Which cryptographic hash function should I choose?

All hash functions are "broken"

The pigeonhole principle says that try as hard as you will you can not fit more than 2 pigeons in 2 holes (unless you cut the pigeons up). Similarly you can not fit 2^128 + 1 numbers in 2^128 slots. All hash functions result in a hash of finite size, this means that you can always find a collision if you search through "finite size" + 1 sequences. It's just not feasible to do so. Not for MD5 and not for Skein.

MD5/SHA1/Sha2xx have no chance collisions

All the hash functions have collisions, its a fact of life. Coming across these collisions by accident is the equivalent of winning the intergalactic lottery. That is to say, no one wins the intergalactic lottery, its just not the way the lottery works. You will not come across an accidental MD5/SHA1/SHA2XXX hash, EVER. Every word in every dictionary, in every language, hashes to a different value. Every path name, on every machine in the entire planet has a different MD5/SHA1/SHA2XXX hash. How do I know that, you may ask. Well, as I said before, no one wins the intergalactic lottery, ever.

But ... MD5 is broken

Sometimes the fact that its broken does not matter.

As it stands there are no known pre-image or second pre-image attacks on MD5.

So what is so broken about MD5, you may ask? It is possible for a third party to generate 2 messages, one of which is EVIL and another of which is GOOD that both hash to the same value. (Collision attack)

Nonetheless, the current RSA recommendation is not to use MD5 if you need pre-image resistance. People tend to err on the side of caution when it comes to security algorithms.

So what hash function should I use in .NET?

  • Use MD5 if you need the speed/size and don't care about birthday attacks or pre-image attacks.

Repeat this after me, there are no chance MD5 collisions, malicious collisions can be carefully engineered. Even though there are no known pre-image attacks to date on MD5 the line from the security experts is that MD5 should not be used where you need to defend against pre-image attacks. SAME goes for SHA1.

Keep in mind, not all algorithms need to defend against pre-image or collision attacks. Take the trivial case of a first pass search for duplicate files on your HD.

  • Use SHA2XX based function if you want a cryptographically secure hash function.

No one ever found any SHA512 collision. EVER. They have tried really hard. For that matter no one ever found any SHA256 or 384 collision ever. .

  • Don't use SHA1 or RIPEMD unless its for an interoperability scenario.

RIPMED has not received the same amount of scrutiny that SHAX and MD5 has received. Both SHA1 and RIPEMD are vulnerable to birthday attacks. They are both slower than MD5 on .NET and come in the awkward 20 byte size. Its pointless to use these functions, forget about them.

SHA1 collision attacks are down to 2^52, its not going to be too long until SHA1 collisions are out in the wild.

For up to date information about the various hash functions have a look at the hash function zoo.

But wait there is more

Having a fast hash function can be a curse. For example: a very common usage for hash functions is password storage. Essentially, you calculate hash of a password combined with a known random string (to impede rainbow attacks) and store that hash in the database.

The problem is, that if an attacker gets a dump of the database, he can, quite effectively guess passwords using brute-force. Every combination he tries only takes a fraction of millisecond, and he can try out hundreds of thousands of passwords a second.

To work around this issue, the bcrypt algorithm can be used, it is designed to be slow so the attacker will be heavily slowed down if attacking a system using bcrypt. Recently scrypt has made some headline and is considered by some to be more effective than bcrypt but I do not know of a .Net implementation.


In cryptography, hash functions provide three separate functions.

  1. Collision resistance: How hard is it for someone to find two messages (any two messages) that hash the same.
  2. Preimage Resistance: Given a hash, how hard is it to find another message that hashes the same? Also known as a one way hash function.
  3. Second preimage resistance: Given a message, find another message that hashes the same.

These properties are related but independent. For example, collision resistance implies second preimage resistance, but not the other way around. For any given application, you will have different requirements, needing one or more of these properties. A hash function for securing passwords on a server will usually only require preimage resistance, while message digests require all three.

It has been shown that MD5 is not collision resistant, however, that does not preclude its use in applications that do not require collision resistance. Indeed, MD5 is often still used in applications where the smaller key size and speed are beneficial. That said, due to its flaws, researchers recommend the use of other hash functions in new scenarios.

SHA1 has a flaw that allows collisions to be found in theoretically far less than the 2^80 steps a secure hash function of its length would require. The attack is continually being revised and currently can be done in ~2^63 steps - just barely within the current realm of computability (as of April, 2009). For this reason NIST is phasing out the use of SHA1, stating that the SHA2 family should be used after 2010.

SHA2 is a new family of hash functions created following SHA1. Currently there are no known attacks against SHA2 functions. SHA256, 384 and 512 are all part of the SHA2 family, just using different key lengths.

RIPEMD I can't comment too much on, except to note that it isn't as commonly used as the SHA families, and so has not been scrutinized as closely by cryptographic researchers. For that reason alone I would recommend the use of SHA functions over it. In the implementation you are using it seems quite slow as well, which makes it less useful.

In conclusion, there is no one best function - it all depends on what you need it for. Be mindful of the flaws with each and you will be best able to choose the right hash function for your scenario.


⚠️ WARNING

August, 2022

DO NOT USE SHA-1 OR MD5 FOR CRYPTOGRAPHIC APPLICATIONS. Both of these algorithms are broken (MD5 can be cracked in 30 seconds by a cell phone).



Update:

Times have changed, we have a SHA3 winner. I would recommend using keccak (aka SHA3) winner of the SHA3 contest.

Original Answer:

In order of weakest to strongest I would say:

  1. RIPEMD BROKEN, Should never be used as can be seen in this pdf
  2. MD-5 BROKEN, Should never be used, can be broken in 2 minutes with a laptop
  3. SHA-1 BROKEN, Should never be used, is broken in principal, attacks are getting better by the week
  4. SHA-2 WEAK, Will probably be broken in the next few years. A few weaknesses have been found. Note that generally the higher key size, the harder the hash function is to break. While key size = strength is not always true, it is mostly true. So SHA-256 is probably weaker than SHA-512.
  5. Skein NO KNOWN WEAKNESSES, is a candidate for SHA-3. It is fairly new and thus untested. It has been implemented in a bunch of languages.
  6. MD6 NO KNOWN WEAKNESSES, is another a candidate for SHA-3. Probably stronger than Skien, but slower on single core machines. Like Skien it is untested. Some security minded developers are using it, in mission critical roles.

Personally I'd use MD6, because one can never been too paranoid. If speed is a real concern I'd look at Skein, or SHA-256.