Choosing a hash function for best performance

I was in a similar situation where I needed a .NET hashing algorithm. I needed it for server response caching where speed was more important than security. When I got to this thread, I noticed speculation about performance differences in choice of algorithm and in 32-bit versus 64-bit execution. To bring some science into this debate, I have created some code to actually test some of the available algorithms. I decided to test the built-in MD5, SHA1, SHA256, and SHA512 algorithms. I also included a CRC32 implementation from force-net and a CRC64 implementation from DamienGKit. My results with a ~115MB file are as follows:

Running in 32-bit mode.

Warm-up phase:

CRC32: 296 MiB/s [9C54580A] in 390ms.

CRC64: 95 MiB/s [636BCF1455BC885A] in 1212ms.

MD5: 191 MiB/s [mm/JVFusWMKcT/P+IR4BjQ==] in 604ms.

SHA1: 165 MiB/s [WSFkGbnYte5EXb7kgp1kqbi2...] in 699ms.

SHA256: 93 MiB/s [USKMHQmfMil8/KL/ASyE6rm/...] in 1240ms.

SHA512: 47 MiB/s [Cp9cazN7WsydTPn+k4Xu359M...] in 2464ms.

Final run:

CRC32: 279 MiB/s [9C54580A] in 414ms.

CRC64: 96 MiB/s [636BCF1455BC885A] in 1203ms.

MD5: 197 MiB/s [mm/JVFusWMKcT/P+IR4BjQ==] in 588ms.

SHA1: 164 MiB/s [WSFkGbnYte5EXb7kgp1kqbi2...] in 707ms.

SHA256: 96 MiB/s [USKMHQmfMil8/KL/ASyE6rm/...] in 1200ms.

SHA512: 47 MiB/s [Cp9cazN7WsydTPn+k4Xu359M...] in 2441ms.


Running in 64-bit mode.

Warm-up phase:

CRC32: 310 MiB/s [9C54580A] in 373ms.

CRC64: 117 MiB/s [636BCF1455BC885A] in 986ms.

MD5: 198 MiB/s [mm/JVFusWMKcT/P+IR4BjQ==] in 584ms.

SHA1: 184 MiB/s [WSFkGbnYte5EXb7kgp1kqbi2...] in 627ms.

SHA256: 104 MiB/s [USKMHQmfMil8/KL/ASyE6rm/...] in 1112ms.

SHA512: 149 MiB/s [Cp9cazN7WsydTPn+k4Xu359M...] in 778ms.

Final run:

CRC32: 292 MiB/s [9C54580A] in 396ms.

CRC64: 119 MiB/s [636BCF1455BC885A] in 975ms.

MD5: 199 MiB/s [mm/JVFusWMKcT/P+IR4BjQ==] in 582ms.

SHA1: 192 MiB/s [WSFkGbnYte5EXb7kgp1kqbi2...] in 601ms.

SHA256: 106 MiB/s [USKMHQmfMil8/KL/ASyE6rm/...] in 1091ms.

SHA512: 157 MiB/s [Cp9cazN7WsydTPn+k4Xu359M...] in 738ms.

These results were obtained from a compiled Release-build ASP.NET project running .NET v4.5.2. Both the 32-bit and 64-bit results are from the same machine. In Visual Studio, I changed the mode via Tools > Options > Projects and Solutions > Web Projects > Use the 64 bit version of IIS Express, along with changing the Platform target of the project.

We can see that although the results fluctuate a bit run-to-run, CRC32 (by force-net) is the fastest, followed by Microsoft's MD5 and SHA1. Curiously, there is no performance benefit in choosing DamienGKit's CRC64 over the built-in MD5 or SHA1. 64-bit execution seems to help a lot with SHA512 but only modestly with the others.

To answer the OP's question, it would seem that the built-in MD5 or SHA1 may provide the best balance of collision-avoidance and performance.

My code is as follows:

// One shared stopwatch and one hasher instance per algorithm, reused for both the warm-up and the final run.
Stopwatch timer = new Stopwatch();
Force.Crc32.Crc32Algorithm hasherCRC32 = new Force.Crc32.Crc32Algorithm();
System.Security.Cryptography.MD5Cng hasherMD5 = new System.Security.Cryptography.MD5Cng();
System.Security.Cryptography.SHA1Cng hasherSHA1 = new System.Security.Cryptography.SHA1Cng();
System.Security.Cryptography.SHA256Cng hasherSHA256 = new System.Security.Cryptography.SHA256Cng();
System.Security.Cryptography.SHA512Cng hasherSHA512 = new System.Security.Cryptography.SHA512Cng();
String result = ""; // formatted hash value of the most recent measurement
String rate = "";   // throughput of the most recent measurement in MiB/s

Status.Text += "Running in " + ((IntPtr.Size == 8) ? "64" : "32") + "-bit mode.<br /><br />";

Status.Text += "Warm-up phase:<br />";

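// Each measurement below follows the same pattern: time one ComputeHash call, then report throughput.
timer.Restart();
result = BitConverter.ToUInt32(hasherCRC32.ComputeHash(ImageUploader.FileBytes), 0).ToString("X8");
timer.Stop();
// bytes / ms / 1.024 / 1024 equals bytes * 1000 / (1024 * 1024) / ms, i.e. MiB per second.
rate = ((double)ImageUploader.FileBytes.Length / timer.ElapsedMilliseconds / 1.024 / 1024).ToString("0");
Status.Text += "CRC32: " + rate + " MiB/s [" + result + "] in " + timer.ElapsedMilliseconds + "ms" + ".<br />";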

timer.Restart();
result = DamienG.Security.Cryptography.Crc64Iso.Compute(ImageUploader.FileBytes).ToString("X16");
timer.Stop();
rate = ((double)ImageUploader.FileBytes.Length / timer.ElapsedMilliseconds / 1.024 / 1024).ToString("0");
Status.Text += "CRC64: " + rate + " MiB/s [" + result + "] in " + timer.ElapsedMilliseconds + "ms" + ".<br />";

timer.Restart();
result = Convert.ToBase64String(hasherMD5.ComputeHash(ImageUploader.FileBytes));
timer.Stop();
rate = ((double)ImageUploader.FileBytes.Length / timer.ElapsedMilliseconds / 1.024 / 1024).ToString("0");
Status.Text += "MD5: " + rate + " MiB/s [" + result + "] in " + timer.ElapsedMilliseconds + "ms" + ".<br />";

timer.Restart();
result = Convert.ToBase64String(hasherSHA1.ComputeHash(ImageUploader.FileBytes));
timer.Stop();
rate = ((double)ImageUploader.FileBytes.Length / timer.ElapsedMilliseconds / 1.024 / 1024).ToString("0");
Status.Text += "SHA1: " + rate + " MiB/s [" + result + "] in " + timer.ElapsedMilliseconds + "ms" + ".<br />";

timer.Restart();
result = Convert.ToBase64String(hasherSHA256.ComputeHash(ImageUploader.FileBytes));
timer.Stop();
rate = ((double)ImageUploader.FileBytes.Length / timer.ElapsedMilliseconds / 1.024 / 1024).ToString("0");
Status.Text += "SHA256: " + rate + " MiB/s [" + result + "] in " + timer.ElapsedMilliseconds + "ms" + ".<br />";

timer.Restart();
result = Convert.ToBase64String(hasherSHA512.ComputeHash(ImageUploader.FileBytes));
timer.Stop();
rate = ((double)ImageUploader.FileBytes.Length / timer.ElapsedMilliseconds / 1.024 / 1024).ToString("0");
Status.Text += "SHA512: " + rate + " MiB/s [" + result + "] in " + timer.ElapsedMilliseconds + "ms" + ".<br />";

Status.Text += "<br />Final run:<br />";

timer.Restart();
result = BitConverter.ToUInt32(hasherCRC32.ComputeHash(ImageUploader.FileBytes), 0).ToString("X8");
timer.Stop();
rate = ((double)ImageUploader.FileBytes.Length / timer.ElapsedMilliseconds / 1.024 / 1024).ToString("0");
Status.Text += "CRC32: " + rate + " MiB/s [" + result + "] in " + timer.ElapsedMilliseconds + "ms" + ".<br />";

timer.Restart();
result = DamienG.Security.Cryptography.Crc64Iso.Compute(ImageUploader.FileBytes).ToString("X16");
timer.Stop();
rate = ((double)ImageUploader.FileBytes.Length / timer.ElapsedMilliseconds / 1.024 / 1024).ToString("0");
Status.Text += "CRC64: " + rate + " MiB/s [" + result + "] in " + timer.ElapsedMilliseconds + "ms" + ".<br />";

timer.Restart();
result = Convert.ToBase64String(hasherMD5.ComputeHash(ImageUploader.FileBytes));
timer.Stop();
rate = ((double)ImageUploader.FileBytes.Length / timer.ElapsedMilliseconds / 1.024 / 1024).ToString("0");
Status.Text += "MD5: " + rate + " MiB/s [" + result + "] in " + timer.ElapsedMilliseconds + "ms" + ".<br />";

timer.Restart();
result = Convert.ToBase64String(hasherSHA1.ComputeHash(ImageUploader.FileBytes));
timer.Stop();
rate = ((double)ImageUploader.FileBytes.Length / timer.ElapsedMilliseconds / 1.024 / 1024).ToString("0");
Status.Text += "SHA1: " + rate + " MiB/s [" + result + "] in " + timer.ElapsedMilliseconds + "ms" + ".<br />";

timer.Restart();
result = Convert.ToBase64String(hasherSHA256.ComputeHash(ImageUploader.FileBytes));
timer.Stop();
rate = ((double)ImageUploader.FileBytes.Length / timer.ElapsedMilliseconds / 1.024 / 1024).ToString("0");
Status.Text += "SHA256: " + rate + " MiB/s [" + result + "] in " + timer.ElapsedMilliseconds + "ms" + ".<br />";

timer.Restart();
result = Convert.ToBase64String(hasherSHA512.ComputeHash(ImageUploader.FileBytes));
timer.Stop();
rate = ((double)ImageUploader.FileBytes.Length / timer.ElapsedMilliseconds / 1.024 / 1024).ToString("0");
Status.Text += "SHA512: " + rate + " MiB/s [" + result + "] in " + timer.ElapsedMilliseconds + "ms" + ".<br />";
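
The repeated timing blocks above could be folded into one helper. This is only a minimal sketch, assuming a console program and a random 16 MiB buffer in place of ImageUploader.FileBytes. HashBenchmark, Measure and MiBPerSecond are names made up for illustration, and only the built-in MD5 and SHA1 are wired up, so it will not reproduce the numbers above:

```csharp
using System;
using System.Diagnostics;
using System.Security.Cryptography;

static class HashBenchmark
{
    // Converts a byte count and an elapsed time into MiB per second.
    public static double MiBPerSecond(long byteCount, TimeSpan elapsed)
    {
        return byteCount / (1024.0 * 1024.0) / elapsed.TotalSeconds;
    }

    // Times a single call of the hashing delegate and returns one report line.
    public static string Measure(string name, byte[] data, Func<byte[], string> hash)
    {
        Stopwatch timer = Stopwatch.StartNew();
        string digest = hash(data);
        timer.Stop();
        double rate = MiBPerSecond(data.Length, timer.Elapsed);
        return string.Format("{0}: {1:0} MiB/s [{2}] in {3}ms.",
            name, rate, digest, timer.ElapsedMilliseconds);
    }

    static void Main()
    {
        // Assumption: random placeholder data, not the ~115MB file from the test above.
        byte[] data = new byte[16 * 1024 * 1024];
        new Random(42).NextBytes(data);

        using (MD5 md5 = MD5.Create())
        using (SHA1 sha1 = SHA1.Create())
        {
            for (int pass = 0; pass < 2; pass++) // pass 0 is the warm-up
            {
                Console.WriteLine(pass == 0 ? "Warm-up phase:" : "Final run:");
                Console.WriteLine(Measure("MD5", data,
                    d => Convert.ToBase64String(md5.ComputeHash(d))));
                Console.WriteLine(Measure("SHA1", data,
                    d => Convert.ToBase64String(sha1.ComputeHash(d))));
            }
        }
    }
}
```

Passing the hash as a delegate keeps the timing and formatting in one place, so adding another algorithm is one more Measure call rather than another copied block.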

It depends on the number of files you have.

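For an ideal hash function, the chance that a new message collides with one of the messages you have already hashed is P(collision) = c/2^N, where c is your number of messages (files) and N is the number of bits in your hash. The chance that any two of the c messages collide with each other grows much faster, roughly c^2/2^(N+1) (the birthday bound).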
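
To put numbers on this, here is a rough sketch that evaluates c/2^N for a given message count and hash width. It reads c/2^N as the chance that one new message hits one of c stored hashes. The second function is my own addition, not part of the formula above: the standard birthday approximation for any two of the c messages colliding. CollisionEstimate, SingleLookup and AnyPair are illustrative names:

```csharp
using System;

static class CollisionEstimate
{
    // Chance that one new message collides with one of `count` distinct stored
    // hashes of an ideal N-bit function: c / 2^N, capped at 1.
    public static double SingleLookup(double count, int bits)
    {
        return Math.Min(1.0, count / Math.Pow(2.0, bits));
    }

    // Birthday approximation (an addition, see the lead-in): chance that at least
    // two of `count` messages collide, 1 - exp(-c(c-1) / 2^(N+1)).
    public static double AnyPair(double count, int bits)
    {
        double x = count * (count - 1.0) / Math.Pow(2.0, bits + 1);
        // For tiny x, 1 - exp(-x) rounds to 0 in double precision; x itself is the better estimate.
        return x < 1e-8 ? x : 1.0 - Math.Exp(-x);
    }

    static void Main()
    {
        Console.WriteLine(SingleLookup(1e6, 32)); // one new file vs. a million stored 32-bit hashes
        Console.WriteLine(AnyPair(77163, 32));    // close to 0.5 for a 32-bit hash
        Console.WriteLine(AnyPair(1e6, 128));     // vanishingly small for a 128-bit hash
    }
}
```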

Real-world hash functions aren't perfect, so you have two options: optimize for speed or optimize for collision avoidance.

In the first case you will want to use CRC32. CRC32 is very common but, depending on the number of files you have, might not be enough: you're guaranteed to have a collision at ~4.3 billion messages (32 effective bits), but in practice you can expect your first collision far earlier. By the birthday bound, a collision among roughly 77,000 random messages is already about as likely as not. CRC32 has very fast implementations (SSE 4.2 even has a hardware instruction for the CRC-32C variant). CRC64 has a much lower chance of a collision but is not widely used, so if you want more collision avoidance than CRC32 you had better look at cryptographic hash functions.
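
To show why CRC32 is cheap to compute, here is a minimal table-driven sketch of the common CRC-32 (reflected polynomial 0xEDB88320, as used by zip and PNG). It is not the force-net implementation and not the SSE 4.2 instruction, which computes the CRC-32C variant with a different polynomial. Crc32Sketch is a name made up for illustration:

```csharp
using System;
using System.Text;

static class Crc32Sketch
{
    // 256-entry lookup table, built once: one entry per possible byte value.
    static readonly uint[] Table = BuildTable();

    static uint[] BuildTable()
    {
        const uint polynomial = 0xEDB88320u; // reflected form of the standard CRC-32 polynomial
        uint[] table = new uint[256];
        for (uint i = 0; i < 256; i++)
        {
            uint value = i;
            for (int bit = 0; bit < 8; bit++)
                value = (value & 1) != 0 ? (value >> 1) ^ polynomial : value >> 1;
            table[i] = value;
        }
        return table;
    }

    // Per input byte: one table lookup, one shift and two XORs.
    public static uint Compute(byte[] data)
    {
        uint crc = 0xFFFFFFFFu;
        foreach (byte b in data)
            crc = Table[(crc ^ b) & 0xFF] ^ (crc >> 8);
        return crc ^ 0xFFFFFFFFu;
    }

    static void Main()
    {
        // "123456789" is the conventional check input; standard CRC-32 gives CBF43926.
        uint crc = Compute(Encoding.ASCII.GetBytes("123456789"));
        Console.WriteLine(crc.ToString("X8"));
    }
}
```

Production implementations usually process several bytes per step or use hardware support, so treat this as an illustration of the idea rather than something to benchmark.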

If you want to avoid collisions while sacrificing speed you will want cryptographic hash functions, of which MD5 (128 bits), SHA-1 (160 bits) and SHA-2 (usually SHA-256 or SHA-512) are the most widely used and have fast implementations. Very efficient collision-finding algorithms for MD5 are available, but if you input random messages (that is, inputs not crafted by an attacker) you'll get as close to P(collision) = c/2^128 as you're ever going to get while still running in reasonable time.
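
For a cache where the inputs are not attacker-controlled, using MD5 as a key could look roughly like this. It is a sketch only: CacheKey and FromBytes are hypothetical names, and the lowercase hex encoding is just one reasonable choice. Swapping in SHA1.Create() or SHA256.Create() would work the same way:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

static class CacheKey
{
    // Hashes the payload with the built-in MD5 and returns a 32-character lowercase hex key.
    public static string FromBytes(byte[] payload)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] digest = md5.ComputeHash(payload);
            // BitConverter.ToString yields "90-01-50-..."; strip the dashes and lowercase it.
            return BitConverter.ToString(digest).Replace("-", "").ToLowerInvariant();
        }
    }

    static void Main()
    {
        // MD5 of "abc" is the well-known test vector 900150983cd24fb0d6963f7d28e17f72.
        Console.WriteLine(FromBytes(Encoding.UTF8.GetBytes("abc")));
    }
}
```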