Hashing email addresses for GDPR compliance

The choice between MD5 and SHA is not the concern; hashes can be used for pseudonymization. The problem is that the hash needs to be salted (or peppered) so that data from other sources cannot be used to identify the person.

My email address is the same everywhere, so an unsalted hash of it is also the same everywhere. In this case the hash and my email become synonymous, just as a username becomes synonymous with a person's legal name once the two are paired. If you use a plain hash here, you gain nothing in terms of GDPR.
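To make the linkability problem concrete, here is a minimal Python sketch. The two "datasets" and the addresses are hypothetical, and lowercasing the address before hashing is an assumed normalization step. Because a plain SHA-256 is deterministic, two parties that hash the same address get the same value, so their records can be joined, and anyone with a list of candidate addresses can reverse the "pseudonym" by hashing the list.

```python
import hashlib

def unsalted_hash(email: str) -> str:
    # Plain SHA-256 with no salt or pepper; lowercasing is an assumed normalization.
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()

# Two unrelated, hypothetical datasets that both "pseudonymize" with the same plain hash.
newsletter = {unsalted_hash("alice@example.com"): {"opened_emails": 42}}
shop = {unsalted_hash("Alice@Example.com"): {"last_order": "2023-05-01"}}

# The hashes match, so the records link up without anyone seeing the address.
linked = {h: {**newsletter[h], **shop[h]} for h in newsletter.keys() & shop.keys()}

# Anyone holding a list of candidate addresses can reverse the "pseudonym" too.
candidates = ["bob@example.com", "alice@example.com"]
reverse_lookup = {unsalted_hash(c): c for c in candidates}
```

After this runs, `linked` holds a single merged record, and looking up its key in `reverse_lookup` returns `alice@example.com`.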

Hashing with a salt (or pepper) makes de-anonymising nearly impossible without knowing the added value. In this case, the salt (or pepper) almost becomes the token.
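One common way to add a secret value is a keyed hash (HMAC) with a pepper. The sketch below is an illustration under assumptions, not the only valid approach: the function name `pseudonymize` is hypothetical, and in practice the pepper would live in a separate secrets store rather than in code. Within one organization, the same address always maps to the same pseudonym. An organization with a different pepper gets an unrelated value, so the cross-dataset join shown for the plain hash no longer works.

```python
import hashlib
import hmac
import secrets

def pseudonymize(email: str, pepper: bytes) -> str:
    """Keyed hash (HMAC-SHA256) of a normalized email address (hypothetical helper)."""
    normalized = email.strip().lower().encode("utf-8")
    return hmac.new(pepper, normalized, hashlib.sha256).hexdigest()

# Each controller keeps its own secret pepper, stored apart from the pseudonymized data.
pepper_a = secrets.token_bytes(32)
pepper_b = secrets.token_bytes(32)

p1 = pseudonymize("alice@example.com", pepper_a)  # organization A
p2 = pseudonymize("alice@example.com", pepper_a)  # organization A again: same value
p3 = pseudonymize("alice@example.com", pepper_b)  # organization B: unrelated value
```

A per-record random salt would also stop linking across sources, but it would prevent you from matching the same address within your own data. A single secret pepper keeps that internal consistency.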

As always, check with your DPO.


Realistically, pseudonymization is any method of obfuscating someone's PII/NPI so that it can't reasonably be traced back to one specific individual. GDPR doesn't necessarily dictate which hashing algorithm you must use to comply with its standard. To be honest, it's best that it doesn't: if everyone used exactly the same method of obfuscation, you would be creating a massive single point of failure across the board. Your best bet (as mentioned above) is to use some form of tokenization with a salt, which adds randomness so that the result can't easily be brute-forced.
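Tokenization can also be done without deriving anything from the email at all. In this variant, each address is replaced by a random token, and the mapping is kept in a separately secured store. The sketch below uses a hypothetical in-memory `TokenVault` class to show the idea; a real deployment would need persistent, access-controlled storage. Because the token is random rather than computed from the address, hashing a list of candidate emails reveals nothing.

```python
import secrets

class TokenVault:
    """Hypothetical in-memory token vault; a real one would be a separately secured store."""

    def __init__(self) -> None:
        self._email_to_token: dict[str, str] = {}
        self._token_to_email: dict[str, str] = {}

    def tokenize(self, email: str) -> str:
        key = email.strip().lower()  # assumed normalization
        token = self._email_to_token.get(key)
        if token is None:
            # Random, not derived from the email, so it cannot be brute-forced
            # by hashing candidate addresses.
            token = secrets.token_urlsafe(16)
            self._email_to_token[key] = token
            self._token_to_email[token] = key
        return token

    def detokenize(self, token: str) -> str:
        # Only whoever controls the vault can map a token back to the address.
        return self._token_to_email[token]
```

Usage: `vault = TokenVault(); t = vault.tokenize("alice@example.com")`. Tokenizing the same address again returns `t`, and `vault.detokenize(t)` gives back the address.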