Can a deterministic hashing function be easily decrypted?

General answer

A cryptographic hash function cannot be easily reversed. This is why it is also sometimes called a one-way function. There is no going back.

You should also be careful about calling this 'decryption'. Hashing is not the same as encryption. The set of possible hash value is typically smaller than set of possible inputs so multiple inputs map to the same output.

For any hash function given the output you can't know which of the many inputs was used to generate this particular output.

For cryptographic hashes like SHA1 it is very difficult to even find one input that produces that output.

The simplest way to reverse a cryptographic hash is to guess the input and hash it to see if it gives the right output. If you are wrong, guess again. Another approach is to use rainbow tables.

Regarding using hashing to encrypt SSNs

With your use case of SSNs an attack is feasible due to the relatively small number of possible input values. If you are worried about people getting access to SSNs then it might be best to not store or use the SSN at all in your application, and in particular do not use them as an identifier. Instead you could find or create another identifier, for example an email address, a login name, a GUID or just an incrementing number. It can be tempting to use the SSN as it is already there and at first glance appears to be a unique unchanging identifier, but in practice using it just causes problems. If you absolutely need to store it for some reason then use strong non-deterministic encryption with a secret key and make sure you keep that key safe.

The whole point of a cryptographic hash is that you can't decrypt it and that it does encrypt the same way every time.

A very common use case for cryptographic hashes is password validation. Imagine I have the password "mypass123", and the hash is "aef8976ea17371bbcd". Then a program or website wishing to validate my password can store the hash "aef8976ea17371bbcd" in their database, instead of the password, and every time I want to log in, the site or program re-hashes my password and makes sure that the hashes match. This allows the site or program to avoid storing my actual password, and so protects my password (in case it's a password I use elsewhere) in the case that the data is stolen or otherwise compromised -- a hacker would not be able to go backwards from the hash to the password.

Another common use of cryptographic hashes is integrity checking. Suppose a given file (e.g. an image of a Linux distribution CD) has a known, publicly available cryptographic hash. If you have a file which purports to be the same thing, you can hash it yourself and see if the hashes match. Here, the fact that it hashes the same way every time allows you to independently validate it, and the fact that it is cryptographically secure means that no one can feasibly create a different, fake file (e.g. with a trojan in it) that has the same hash.

Keep in mind the very important distinction between hashing and encryption, though: hashing loses information. This is why you can't go backwards (decrypt) the hash. You can hash a 20 GiB file and end up with a 40-some character hash. Obviously, this has lost a lot of information in the process. How could you possibly "decrypt" 40-some characters into 20GiB? There's no such thing as compression that works that well! But this is also an advantage, because in order to check the integrity of a 20 GiB file, you only have to distribute a 40-some character hash.

Because information is lost, many files will have the same hash, but the key feature of a cryptographic hash (which is what you're talking about) is that despite the fact that information is lost, it is computationally infeasible to start with a file and construct a second, slightly different file that has the same hash. Any other file with the same hash would be radically different, and not easily mistakable for the original file.

I see your point based on the fact that you are trying to hide Social security numbers. If someone knows you are using an SHA1HASH on the SSN to create a unique identifier, then can just generate a quick list of all SSN numbers, SHA1HASH them, then compare to automatically have the SSN of the person in the record. Even worse, they can pregenerate all these in a hash lookup table, and have a key of 1 hash for every SSN. This is called a hash lookup table, and more complex forms are called rainbow tables.

This is why a second feature of hashing was invented. It is called salting. Salting is basically this; you create a salt, then modify your data using the salt. For instance, say you had the SSN 123-45-6789 . You could salt it with the string "MOONBEAM". Your new string for hashing is "123-45-6789MOONBEAM"

Now, even if someone knows that you are hashing the SSN to generate your unique ID, they still don't know the salt you will be using, and so are unable to derive the original SSN by pre-hashing a list of all SSNs and comparing to your ID. You however, can always take the user's SSN, use the salt, and rehash the SSN+SALT to see if the user SSN matches up with their ID.

Finally, if you use just 1 salt for everything, and keep it secret, instead of being able to see the salt, and generate the corresponding SSN by running SSN increments + salt 100 million times and picking the match, they have to do a lot more work to retrieve SSN. This is because the 100 million SSN numbers have a relatively low amount of entropy. (10^9 combinations). By adding your salt and keeping it secret, instead of just running

SHA1HASH(111-11-1111) -> check hash match
SHA1HASH(111-11-1112) -> check hash match
SHA1HASH(111-11-1113) -> check hash match

They would have to run

SHA1HASH(111-11-1111a) -> check hash match
SHA1HASH(111-11-1111b) -> check hash match
SHA1HASH(111-11-1111c) -> check hash match
...
SHA1HASH(111-11-1111azdfg) -> check hash match
SHA1HASH(111-11-1111azdfh) -> check hash match
....
SHA1HASH(111-11-1111zzzzzzzzzzzzzzzz) -> check hash match
SHA1HASH(111-11-1112a) -> check hash match
SHA1HASH(111-11-1112b) -> check hash match

.. and so on until they finally get to

SHA1HASH(123-45-6789MOONBEAM) -> check hash match

at which point they finally did manage to crack the SSN + SALT

They don't even know how many characters long your salt is So that is 10^(number of characters of your salt) times more work for them to do just to get 1 SSN, let alone get the whole table.

Update: Many years later, I see that my info on salting was incorrect when I answered this question. Please see the correct info in posts and comments below about using unique salts per entry, as this is still the first post in the chain. If you think I should change the OP after reading this, leave a comment below (or upvote one), and if the consensus is there, I will correct it.

Can a deterministic hashing function be easily decrypted?

Tags:

Algorithm

Security

Encryption

Hash

Related

Recent Posts