What are rainbow tables and how are they used?

Rainbow Tables are commonly confused with another, simpler technique that leverages a compute time-storage tradeoff in password recovery: hash tables.

Hash tables are constructed by hashing each word in a password dictionary. The password-hash pairs are stored in a table, sorted by hash value. To use a hash table, simply take the hash and perform a binary search in the table to find the original password, if it's present.
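A minimal sketch of such a hash table, assuming MD5 as the hash function (the answer doesn't name one) and a sorted Python list of (hash, password) tuples searched with the standard bisect module:

```python
import bisect
import hashlib


def build_hash_table(words):
    """Hash every dictionary word; sort the (hash, password) pairs by hash."""
    return sorted((hashlib.md5(w.encode()).hexdigest(), w) for w in words)


def lookup(table, target_hash):
    """Binary-search the sorted table; return the password or None."""
    # (target_hash,) sorts just before any (target_hash, password) tuple.
    i = bisect.bisect_left(table, (target_hash,))
    if i < len(table) and table[i][0] == target_hash:
        return table[i][1]
    return None


table = build_hash_table(["123456", "password", "qwerty", "letmein"])
print(lookup(table, hashlib.md5(b"qwerty").hexdigest()))   # qwerty
print(lookup(table, hashlib.md5(b"hunter2").hexdigest()))  # None
```

Every word costs one stored entry, which is exactly the storage cost the rainbow table approach tries to avoid.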

Rainbow Tables are more complex. Constructing a rainbow table requires two things: a hashing function and a reduction function. The hashing function for a given set of Rainbow Tables must match the one that produced the password hash you want to recover. The reduction function must transform a hash into something usable as a password. A simple reduction function is to Base64-encode the hash, then truncate it to a certain number of characters.

Rainbow tables are built from "chains" of a certain length: 100,000, for example. To construct a chain, pick a random seed value. Apply the hashing function and then the reduction function to this seed, apply them again to the result, and continue iterating 100,000 times. Only the seed and the final value are stored. Repeat this process to create as many chains as desired.

To recover a password using Rainbow Tables, the password hash goes through the same reduce-and-hash process for up to the same chain length (100,000 in this case), but this time every link is retained. Each link is compared with the final value of every chain. If there is a match, that chain can be rebuilt from its seed, keeping both the output of each hashing function and the output of each reduction function. Unless the match is a false alarm, the rebuilt chain will contain the hash of the password in question as well as the password that produced it.
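The construction and lookup described in the last two paragraphs can be sketched together in Python. The choices here are illustrative assumptions: MD5 as the hashing function, the Base64-and-truncate reduction (8 characters) applied at every step, random 8-character seeds, and much shorter chains than 100,000 so the demo runs quickly. A dict maps each final value back to its seed.

```python
import base64
import hashlib
import secrets


def md5_hash(password: str) -> bytes:
    """Hashing function; MD5 is an assumption and must match the target hashes."""
    return hashlib.md5(password.encode()).digest()


def reduce_hash(digest: bytes, length: int = 8) -> str:
    """Reduction: Base64-encode the hash and keep the first `length` characters."""
    return base64.b64encode(digest).decode("ascii")[:length]


def step(value: str) -> str:
    """One link: hash, then reduce (the same reduction at every position)."""
    return reduce_hash(md5_hash(value))


def build_chain(seed: str, chain_length: int) -> str:
    value = seed
    for _ in range(chain_length):
        value = step(value)
    return value


def build_table(num_chains: int, chain_length: int) -> dict:
    """Store only {final value: seed}. A chain whose end value collides
    with an earlier chain's silently replaces it."""
    table = {}
    for _ in range(num_chains):
        seed = secrets.token_urlsafe(6)  # random 8-character seed
        table[build_chain(seed, chain_length)] = seed
    return table


def lookup(table: dict, target: bytes, chain_length: int):
    """Walk forward from the target hash; when a link matches a stored
    final value, replay that chain from its seed looking for the target."""
    value = reduce_hash(target)
    for _ in range(chain_length):
        if value in table:
            candidate = table[value]
            for _ in range(chain_length):
                if md5_hash(candidate) == target:
                    return candidate
                candidate = step(candidate)
            # Target not in this chain: a false alarm, keep walking.
        value = step(value)
    return None


CHAIN_LENGTH = 1_000
table = build_table(num_chains=100, chain_length=CHAIN_LENGTH)
seed = next(iter(table.values()))
password = step(step(seed))  # a value known to lie inside a stored chain
print(lookup(table, md5_hash(password), CHAIN_LENGTH) == password)  # True
```

Checking each link as it is computed is equivalent to retaining all links first and comparing afterwards; it just stops earlier on a hit.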

The strengths of a hash table are that recovering a password is lightning fast (binary search) and the person building the hash table can choose what goes into it, such as the top 10,000 passwords. The weakness compared to Rainbow Tables is that hash tables must store every single hash-password pair.

Rainbow Tables have the benefit that the person constructing those tables can choose how much storage is required by selecting the number of links in each chain. The more links between the seed and the final value, the more passwords are captured. One weakness is that the person building the chains doesn't choose the passwords they capture, so Rainbow Tables can't be optimized for common passwords. Also, password recovery involves computing long chains of hashes, making recovery an expensive operation. The longer the chains, the more passwords are captured in them, but the more time is required to find a password inside.

Hash tables are good for common passwords; Rainbow Tables are good for tough passwords. The best approach would be to recover as many passwords as possible using hash tables and/or conventional cracking with a dictionary of the top N passwords. For those that remain, use Rainbow Tables.


There are many good explanations of what rainbow tables are; this one, How Rainbow Tables work, is particularly good. The Wikipedia article also has a very good explanation. For more in-depth reading, the definitive paper on the subject is Making a Faster Cryptanalytic Time-Memory Trade-Off.

A simple explanation of Rainbow Tables is that they make use of a time-memory trade-off technique. Instead of taking a target hash value and a dictionary of words, then hashing each word and doing the comparison on the fly (the brute-force approach used by something like John the Ripper), you hash all the values in the dictionary in advance (this may take a very long time depending on dictionary size). Once that's done, you can compare as many hashes as you want against the pre-hashed values in the rainbow tables, which is significantly faster than calculating the hashes again.

The explanation I wrote here previously, in an effort to be short, was misleading, since it did not explain the reduction functions that rainbow tables use. For a better explanation, until I rewrite this bit, see @Crunge's answer.

You can either generate the rainbow tables yourself using an application like RainbowCrack or you can download them from sources like The Shmoo Group, Free Rainbow Tables project website, Ophcrack project and many other places depending on what type of hashes you need tables for.

The most effective protection against a rainbow-table-based attack is to ensure that every hash within a system is salted. This makes pre-generated rainbow tables useless: an attacker would have to generate a custom set of tables for the targeted hashes, which would only be possible if they knew the salt.
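A minimal sketch of per-password salting with Python's standard hashlib.pbkdf2_hmac. PBKDF2 is a swapped-in stand-in chosen because it ships with Python; the stronger algorithms named elsewhere in this thread (Bcrypt, Scrypt, Argon2) are preferable. The iteration count is illustrative, not a recommendation.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=100_000):
    """Return (salt, digest); a fresh random salt is generated unless given."""
    if salt is None:
        salt = os.urandom(16)  # unique per stored password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, digest


def verify_password(password, salt, digest, iterations=100_000):
    """Recompute with the stored salt and compare in constant time."""
    _, candidate = hash_password(password, salt, iterations)
    return hmac.compare_digest(candidate, digest)


# Two users with the same password get different stored hashes,
# so a single precomputed table cannot cover both.
salt_a, hash_a = hash_password("letmein")
salt_b, hash_b = hash_password("letmein")
print(hash_a != hash_b)                            # True
print(verify_password("letmein", salt_a, hash_a))  # True
```

The salt is stored next to the digest; it is not secret, its job is only to make every hash unique.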


Purpose and relevance

Rainbow tables help crack difficult passwords, i.e. those that cannot even be found in a large dictionary. Passwords were historically stored as plain hashes in databases, and that's what rainbow tables are effective against: create a single rainbow table (slow) and run any number of databases full of hashes against it (fast).

These days, more and more systems use proper password storage algorithms such as Bcrypt, Scrypt or Argon2. See: How to securely [store] passwords? These algorithms are not "vulnerable" to rainbow tables: because each hash uses a unique salt, even equal passwords produce different hashes, so a precomputed rainbow table no longer works.

That's why rainbow tables are unpopular today. Even if something modern like Argon2 is not used, developers nowadays usually know that they should at least use a salt. That is already enough to make a rainbow table useless.

How they work

Creating a table

Imagine we create a rainbow table with just two chains, each of length 5. The rainbow table is for the fictional hash function MD48, which outputs 48 bits (only 12 hexadecimal characters). When building the chain, we see this:

Chain 0: 0=cfcd208495d5 => z=fbade9e36a3f => renjaj820=7668b2810262 => aL=8289e8a805d7 => ieioB=2958b80e4a3a => WLgOSj
Chain 1: 1=c4ca4238a0b9 => ykI4oLkj=140eda4296ac => Dtp=1b59a00b7dbe => W=61e9c06ea9a8 => 6cBuqaha=d4d2e5280034 => 0uUoAD

We start with 0 because it's the first chain (we just need some value to start with). When we hash this with MD48, it turns into cfcd208495d5. Now we apply a "reduce" function which basically formats this hash back into a password, and we end up with "z". When we hash that again, we get fbade9e36a3f, then reduce it again and get renjaj820. There are some more cycles, and the final result is WLgOSj.

Same for the second chain. We just start with another value and do the same thing. This ends in 0uUoAD.

Our complete rainbow table is now this:

WLgOSj => 0
0uUoAD => 1

That's all you have to store.

Looking up a hash

Let's say we found a hash online, 7668b2810262. Let's crack it using our table!

Looking for hash '7668b2810262', reduced to 'aL'.
hashed=>reduced 'aL' to ieioB
hashed=>reduced 'ieioB' to WLgOSj
Found a match, 'WLgOSj' is in our rainbow table:
    WLgOSj => 0
The chain starts with '0'. Let's walk that chain and look for the hash.
hashed '0' to cfcd208495d5
hashed 'z' to fbade9e36a3f
hashed 'renjaj820' to 7668b2810262
That hash matches! Found the password: renjaj820
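To play with the idea without the linked script, here is a hedged Python sketch of both steps. Assumptions: MD48 is taken to be the first 12 hex digits of MD5 (this matches the values shown for "0" and "1"), and reduce_hash is hypothetical, so it will not reproduce "z", "renjaj820" and so on. It also adds the chain position to the reduction, giving every column its own reduction function, which is what distinguishes rainbow tables from plain hash chains; the linked script may handle this differently. Because the position of the target hash is unknown, the lookup tries every position, starting from the last.

```python
import hashlib
import string

# Hypothetical password alphabet for the reduce function.
ALPHABET = string.ascii_letters + string.digits


def md48(password: str) -> str:
    """Fictional MD48, assumed to be the first 12 hex digits of MD5."""
    return hashlib.md5(password.encode()).hexdigest()[:12]


def reduce_hash(h: str, position: int) -> str:
    """Hypothetical reduction of a 48-bit hash to a 1-8 character password.
    Adding `position` gives every column of the chain its own reduction."""
    n = int(h, 16) + position
    length = 1 + n % 8
    n //= 8
    chars = []
    for _ in range(length):
        n, r = divmod(n, len(ALPHABET))
        chars.append(ALPHABET[r])
    return "".join(chars)


def build_chain(start: str, length: int = 5) -> str:
    """Hash, then reduce, `length` times; only the end value is kept."""
    value = start
    for position in range(length):
        value = reduce_hash(md48(value), position)
    return value


def lookup(table: dict, target: str, length: int = 5):
    """Try each position the target could occupy, last position first."""
    for start_pos in range(length - 1, -1, -1):
        # Finish the chain as if `target` were the hash at start_pos.
        value = reduce_hash(target, start_pos)
        for position in range(start_pos + 1, length):
            value = reduce_hash(md48(value), position)
        if value in table:
            # Possible hit: replay that chain from its start value.
            candidate = table[value]
            for position in range(length):
                h = md48(candidate)
                if h == target:
                    return candidate
                candidate = reduce_hash(h, position)
    return None


# Two chains of length 5 started from "0" and "1", stored as {end: start}.
table = {build_chain(str(i)): str(i) for i in range(2)}
print(md48("0"), md48("1"))  # cfcd208495d5 c4ca4238a0b9

# Take the third value of chain 0 and crack its hash.
value = "0"
for position in range(2):
    value = reduce_hash(md48(value), position)
print(value, lookup(table, md48(value)))
```

Trying position after position is why a lookup with per-position reductions costs more than walking a single chain once.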

To play around with it yourself, the above examples were created using this Python script: https://gist.github.com/lgommans/83cbb74a077742be3b31d33658f65adb

Scaling properties

In short:

  • Faster lookups mean bigger tables, assuming coverage stays the same.
  • Better coverage means either slower lookups or bigger tables.
  • Smaller tables mean either slower lookups or worse coverage.

The following sections assume the time per hash+reduction is 1µs, and they do not account for collisions. These are all ballpark numbers, meant as examples and not exact values.

Lookup time

If a hash+reduction operation takes a microsecond, then generating a table with a million chains and 10 000 reductions per chain would take about 3 hours:
chain_length × chain_count / reductions_per_second / seconds_per_hour
= 10 000 × 1 000 000 / 1 000 000 / 3600 = 2.8 hours.

Doing a lookup in that table takes on average 10 milliseconds. This is because we will typically have to go through chain_length/2 reductions before we find which chain contains the hash. For example, we might have to do 3000 reductions on a hash before we find a value that is in the table. Next, we have to re-do that chain from the beginning until we find the matching value. If we just had to do 3000 to find it in our table, then we must do 7000 reductions from the beginning to get to the right point in the chain. Basically, we do as many operations when looking up as when generating a single chain. Therefore, the lookup time is 10 000 times a microsecond, which is ten milliseconds (or a centisecond, if you will).
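The two estimates above as small Python helpers, under the same assumptions (1 µs per hash+reduce, no collisions). The function names are illustrative.

```python
def generation_hours(chain_length, chain_count, ops_per_second=1_000_000):
    """Time to build the table: one hash+reduce per link of every chain."""
    return chain_length * chain_count / ops_per_second / 3600


def lookup_seconds(chain_length, ops_per_second=1_000_000):
    """Lookup cost under this answer's model: about chain_length operations
    (walk to the end of the chain, then replay it from the start).
    If each position uses a different reduction function, the worst case
    grows to roughly chain_length**2 / 2 operations instead."""
    return chain_length / ops_per_second


print(round(generation_hours(10_000, 1_000_000), 1))  # 2.8
print(lookup_seconds(10_000))                         # 0.01
```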

Storage requirements

When you want to make a full, fast lookup table for a hash function, even MD5, you'd still need a hundred billion billion terabytes of storage. That's not very helpful. But what if we want to cover only lowercase passwords of up to 10 characters?

If we want to spend at most 30 seconds looking for a hash, and assuming we need 1 microsecond (a millionth of a second) per hash+reduce cycle, then we can have a chain length of 1 million × 30 = 30 million. There are 26^10 (roughly 10^14) possible lowercase passwords of 10 characters, and each chain covers 30 million values. That leaves us with roughly 4 million chains. Each chain stores only a start and an end value, each 10 characters, so 2 × 10 bytes × 4 million = 80 million bytes, or about 76 MiB of data.

Generating the table by iterating through all 10-character passwords takes a long time: 30 seconds per chain times 4 million chains is 120 million seconds, or about 3.8 years. A lot of people would be interested in such a table, though, so by pooling about 46 CPUs (≈3.8 × 12), it takes only about a month. This shows how small such a table can be compared to the password space it covers: lookups take only 30 seconds and you have to store only 76 MiB of data.
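The same sizing arithmetic as a small, hypothetical calculator, under the answer's assumptions (1 µs per hash+reduce, no collisions, 10 bytes per stored value). Because it uses the exact 26^10 instead of rounding down to 4 million chains, it lands at roughly 4.7 million chains, about 90 MiB and about 4.5 CPU-years, which is the same ballpark.

```python
def size_table(budget_seconds, charset_size, password_length, bytes_per_value,
               ops_per_second=1_000_000):
    """Ballpark table sizing that, like the answer, ignores chain collisions."""
    chain_length = budget_seconds * ops_per_second   # links affordable per lookup
    keyspace = charset_size ** password_length       # passwords to cover
    chain_count = keyspace // chain_length           # chains needed
    storage_mib = 2 * bytes_per_value * chain_count / 2**20  # start + end value
    cpu_years = chain_count * chain_length / ops_per_second / (3600 * 24 * 365)
    return chain_length, chain_count, storage_mib, cpu_years


chain_length, chain_count, storage_mib, cpu_years = size_table(30, 26, 10, 10)
print(chain_length, chain_count)                # 30000000 4705569
print(round(storage_mib), round(cpu_years, 1))  # 90 4.5
```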

Conclusion

Rainbow tables can be considered a time-memory trade-off: one stores only a small part of the table and recovers the rest through some extra computation at lookup time. This is part of the reason why salts, or better yet a good password storage algorithm like Scrypt or Argon2, are important to keep passwords safe. A rainbow table can only recover a salted password if the table covers a password space big enough to contain both the salt and the password, which would be extremely inefficient and defeat the whole purpose.

Note that a similar thing applies to encryption: when people encrypt files with a password, a rainbow table can be built to crack the files. Say the software uses AES, and the first block of the file should decrypt to "passwordcorrect" under the user's password; then a rainbow table would use AES (encrypting that known block under each candidate password) instead of a hash function.

Whenever you handle a password (a secret that is of unknown strength, and especially if the user might re-use it), always run it through a proper (slow) password storage algorithm to make it slow and unique to crack.