Why is it called a rainbow table?

A rainbow table attack is a variant of the dictionary attack (a precomputed dictionary attack, to be exact), but the table takes less space than a full dictionary, at the price of more time needed to find a key in it. The other end of this time-memory tradeoff is full search (brute force attack = zero precomputation, a lot of time).
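To make the two ends of that tradeoff concrete, here is a minimal sketch. It assumes a toy password space of three lowercase letters and uses MD5 only because it is quick to demo; the names `h`, `candidates`, `lookup` and `brute_force` are made up for this illustration.

```python
import hashlib
from itertools import product
from string import ascii_lowercase

def h(password: str) -> str:
    """The hash under attack (MD5 chosen only for speed in a demo)."""
    return hashlib.md5(password.encode()).hexdigest()

def candidates(length: int):
    """Every lowercase string of the given length."""
    return ("".join(chars) for chars in product(ascii_lowercase, repeat=length))

# Extreme 1: full precomputed dictionary.
# All the hashing and all the memory are spent up front (26**3 entries),
# after which each lookup is a single dict access.
full_table = {h(p): p for p in candidates(3)}

def lookup(target_hash: str):
    return full_table.get(target_hash)

# Extreme 2: brute force.
# Nothing is stored; every attack repeats the hashing from scratch.
def brute_force(target_hash: str, length: int = 3):
    for p in candidates(length):
        if h(p) == target_hash:
            return p
    return None
```

A rainbow table sits between the two: it stores far fewer entries than `full_table`, and pays for that with extra hashing at lookup time.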

In a rainbow table, the precomputed dictionary of key-ciphertext pairs is compressed into chains. Every step in a chain uses a different compression function (usually called a reduction function). The table holds a lot of chains, so, drawn with one color per step, it looks like a rainbow.

In this picture the different functions K1, K2, K3 are colored like the bands of a rainbow. The table stored in the file contains only the first and last columns, as the middle columns can be recomputed.

[Image: rainbow table chains, with the functions K1, K2, K3 drawn in different colors]
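The chain idea can be sketched in a few lines of Python. This is a toy under stated assumptions, not a production implementation: the password space is three lowercase letters, the hash is MD5, and `reduce_hash` is a made-up stand-in for the per-column functions (K1, K2, K3 in the picture). All names here are hypothetical.

```python
import hashlib
from string import ascii_lowercase

LENGTH = 3       # password length covered by this toy table (assumption)
CHAIN_LEN = 50   # number of hash-then-reduce steps per chain (assumption)
SPACE = len(ascii_lowercase) ** LENGTH

def h(plaintext: str) -> str:
    return hashlib.md5(plaintext.encode()).hexdigest()

def reduce_hash(hash_hex: str, column: int) -> str:
    """Map a hash back into the password space; a different map per column."""
    n = (int(hash_hex, 16) + column) % SPACE
    chars = []
    for _ in range(LENGTH):
        n, r = divmod(n, len(ascii_lowercase))
        chars.append(ascii_lowercase[r])
    return "".join(chars)

def chain_plaintexts(start: str):
    """Yield the CHAIN_LEN + 1 plaintexts of the chain that begins at start."""
    p = start
    yield p
    for column in range(CHAIN_LEN):
        p = reduce_hash(h(p), column)
        yield p

def build_table(starts):
    """Store only endpoint -> [start points]; middle columns are thrown away."""
    table = {}
    for start in starts:
        *_, end = chain_plaintexts(start)
        table.setdefault(end, []).append(start)
    return table

def lookup(table, target_hash: str):
    # Guess which column the target hash sits in, trying the last one first.
    for column in range(CHAIN_LEN - 1, -1, -1):
        p = reduce_hash(target_hash, column)
        for c in range(column + 1, CHAIN_LEN):
            p = reduce_hash(h(p), c)
        # p is now the endpoint this hash would lead to from that column.
        for start in table.get(p, []):
            # Possible hit: recompute the middle columns and confirm.
            for candidate in chain_plaintexts(start):
                if h(candidate) == target_hash:
                    return candidate
    return None
```

For example, `table = build_table(["aaa", "abc", "xyz"])` stores three entries, yet `lookup(table, h(p))` recovers any plaintext `p` that appears in one of those chains before its endpoint. A password that no chain happens to pass through is simply not found, which is why real tables use a great many chains.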


Because it contains the entire "spectrum" of possibilities.

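A dictionary attack is a brute-force technique: you just try candidate passwords one after another. Like this (Python, with a stand-in check):

def trypassword(password):  # stand-in for one attempt against the real target
    return password == "letmein"

mypassworddict = ["password", "123456", "letmein"]  # normally read from a wordlist

for password in mypassworddict:
    if trypassword(password):
        break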

However, a rainbow table works differently, because it's for inverting hashes. A high level overview of a hash is that it has a number of bins:

bin1, bin2, bin3, bin4, bin5, ...

These correspond to fixed-size pieces of the output string, which is how the output ends up the length it is. As the hash processes its input, each piece of input affects different bins in different ways. So, simplistically, the first input byte (or whatever unit the hash consumes) might affect bins 3 and 4, the next one bins 2 and 6, and so on.

Conceptually, a rainbow table lets you work backwards from a hash value to an input that produces it, for every input in a chosen set. In practice it does this not by inverting the individual bins but by precomputing hashes for the whole set and storing them compactly, which is why it ends up so large.

Why isn't it called a dictionary attack? Because it isn't.

As I've seen your previous question, let me expand on the detail you're looking for there. A cryptographically secure hash ideally needs to be safe for anything from smallish inputs up to whole files. Precomputing hash values for every possible file would take forever. So a rainbow table is built for a small, well-understood subset of inputs, for example every string of the characters a-z in a field of, say, 10 characters.

This is why password advice for defeating dictionary attacks works here too. The larger the set of possible inputs your password is drawn from, the more a rainbow table needs to contain to cover it. The data sizes required end up stupidly big, and so does the time to search. So, think about it:

  • If you have an input that is [a-z] for 5-8 characters, that's not too bad a rainbow table.
  • If you increase the length to 42 characters, that's a massive rainbow table. Each input affects the hash and so the bins of said hash.
  • If you throw numbers into your search requirement, [a-z0-9], you've got even more searching to do.
  • Likewise [A-Za-z0-9]. Finally, allow any printable character you can think of (punctuation and symbols included) and, again, you're looking at a massive table.
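To put rough numbers on those cases, here is a small sketch that counts how many candidate inputs each one covers. The `keyspace` helper and the exact length ranges are assumptions made for illustration; "printable" is taken to mean the 94 non-space printable ASCII characters.

```python
import string

def keyspace(alphabet_size: int, min_len: int, max_len: int) -> int:
    """Number of strings over the alphabet with length in [min_len, max_len]."""
    return sum(alphabet_size ** n for n in range(min_len, max_len + 1))

# Letters, digits and punctuation: the non-space printable ASCII characters.
PRINTABLE = len(string.ascii_letters + string.digits + string.punctuation)

cases = {
    "[a-z], 5-8 chars": keyspace(26, 5, 8),
    "[a-z0-9], 5-8 chars": keyspace(36, 5, 8),
    "[A-Za-z0-9], 5-8 chars": keyspace(62, 5, 8),
    "printable ASCII, 5-8 chars": keyspace(PRINTABLE, 5, 8),
    "[a-z], exactly 42 chars": keyspace(26, 42, 42),
}

for label, size in cases.items():
    print(f"{label:28} {size:.3e}")
```

Each step up in alphabet or length multiplies the number of inputs a table has to cover. The 42-character lowercase case alone is on the order of 10^59 candidates, far beyond anything that could be precomputed or stored.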

So, making passwords long and complicated makes rainbow tables start taking up Blu-ray discs' worth of data. Then, as per your previous question, you start adding in salting and key derivation functions, and you make a general solution to hash cracking hard(er).
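As a rough illustration of why salting hurts precomputation, the sketch below hashes the same password under two different salts using PBKDF2 from Python's standard `hashlib`. The choice of PBKDF2-HMAC-SHA256, the iteration count and the helper names are assumptions made for this example, not a recommendation of specific parameters.

```python
import hashlib
import hmac
import os

def hash_password(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """Derive a salted, deliberately slow hash of the password."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)

def verify(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    return hmac.compare_digest(hash_password(password, salt), expected)

def new_salt() -> bytes:
    """A fresh random salt, stored alongside the hash for each user."""
    return os.urandom(16)
```

Two users with the same password but different salts end up with different stored hashes, so a table precomputed for one salt is useless against the other. The iteration count additionally makes every entry of any table that much more expensive to compute.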

The goal here is to stay ahead of the computational power available.