Does Rainbow Table Not Require Decompression?

Your basic premise is wrong: a rainbow table is not just a compressed list of every possible hash lookup, and you do still need to do some hashing on the fly. Instead, it's a way of exploiting the nature of hashes to avoid storing the lookups in the first place, and minimise the amount you need to re-compute.

Wikipedia has quite a detailed explanation and there is an existing question here with good answers, but the basic idea is that you create a table like this (a code sketch follows the list):

  1. Start with a particular password guess
  2. Hash it
  3. Take the hashed value as the next password guess (after applying some reproducible transform)
  4. Hash that
  5. Repeat steps 3 and 4 a set number of times
  6. Store only the original guess, and the final hash. It is this that makes the rainbow table smaller than a full lookup table.
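
To make those steps concrete, here's a rough Python sketch. MD5, the eight-character lowercase password space, the chain length, and the particular `reduce_value()` transform are all just illustrative choices:

```python
import hashlib
import string

ALPHABET = string.ascii_lowercase
PASSWORD_LEN = 8
CHAIN_LENGTH = 100  # steps per chain; kept small so the sketch runs quickly

def hash_value(password: str) -> str:
    """Steps 2 and 4: hash the current guess (MD5 here purely as an example)."""
    return hashlib.md5(password.encode()).hexdigest()

def reduce_value(digest: str, position: int) -> str:
    """Step 3: a reproducible transform that maps a hash back into the password
    space. Mixing in the chain position is what real rainbow tables do to limit
    chain merges; the exact formula here is only an illustration."""
    n = int(digest, 16) + position
    chars = []
    for _ in range(PASSWORD_LEN):
        n, remainder = divmod(n, len(ALPHABET))
        chars.append(ALPHABET[remainder])
    return "".join(chars)

def build_chain(start_guess: str) -> tuple[str, str]:
    """Steps 1 to 6: walk the whole chain but keep only its two endpoints."""
    guess = start_guess
    digest = hash_value(guess)
    for position in range(CHAIN_LENGTH - 1):
        guess = reduce_value(digest, position)
        digest = hash_value(guess)
    return start_guess, digest  # step 6: only { first guess, last hash } is stored
```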

From the values you've stored, you can get back all the guesses generated at step 3. In a sense, each pair of { first guess, last hash } "compresses" the full chain of generated guesses.

But the trick is that you don't need to try all the chains to reverse a hash, because if you take the hash you're attacking and continue from step 3, you will eventually end up at one of the final hashes you stored (at step 6). Once you've found that, you can recreate ("decompress", if you like) just that one chain, by starting again from step 1 (the stored password guess) and generating all the intermediate guesses.
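
Continuing the sketch above, a lookup takes the hash you're attacking, repeatedly reduces and re-hashes it until it lands on one of the stored final hashes, and then replays ("decompresses") only that chain from its stored starting guess. Because the sketch's reduction mixes in the chain position, the lookup tries each possible position; again, the structure here is just an illustration:

```python
def lookup(target_hash: str, table: dict[str, str]):
    """Try to invert target_hash against a table of { final hash: starting guess }
    pairs produced by build_chain(). Returns the plaintext or None."""
    # The target could sit at any position within some chain, so try each one.
    for k in range(CHAIN_LENGTH - 1, -1, -1):
        digest = target_hash
        # Walk forward from position k to the end of a hypothetical chain.
        for position in range(k, CHAIN_LENGTH - 1):
            digest = hash_value(reduce_value(digest, position))
        start = table.get(digest)
        if start is None:
            continue
        # "Decompress" just this one chain from its stored starting guess.
        guess = start
        digest = hash_value(guess)
        for position in range(CHAIN_LENGTH - 1):
            if digest == target_hash:
                return guess
            guess = reduce_value(digest, position)
            digest = hash_value(guess)
        if digest == target_hash:
            return guess
        # Otherwise it was a false alarm (a chain collision); keep trying.
    return None

# Usage: build a few chains, store only their endpoints, then invert a hash.
chains = [build_chain(word) for word in ("password", "aardvark", "qwertyui")]
table = {end: start for start, end in chains}
print(lookup(hash_value("password"), table))  # -> "password"
```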

An important difference between this and compression is that you can make the stored table as small as you want by making the chains longer - you'll just have to spend longer re-generating hashes to find the right chain, and then to re-create it. You could have a million chains of length ten, or ten chains of length one million, trading storage against CPU time.
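
A rough back-of-the-envelope comparison (the per-entry sizes are assumptions):

```python
# Back-of-the-envelope: both configurations cover the same number of guesses,
# but trade stored bytes against per-lookup hashing work (which grows with chain length).
BYTES_PER_PAIR = 16 + 10  # assumed: a 16-byte hash plus roughly 10 bytes per stored guess

for chains, length in ((1_000_000, 10), (10, 1_000_000)):
    coverage = chains * length         # total guesses reachable from the table
    storage = chains * BYTES_PER_PAIR  # bytes needed for the { start, end } pairs
    print(f"{chains:>9,} chains of length {length:>9,}: "
          f"~{coverage:,} guesses covered, {storage:,} bytes stored")
```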

It is of course possible to compress the resulting data using any algorithm you like. Some of these will require decompressing the entire table before searching it; others might arrange the data so that it is still searchable but takes up less space. But it would also be possible to store the entire rainbow table as, say, a sorted list on a fast SSD, and you would still have saved space over a full hash table, because you are only storing the start and end of each chain, not every possible hash.


No. The compression doesn't work like traditional RLE or LZMA style compression.

Rainbow tables are, essentially, lookup tables that allow you to find a string given its hash. They're designed to be incredibly efficient at finding a hash in an index of billions of entries, while minimising disk space.

Now, imagine you're building a table for lots and lots of strings. The hashes of some of these strings start with the same bytes - for example, "StackExchange", "ILikeWaffles9", "ILikeWaffles13507", and "Decompression242" when hashed with MD5 all start with 0xF2. Instead of storing all four hashes fully, you can construct a tree-like structure so that the data looks like this:

  • f2
    • 173dcd3c1a83febadc8ed1759c3ffc = "ILikeWaffles13507"
    • 17f4a64e4036025c07b24a96ec787a = "Decompression242"
    • 50514201b94be52c1ea16cd688384e = "ILikeWaffles9"
    • 5cb1c6953bb0c62c639f3d7a242ec4 = "StackExchange"

Note that the hashes are sorted in numeric order.

In fact, since the first two hashes also share the same second byte (0x17), these can be grouped one level deeper:

  • f2
    • 17
      • 3dcd3c1a83febadc8ed1759c3ffc = "ILikeWaffles13507"
      • f4a64e4036025c07b24a96ec787a = "Decompression242"
    • 50514201b94be52c1ea16cd688384e = "ILikeWaffles9"
    • 5cb1c6953bb0c62c639f3d7a242ec4 = "StackExchange"

This also allows you to perform a lookup incredibly quickly - instead of having to search the full table, you only have to traverse the tree, and then search through a smaller list of hashes. Since the hashes are sorted, you can perform a binary search, which also has very good performance.

As an example, if I have the hash f217f4a64e4036025c07b24a96ec787a, I look for the first tree node f2, then look to see if there's a sub-node for the second byte, 17. There is, so I continue down. I check to see if there's a sub-node for f4. There isn't, so I now search through the list within the f2 -> 17 node. I know that f4 is likely to be near the end of that list, so I start there. I find that the hash matches the one I'm searching for, so I now know that the plaintext is "Decompression242".
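
Here's a small Python sketch of that grouping-and-lookup idea, reusing the example hashes above; it only builds the first level of the tree, and (as the edit below explains) it illustrates the compression idea rather than how real rainbow tables are stored:

```python
from bisect import bisect_left

# The example entries from above: full hex hash -> plaintext.
entries = {
    "f2173dcd3c1a83febadc8ed1759c3ffc": "ILikeWaffles13507",
    "f217f4a64e4036025c07b24a96ec787a": "Decompression242",
    "f250514201b94be52c1ea16cd688384e": "ILikeWaffles9",
    "f25cb1c6953bb0c62c639f3d7a242ec4": "StackExchange",
}

# Group by the first byte, keeping only the remaining hex digits,
# sorted so a binary search works within each group.
tree: dict[str, list[tuple[str, str]]] = {}
for full_hash, plaintext in entries.items():
    prefix, suffix = full_hash[:2], full_hash[2:]
    tree.setdefault(prefix, []).append((suffix, plaintext))
for group in tree.values():
    group.sort()

def find_plaintext(full_hash: str):
    """Walk to the node for the first byte, then binary-search the suffixes."""
    group = tree.get(full_hash[:2])
    if group is None:
        return None
    suffix = full_hash[2:]
    i = bisect_left(group, (suffix, ""))
    if i < len(group) and group[i][0] == suffix:
        return group[i][1]
    return None

print(find_plaintext("f217f4a64e4036025c07b24a96ec787a"))  # -> "Decompression242"
```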

It is also incredibly space-efficient when you've got millions or billions of hashes, because you don't duplicate the parts of each hash that are shared with other entries.


EDIT: Sorry, I should have pointed out that this is not literally how rainbow tables work. This is just an example of how compression can work in this regard, without needing to actually save a full hash for each plaintext. I didn't mean to imply otherwise. IMSoP's answer better describes the actual workings.


The key thing to remember is that rainbow tables are only useful when you want to run many lookups against the same hash type. You generate a rainbow table for a particular word list or character set ahead of time, only once, and then you can reuse that generated data set as many times as you like. It's a trade-off: you do a lot of work up front so that your later searches are very fast.

Another key thing is that any hash system which includes a salt automatically renders rainbow tables useless, since each password and salt combination should (ideally) be unique and long enough to make it impractical to build a rainbow table for every possible password and salt combination.
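
A minimal sketch of that idea: each account gets its own random salt, stored alongside the hash, so a table precomputed for one salt is useless against any other. SHA-256 with a 16-byte salt is just an example here; a real system would use a dedicated password-hashing function:

```python
import hashlib
import os

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Store a unique random salt alongside each hash; the per-account salt is
    what forces an attacker to build a separate table per account."""
    salt = os.urandom(16)
    digest = hashlib.sha256(salt + password.encode()).digest()
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    return hashlib.sha256(salt + password.encode()).digest() == digest

salt, digest = hash_password("hunter2")
print(verify_password("hunter2", salt, digest))  # True
print(verify_password("letmein", salt, digest))  # False
```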