How does a 'rainbow table' hacker obtain password hashes in the first place?

The news is full of examples of leaked databases (these are just the most recent results).

The How:

The vast majority of cases involve unsecured databases or backups (across pretty much all technologies: S3, MongoDB, Cassandra, MySQL, etc.). These are usually due to configuration errors, bad defaults, or carelessness.

What data is leaked:

These leaks generally give the attacker at least read-only access to some or all of the data contained in the database, including usernames and hashed-and-salted passwords.

These dumps include a lot of private user records. Passwords stored in plaintext (or hashed with a simple, fast hash such as MD5) are even more problematic, because the recovered passwords can be used in credential stuffing attacks (trying the same username/password combinations on different websites), potentially exposing even more data.

What to do with a password hash:

If an attacker has access to a hashed and salted password, they cannot just provide this to the server to authenticate. At login time, the server computes hash(salt + plaintext_password) and compares it with the value stored in the database. If the attacker attempts to use the hash, the server will just compute hash(salt + incoming_hash), resulting in a wrong value.
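As a minimal sketch of that login check (the function names are made up for illustration, and plain SHA-256 stands in for whatever hash a real server would use), replaying the leaked hash as if it were the password simply fails:

```python
import hashlib
import hmac
import os

def hash_password(salt: bytes, password: bytes) -> bytes:
    # hash(salt + plaintext_password), as described above. SHA-256 keeps the
    # sketch short; a real system should use a slow password hash instead.
    return hashlib.sha256(salt + password).digest()

def verify(stored_salt: bytes, stored_hash: bytes, submitted: bytes) -> bool:
    # The server hashes whatever the client submits before comparing.
    candidate = hash_password(stored_salt, submitted)
    return hmac.compare_digest(candidate, stored_hash)

salt = os.urandom(16)
stored = hash_password(salt, b"correct horse")

real_login = verify(salt, stored, b"correct horse")
# The attacker submits the leaked hash (raw bytes or hex) as the "password".
replayed_raw = verify(salt, stored, stored)
replayed_hex = verify(salt, stored, stored.hex().encode())

print(real_login, replayed_raw, replayed_hex)
```

Only the first attempt succeeds, because the server computes hash(salt + incoming_hash) for the other two.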

One scenario that could spell a lot of trouble is client-side-only password hashing. If the client computes hash(salt + plaintext_password) and sends it to the login endpoint, then the stored hash can be used to log in directly. That alone shows how dangerous this design is. There are some algorithms that offload some of the work to the client (such as SCRAM), but they involve a more thorough client-server exchange to prevent exactly this scenario.

Password storage security is about preventing attackers from deriving the real password from the stored value. It does not address other vectors of attack against the server.
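To make the client-side-only problem concrete, here is a small sketch of such a flawed design. The names `client_hash` and `flawed_server_login` are hypothetical and SHA-256 is only a placeholder:

```python
import hashlib
import hmac

def client_hash(salt: bytes, password: bytes) -> bytes:
    # Computed in the browser/app in this (flawed) design.
    return hashlib.sha256(salt + password).digest()

def flawed_server_login(stored_hash: bytes, submitted_hash: bytes) -> bool:
    # Flawed: the server compares the received value directly with the
    # database value, so the stored hash *is* the credential.
    return hmac.compare_digest(stored_hash, submitted_hash)

salt = b"per-user-salt"
stored_hash = client_hash(salt, b"s3cret!")

# An honest client hashes the real password before sending it.
honest = flawed_server_login(stored_hash, client_hash(salt, b"s3cret!"))
# An attacker with a database dump just sends the leaked value as-is.
attacker = flawed_server_login(stored_hash, stored_hash)

print(honest, attacker)
```

Both logins succeed, so in this design a leaked hash is as good as the password, without any cracking.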


If the hacker already has the password hashes, can't he just use them to hack the system?

Unless you're talking about NTLM hashes in Windows environments (under certain conditions), the attacker would need to crack them. Most systems do not accept the hash itself for authentication.


Performing cryptanalysis against a hashed password consists of generating sequences of characters, hashing them using the same method, and comparing the results (you might need other pieces of information, such as the username, to use as the salt in the calculation). It's that simple. And also very inefficient, by design(1).

  • You could employ a brute-force method whereby you try all possible combinations of the character set you choose (e.g. alphanumeric, alpha+symbols, etc.) up to whichever length you'd be willing to go. This guarantees you will find the password if it falls within that character set and length, given enough computational effort. It can take centuries to go through a large enough character set with a long enough length with a given hash method;
  • Or you could use a hybrid mode by selecting words out of a dictionary and generating a sequence of variations on those words as candidate passwords. This is hugely more efficient, but there's no guarantee that the password will be found.
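Both approaches can be sketched in a few lines. This is a toy under stated assumptions: the username is used as the salt, a fast SHA-256 stands in for the target's hash, and the helper names and the handful of variation rules are invented for illustration.

```python
import hashlib
import itertools
import string

def sha256_hex(salt: str, candidate: str) -> str:
    return hashlib.sha256((salt + candidate).encode()).hexdigest()

def brute_force(target, salt, alphabet, max_len):
    # Try every combination of the alphabet, shortest candidates first.
    for length in range(1, max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            candidate = "".join(chars)
            if sha256_hex(salt, candidate) == target:
                return candidate
    return None

def variations(word):
    # A few hypothetical mangling rules; real tools apply thousands.
    yield word
    yield word.capitalize()
    for suffix in ("1", "123", "!"):
        yield word + suffix
        yield word.capitalize() + suffix

def dictionary_attack(target, salt, words):
    for word in words:
        for candidate in variations(word):
            if sha256_hex(salt, candidate) == target:
                return candidate
    return None

salt = "bob"  # assumption: the username is the salt
words = ["baseball", "batman", "letmein"]

found_brute = brute_force(sha256_hex(salt, "abc"), salt, string.ascii_lowercase, 3)
found_dict = dictionary_attack(sha256_hex(salt, "Batman123"), salt, words)
missed = dictionary_attack(sha256_hex(salt, "xK9#qz"), salt, words)

print(found_brute, found_dict, missed)
```

The brute-force search is exhaustive within its bounds (here at most 26 + 26² + 26³ candidates), while the dictionary attack is far cheaper but returns nothing for a password that no rule produces.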

A rainbow table is a method whereby you pre-calculate tables mapping plaintext to hash (possibly with salting(2)). Given a hash you want to crack, you just look up the plaintext password. It's virtually instant. It's a trade-off: you spend your computational time ahead of the cracking moment, at the expense of storage. The complication is that rainbow tables also take a long time to build(2) and take up a significant amount of storage space (GBs to TBs, but there is no real limit).
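For the curious: what distinguishes an actual rainbow table from a plain hash-to-password lookup is that it stores only the start and end of long hash chains, trading some lookup-time work for much less storage. The following is a toy sketch of that idea, assuming a tiny 3-letter password space, unsalted SHA-256, and made-up names (`H`, `R`, `CHAIN_LEN`); it is not how any particular cracking tool is implemented.

```python
import hashlib
import string

ALPHABET = string.ascii_lowercase
PW_LEN = 3       # toy password space: 26**3 candidates
CHAIN_LEN = 50   # hash/reduce steps per chain

def H(plaintext: str) -> bytes:
    return hashlib.sha256(plaintext.encode()).digest()

def R(digest: bytes, step: int) -> str:
    # Reduction: map a digest back into the password space. It depends on
    # the step number, which is what makes the table a "rainbow" table.
    n = int.from_bytes(digest, "big") + step
    chars = []
    for _ in range(PW_LEN):
        n, idx = divmod(n, len(ALPHABET))
        chars.append(ALPHABET[idx])
    return "".join(chars)

def build_table(starts):
    # Store only endpoint -> starting points; the chain middle is discarded.
    table = {}
    for start in starts:
        p = start
        for step in range(CHAIN_LEN):
            p = R(H(p), step)
        table.setdefault(p, []).append(start)
    return table

def lookup(table, target: bytes):
    # Assume the target hash sits at position `pos` of some chain, walk to
    # the would-be endpoint, and replay any chain that ends there.
    for pos in range(CHAIN_LEN - 1, -1, -1):
        p = R(target, pos)
        for step in range(pos + 1, CHAIN_LEN):
            p = R(H(p), step)
        for start in table.get(p, []):
            q = start
            for step in range(CHAIN_LEN):
                if H(q) == target:
                    return q
                q = R(H(q), step)
    return None

starts = [a + b + "a" for a in "abcdefghij" for b in "abcdefghij"]
table = build_table(starts)

# Pick a password that is known to lie in the middle of a stored chain.
password = "aaa"
for step in range(10):
    password = R(H(password), step)

recovered = lookup(table, H(password))
print(len(table), recovered == password)
```

The table holds at most 100 entries yet covers up to 100 × 50 chain positions; passwords that lie on no chain are simply not found, which is why real tables need so many chains and so much build time.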

(1) Password hashing algorithms are effective if they require a significant amount of computational power to calculate, meaning that one calculation (at login) is relatively cheap, but a large volume of calculations will take a long time to do, hence reducing the effectiveness of brute forcing;

(2) If salting is involved in the hashing algorithm (as it normally is, and rightly so), rainbow-table-based cryptanalysis loses efficiency, since you'd need one table per salt value. When usernames are used as the salt, you'd need to generate a table per username... if you know those in advance. There's still some use for this, such as keeping pre-calculated tables for "Administrator" accounts.
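A rough way to see the effect described in (1) is to time one fast hash against one deliberately slow one. The sketch below uses PBKDF2 from Python's standard library; the iteration count and the number of guesses are arbitrary assumptions, and the timings will differ from machine to machine.

```python
import hashlib
import time

password, salt = b"letmein", b"bob"

start = time.perf_counter()
fast = hashlib.sha256(salt + password).digest()
fast_seconds = time.perf_counter() - start

ITERATIONS = 200_000  # assumed work factor, for illustration only
start = time.perf_counter()
slow = hashlib.pbkdf2_hmac("sha256", password, salt, ITERATIONS)
slow_seconds = time.perf_counter() - start

guesses = 10_000_000  # hypothetical size of an attacker's candidate list
print(f"one fast hash: {fast_seconds:.6f}s, one slow hash: {slow_seconds:.6f}s")
print(f"{guesses:,} slow guesses: ~{guesses * slow_seconds / 3600:.1f} hours")
```

A single login barely notices the extra cost, but it is multiplied by every guess the attacker makes.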


Well, the first thing is... what is a Rainbow Table?

A Rainbow Table is, in simplified terms, a list of the hashed values for the X most common passwords. 'Password', 'Password123', 'baseball', 'batman1', etc, etc - hash them all with the hash algorithm the target system uses. (Real rainbow tables store this more compactly using hash chains, but the effect is the same.)

Then, check whether any entry in the password column of the compromised SQL table matches any entry in the Rainbow Table. Entry '73def92a987efa98b987da' matches for user 'bob bobson' - you look at your rainbow table and see that entry corresponds to 'letmein', so you cracked bob's password. Actually, you wouldn't have just cracked bob's password - you would've cracked the password of everyone who had that as their password, because hash('letmein') would've been the same for them all.

That's the thing - Rainbow Tables aren't targeted at a specific account. They are a way of getting the lowest-hanging fruit. You might only crack 20% of the passwords with your table... but that means you cracked 20% of the accounts! Why try to hack a specific account when you can quickly compromise thousands of the weakest-secured ones?
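In code, the matching step is little more than a dictionary lookup. This sketch assumes unsalted SHA-256 and an invented set of leaked records:

```python
import hashlib

def h(password: str) -> str:
    return hashlib.sha256(password.encode()).hexdigest()

# Precomputed once, reusable against any database that uses the same hash.
common = ["Password", "Password123", "baseball", "batman1", "letmein"]
table = {h(p): p for p in common}

# Hypothetical leaked rows: username -> stored (unsalted) hash.
leaked = {
    "bob bobson": h("letmein"),
    "alice": h("Tr0ub4dor&3"),
    "carol": h("letmein"),
    "dave": h("baseball"),
}

cracked = {user: table[digest] for user, digest in leaked.items() if digest in table}
print(cracked)
```

Bob and Carol fall to the same table entry, while Alice's uncommon password is not in the table at all.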

So what does (proper) salting do? It applies a value that's different for each account. Bob's password has a salt of '123' prefixed to it; Alice's has a salt of '468' prefixed to it. So even if they used the same password, their hashed entry wouldn't be the same - and the rainbow table wouldn't help you out. Salting prevents the hacker from trying to hack everyone's account at the same time, and forces them to do things one record at a time.
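A quick sketch of that, reusing the '123' and '468' salts from the example (real salts should be long and random; SHA-256 is again just a stand-in):

```python
import hashlib

def salted_hash(salt: str, password: str) -> str:
    return hashlib.sha256((salt + password).encode()).hexdigest()

bob_hash = salted_hash("123", "letmein")
alice_hash = salted_hash("468", "letmein")

# A table precomputed without salts no longer matches either record.
unsalted_table = {
    hashlib.sha256(p.encode()).hexdigest(): p for p in ["letmein", "baseball"]
}
table_hits = [d for d in (bob_hash, alice_hash) if d in unsalted_table]

# The attacker has to redo the guessing for each record with that record's salt.
guesses = ["baseball", "letmein"]
bob_cracked = next((g for g in guesses if salted_hash("123", g) == bob_hash), None)

print(bob_hash == alice_hash, table_hits, bob_cracked)
```

The same password yields two different stored values, the precomputed table gets no hits, and cracking Bob's record does nothing for Alice's.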

(By the way, this is why you'll see security people screaming to Never Reuse A Salt. Because if, say, all the records use the same salt? Then the attacker can recompute the rainbow table with the fixed salt, and once again be able to attack everyone's accounts at the same time.)
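To illustrate the reused-salt case, here is a sketch where every record shares one salt (the salt value, users, and passwords are all invented):

```python
import hashlib

FIXED_SALT = "same-for-everyone"  # the mistake: one salt for all records

def h(salt: str, password: str) -> str:
    return hashlib.sha256((salt + password).encode()).hexdigest()

leaked = {
    "bob": h(FIXED_SALT, "letmein"),
    "alice": h(FIXED_SALT, "letmein"),
    "carol": h(FIXED_SALT, "baseball"),
}

# One table, recomputed once with the fixed salt, covers every account again.
common = ["Password", "baseball", "letmein"]
table = {h(FIXED_SALT, p): p for p in common}

cracked = {user: table[d] for user, d in leaked.items() if d in table}
print(cracked)
```

The table costs one pass over the common-password list, not one pass per user, which is exactly what per-account salts are meant to prevent.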