Is Diceware more secure than a long passphrase?

Most people who use passphrases use them wrong.

The remark that Diceware is better probably comes from the fact that, when people use passphrases, they usually take a well-known or otherwise logically structured sentence and use that. "Mary had a little lamb" is a terrible passphrase because it is one of a few billion well-known phrases that a computer can run through in a short amount of time. I know this works pretty well because I tried it.

Diceware is just random words. It's as good as any other randomly generated set of words, assuming you use a good source of randomness: for Diceware, you should use dice, which is a reasonably good source. Digital password generators are usually also good, though homebrew implementations might use an insecure random generator by mistake.
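As a minimal sketch of what a digital generator might look like, the snippet below draws words with Python's `secrets` module, which uses the operating system's cryptographic random source. The function name `generate_passphrase` and the six-word toy list are made up for illustration; a real list would have thousands of entries. Python's `random` module is the kind of generator a homebrew implementation might reach for by mistake, since it is not designed for security purposes.

```python
import secrets

def generate_passphrase(wordlist, num_words):
    """Pick num_words words independently and uniformly from wordlist."""
    # secrets.choice draws from the OS cryptographic random source.
    return " ".join(secrets.choice(wordlist) for _ in range(num_words))

# Hypothetical toy list, far too small for real use.
toy_wordlist = ["cleft", "cam", "synod", "lacy", "yr", "wok"]

phrase = generate_passphrase(toy_wordlist, 4)
print(phrase)
```

Words may repeat in the output. That is intentional: excluding repeats would shrink the number of possible phrases slightly.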

We know that any random passphrase is good because it's basic math. There are two properties to a passphrase:

  • Dictionary size
  • Number of words in the phrase

The 'randomness' of a passphrase is simple to calculate: dictionary_size ^ words_in_phrase, where ^ is exponentiation. A passphrase of 3 words with a dictionary of 8000 words is 8000^3= 512 billion possible phrases. So an attacker, when guessing the phrase, would have to try 256 billion phrases (on average) before s/he gets it right. To compare with a password of similar strength: a random password using 7 characters, consisting of a-z and A-Z, has a "dictionary size" of 52 (26 + 26) and a "number of words" of 7, making 52^7= ~1028 billion possible passwords. It is well-known that 7 characters is pretty insecure, even when randomly generated.
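The arithmetic above is easy to check yourself. This is just a sketch; the helper name `keyspace` is mine.

```python
def keyspace(dictionary_size, length):
    """Number of possible phrases (or passwords) of the given length."""
    return dictionary_size ** length

phrases = keyspace(8000, 3)      # 3 random words from 8000
passwords = keyspace(52, 7)      # 7 random characters from a-z, A-Z
average_guesses = phrases // 2   # expected guesses before a hit

print(f"{phrases:,}")            # 512,000,000,000
print(f"{average_guesses:,}")    # 256,000,000,000
print(f"{passwords:,}")          # 1,028,071,702,528
```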

For randomness, more is better, up to about 128 bits of entropy. A little more than that gives some buffer in case the algorithms are cryptographically weakened, but really, you don't want to memorize 128 bits of entropy anyway. Let's say we want to go for 80 bits of entropy, which is a good compromise for almost anything.

To convert "number of possible values" to "bits of entropy", we need to use this formula: log(n)/log(2), where n is the number of possible values. So if you have 26 possible values (1 letter), that would be log(26)/log(2)= ~4.7 bits of entropy. That makes sense because you need 5 bits to store a letter: the number 26 is 11010 in binary.
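In Python the same conversion is a one-liner, since `math.log2(n)` is equivalent to log(n)/log(2). A small sketch, with `entropy_bits` as an illustrative name:

```python
import math

def entropy_bits(possible_values):
    """Bits of entropy for a uniform choice among possible_values options."""
    return math.log2(possible_values)

letter_bits = entropy_bits(26)           # one random lowercase letter
storage_bits = math.ceil(letter_bits)    # whole bits needed to store it

print(round(letter_bits, 1))   # 4.7
print(storage_bits)            # 5
print(bin(26))                 # 0b11010
```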

A dictionary of 8000 words needs about 7 words to get above the desired 80 bits:
log(8000^7)/log(2)= ~90.8 bits of entropy. Six words would be:
log(8000^6)/log(2)= ~77.8 bits of entropy.

A large dictionary helps a lot, compared to the relatively small Diceware dictionary of 7776 words. The Oxford English Dictionary has 600k words. With that many words, a phrase of four randomly chosen words is almost enough:
log(600 000^4)/log(2)= ~76.8 bits of entropy.

But at 600 thousand words, that includes very obscure and long words. A dictionary with words that you can reasonably remember might have a hundred thousand or so. Instead of the seven words that we need with Diceware, we need five words in our phrase when selecting randomly from a dictionary of 100k words:
log(100 000^5)/log(2)= ~83.0 bits of entropy.

Adding one more word to your phrase helps more than adding ten thousand words to your dictionary, so length beats complexity, but a good solution balances the two. Diceware seems a little small to me, but perhaps they tested with different sizes and found this to be a good balance. I am not a linguist :).
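To put rough numbers on that trade-off, here is a sketch that starts from a 7-word Diceware phrase and compares the two options. The starting point is my choice for illustration; the result depends on the sizes involved, and a very small dictionary would gain more from extra words in the list.

```python
import math

def phrase_entropy_bits(dictionary_size, num_words):
    """Entropy of num_words chosen uniformly from dictionary_size words."""
    return num_words * math.log2(dictionary_size)

base = phrase_entropy_bits(7776, 7)

# Option 1: add an eighth word, same dictionary.
gain_extra_word = phrase_entropy_bits(7776, 8) - base

# Option 2: keep seven words, add 10,000 words to the dictionary.
gain_bigger_dict = phrase_entropy_bits(7776 + 10_000, 7) - base

print(f"one more word:     +{gain_extra_word:.1f} bits")
print(f"10k more in dict:  +{gain_bigger_dict:.1f} bits")
```

At these sizes the extra word adds about 12.9 bits, while the larger dictionary adds between 8 and 9.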

Just for comparison, a password (consisting of a-z, A-Z, and 0-9) needs 14 characters to reach the same strength: log(62^14)/log(2)= ~83.4 bits of entropy.
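All of the figures above can be reproduced with two small helpers. This is a sketch under the same assumption as the rest of this answer, namely that every word or character is picked uniformly at random; the function names are mine.

```python
import math

def phrase_entropy_bits(dictionary_size, length):
    """Entropy of `length` symbols chosen uniformly from dictionary_size."""
    return length * math.log2(dictionary_size)

def length_needed(dictionary_size, target_bits):
    """Smallest length whose entropy reaches target_bits."""
    return math.ceil(target_bits / math.log2(dictionary_size))

TARGET = 80  # bits, the compromise chosen above

for label, size in [
    ("Diceware (7776 words)", 7776),
    ("8000-word dictionary", 8000),
    ("100k-word dictionary", 100_000),
    ("600k-word dictionary", 600_000),
    ("a-z, A-Z, 0-9 characters", 62),
]:
    n = length_needed(size, TARGET)
    bits = phrase_entropy_bits(size, n)
    print(f"{label}: {n} symbols, {bits:.1f} bits")
```

Note that the 600k-word dictionary still needs five words to clear 80 bits, since four words stop at about 76.8.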


Passwords should be easy to remember and hard to guess. As AviD once said, security at the expense of usability comes at the expense of security. A passphrase is easy to remember because it has some sort of meaning to the user, even though it might seem random at first. In terms of usability, a passphrase is superior: you don't need dice and a list of words, you can think of a passphrase yourself and remember it more easily.

However, using dice and a random list of words makes for a nearly fully random password. There is no link to the user, whereas a passphrase is most of the time (unless truly random) made up of something related to the user.

Any online password checker can only estimate how hard it would be for a computer to guess a password, whereas a sentence (or passphrase in this case) might be more easily guessed by another human. In your example, the Diceware-generated password is shorter than the passphrase (though still very long by today's security standards), but as you stated yourself, you can create longer passwords when you want to.

I wouldn't say Diceware is always superior, but it definitely is more random and can still have the same length as a passphrase, which makes it superior in certain cases.


That statement you cite that Diceware is "better" than passwords doesn't have a theoretical justification attached, which makes it tricky to assess. But I can come up with one such justification: Diceware comes with a procedure for generating passphrases at random with dice, and this guarantees that the outputs generated have at least some minimum amount of entropy (difficulty of guessing). Since log2(6) is about 2.6, Diceware gives you at least 2.6 bits of entropy per dice roll.
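To make the per-roll figure concrete, here is a sketch of how five dice rolls select one word out of 6^5 = 7776. The base-6 index mapping is my own illustration (the printed Diceware list is keyed by the five digits rolled), and `roll_die` is only a software stand-in for a physical die.

```python
import math
import secrets

def roll_die():
    """Software stand-in for one physical die roll (1 to 6)."""
    return secrets.randbelow(6) + 1

def rolls_to_index(rolls):
    """Read five rolls (each 1-6) as a base-6 number in 0..7775."""
    index = 0
    for r in rolls:
        index = index * 6 + (r - 1)
    return index

bits_per_roll = math.log2(6)        # about 2.585
bits_per_word = 5 * bits_per_roll   # about 12.9

rolls = [roll_die() for _ in range(5)]
print(rolls, "->", rolls_to_index(rolls))
print(f"{bits_per_roll:.3f} bits per roll, {bits_per_word:.1f} bits per word")
```

Every sequence of five rolls maps to a different index, so each of the 7776 words is equally likely.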

On the other hand, there is no obvious way of estimating how difficult a long natural language passphrase like "Blue Light shines from the small Bunny onto the Lake" would really be for a password cracker. People usually assume that because it's long that automatically makes it strong, but that's not true. This Ars Technica article about cracking very long passphrases is very instructive in that regard:

[Kevin Young] joined forces with fellow security researcher Josh Dustin, and the cracking duo quickly settled on trying longer strings of words found online. They started small. They took a single article from USA Today, isolated select phrases, and inputted them into their password crackers. Within a few weeks, they expanded their sources to include the entire contents of Wikipedia and the first 15,000 works of Project Gutenberg, which bills itself as the largest single collection of free electronic books. Almost immediately, hashes from Stratfor and other leaks that remained uncracked for months fell. One such password was "crotalus atrox." That's the scientific name for the western diamondback rattlesnake, and it ended up in their word list courtesy of this Wikipedia article. The success was something of an epiphany for Young and Dustin.

"Rather than try a brute force that makes sense to a computer but not to people, let's use human beings because people typically make these long passwords based on things that humans use," Dustin remembered thinking. "I basically utilized the person who wrote the article on Wikipedia to put words together for us."

Almost immediately, a flood of once-stubborn passwords revealed themselves. They included: "Am i ever gonna see your face again?" (36 characters), "in the beginning was the word" (29 characters), "from genesis to revelations" (26), "I cant remember anything" (24), "thereisnofatebutwhatwemake" (26), "givemelibertyorgivemedeath" (26), and "eastofthesunwestofthemoon" (25).

If you just pick long passphrases innocently without any sound theory of why your procedure gives strong passphrases, they might be vulnerable to some attack you just haven't thought of. Whereas Diceware is invulnerable to anything but brute force, because cracking Diceware is at least as hard as guessing 25+ dice rolls.


I used zxcvbn to compare the strength of the two example passwords below and it seemed as if the passphrase was more secure than the Diceware password.

Here I should repeat a point I made more at length in this answer to another question:

  • A password strength meter can conclusively prove that a passphrase is weak;
  • But no such meter can ever prove that a passphrase is strong, because the passphrase might be vulnerable to some attack the meter does not model.

For example, zxcvbn—which is an excellent tool overall, but just isn't designed for the use you're making of it—estimates centuries for this passphrase:

password:   Am i ever gonna see your face again?
guesses_log10:  31.35342
score:  4 / 4
function runtime (ms):  5
guess times:
100 / hour:   centuries (throttled online attack)
10  / second: centuries (unthrottled online attack)
10k / second: centuries (offline attack, slow hash, many cores)
10B / second: centuries (offline attack, fast hash, many cores)

But this is one that I took from the Ars Technica article quote above, so we know it has been cracked in real life. We have independent proof that the zxcvbn estimate is wrong.

zxcvbn's analysis gives "cleft cam synod lacy yr wok" a guesses_log10 value of 26.22025, which is technically weaker than it estimates for "Am i ever gonna see your face again?". But if it's a 5-word Diceware passphrase that we generated by making 25 dice throws, we have independent proof that it has at least log2(6) × 25 = 64.6 bits of entropy (whose corresponding guesses_log10 value would be about 19.5, so zxcvbn is arguably overestimating how strong it is).
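The conversion between dice throws, bits, and a base-10 guess count can be checked with a few lines. This sketch assumes that guesses_log10 is the base-10 logarithm of the number of guesses, and compares it against the full number of equally likely outcomes of 25 throws.

```python
import math

dice_throws = 25
outcomes = 6 ** dice_throws            # equally likely roll sequences

bits = math.log2(outcomes)             # entropy in bits
guesses_log10 = math.log10(outcomes)   # same quantity on a base-10 scale

print(f"{bits:.2f} bits")              # 64.62 bits
print(f"{guesses_log10:.2f}")          # 19.45
```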

For your passphrase "Blue Light shines from the small Bunny onto the Lake.", we just don't have any independent argument for why it's strong other than your hunch, which is undermined by the fact that you've posted it to Stack Exchange (and thus could now be used as input for an attack like what the Ars article explains). Maybe it is strong, but the philosophy that a system like Diceware embodies is that you shouldn't base your password strength on hunches, but rather, on actual random procedures that give you minimum entropy guarantees.