Is the BBC’s advice on choosing a password sensible?

My question isn't about the mathematical strength of the passwords, which obviously depends on the lyric that is chosen and how one goes about passwordifying it; it is more about the predictability of the total number of possible passwords that are likely to pop up using this method.

This is a good question, and I'm going to depart from the norm here, put on my tinfoil hat, and say "no, this is not a good idea." Why? Let's look at it in the context of the Snowden leaks.

GCHQ spies on all traffic on the British internet and, according to the Snowden leaks, shares your internet traffic with the Five Eyes. Even if you're using HTTPS, this is a bad idea.

"But Mark Buffalo, you're being a maniac tinfoil hattist again!" Think about it. The time to crack your password was suddenly and significantly reduced. How?

  1. GCHQ collects a history of your online searches. They likely know when you signed up for a certain website thanks to XKeyscore.
  2. If they know when you signed up for that website, they'll see you went to Google.com around that time and did a search for song lyrics. Even if you're using HTTPS, the fact that you connected to google.com around that time, and then visited a website that hosts song lyrics, is all they need to begin breaking your password.

    • Even if they can't view the traffic, they can still see that you connected. Even if you're using HTTPS, this doesn't stop them from hosting lyric websites themselves. This also doesn't stop companies from logging your search results, and it doesn't stop the companies from providing these results to anyone. If they know what kind of songs you like, or don't like, it makes it even easier.
  3. Now they can write an algorithm to crack your passwords much, much more easily than by brute-forcing every possible combination. Or, better yet, use a ready-made password cracker with a dictionary built from those results.
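
Step 3 can be sketched roughly as follows. This is only an illustration, assuming the attacker already has a guessed lyric line and that the user takes first letters, varies the case, and appends at most one digit. The function names and the suffix list are made up for the example and don't come from any real cracking tool.

```python
from itertools import product
from string import digits


def initials(line: str) -> str:
    """First letter of every word in a lyric line, lowercased."""
    return "".join(w[0].lower() for w in line.split() if w[0].isalpha())


def candidates(line: str, suffixes: list[str]):
    """Yield every upper/lower-case pattern of the initials, plus each suffix."""
    base = initials(line)
    for pattern in product(*[(c, c.upper()) for c in base]):
        stem = "".join(pattern)
        for suffix in suffixes:
            yield stem + suffix


# Hypothetical inputs: one guessed lyric line, and an optional trailing digit.
lyric = "hit me baby one more time"
suffixes = [""] + list(digits)

guesses = set(candidates(lyric, suffixes))
print(len(guesses))            # 2**6 case patterns * 11 suffixes
print("HmBomT3" in guesses)    # is a typical "passwordified" form covered?
```

A list that small per lyric line is tiny next to a blind brute force, which is the whole point of the step.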


But Mark Buffalo, the government isn't monitoring me!

That's all fine and dandy. You generally don't need to worry about them unless you're a criminal. Or you're privacy-conscious. Or you're a security researcher.

There's another important aspect you need to consider, which I think is far worse than the government: advertising companies and hackers. "But Mark Buffalo, I use NoScript (great) and Ghostery (Ghostery sells your info)!" Most people don't use those. And many people who do don't use those tools on their smartphones.

There are data trails everywhere, especially if you own a smartphone (Android in particular), and there are plenty of evil marketing companies that will sell your data down the river the first chance they get. Or maybe they aren't evil companies, but they get breached by hackers.

Anyone with a "need" could buy that data, and those sophisticated enough could steal it. While this seems like frantic worrying over such a small thing for most people, it gets much worse when you delve into the realm of federal contracting. This is one of the ways security breaches start.

All of the steps listed previously could be done without XKeyscore. They can be done very easily with vast marketing databases.


Stop the tinfoil, Mark.

If I were wearing my tinfoil hat right now, I'd believe this article was made as part of a plan to intentionally weaken standards. I personally believe that weakening standards is a national security risk, especially when federal contractors adopt those weakened standards.

Personally, I would worry more about evil marketing companies and hackers than I would the government. Especially when deliberately-weakened standards are what help potentially-hostile countries gain unauthorized access to critical infrastructure and intellectual property.


But seriously, this makes your password weaker

Now let's talk about numbers, and social engineering.

With a normal brute force of this password, you'd likely need the following characters based on this password policy:

  abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!@#$%^&*()-_+=

That's 76 possible characters. With this password method, assuming most people will use 6-7 words to generate the password, and perhaps add 1 symbol (!@#$%^&*() being the most common) plus a number, you end up with a password of about 8 characters. To exhaust the space of passwords up to 8 characters long, you'd need to test 1,127,875,251,287,708 combinations. This could take an impossibly long time depending on the hashing algorithm and hardware.

Let's use MD5 as an example (it's terrible, but it's computationally cheap. Please don't use MD5; I am only using it as an example). Exhausting the character space of an 8-character password would take about 4 years 25 days 7 hours 46 minutes 54 seconds with a cheap workstation. If you were to up the password length to 9, it would take over 309 years. Keep in mind that processing power is growing rapidly.
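
If you want to check the arithmetic, here's a minimal sketch. It assumes the quoted total counts every length from 1 up to 8, which is what reproduces the number above. The guess rate is a hypothetical value I picked so the result lands near the 4-year figure; it is not a benchmark of any real hardware.

```python
import string

# The 76-character set listed above.
charset = (string.ascii_lowercase + string.ascii_uppercase
           + string.digits + "!@#$%^&*()-_+=")


def keyspace(alphabet_size: int, max_len: int) -> int:
    """Number of candidates of length 1..max_len over the alphabet."""
    return sum(alphabet_size ** n for n in range(1, max_len + 1))


total_8 = keyspace(len(charset), 8)
print(len(charset))   # 76
print(total_8)        # 1127875251287708

# ASSUMPTION: hypothetical guesses per second, chosen for illustration only.
rate = 8.79e6
years = total_8 / rate / (365 * 24 * 3600)
print(round(years, 1))
```

Swap in `keyspace(len(charset), 9)` to see how quickly one extra character blows the number up.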

Learning extra parameters about the user's password allows you to simplify this. Let's assume that you choose the following song: baby hit me one more time. This is your favorite song, and I know this because I socially-engineered you into telling me. Let's choose a predictable lyric phrase to create a password with: Hit me baby one more time. This becomes HmBomT. Now let's add some leet with a number. Now we have H@BomT3. Now that we know your favorite song, and your favorite phrase, this is what your password alphabet space becomes:

hHmMbBoOmMtT1234567890!@#$%^&*()-_+=

As you can see, this alphabet space is significantly reduced. It's much, much faster if you know what character the password starts with, but let's assume you don't. Let's further assume it's been randomized. Now you've reduced the password space to 2,901,713,047,668 combinations, which takes about 3 days to exhaust with a cheap workstation. Let's upgrade it to 9 characters. Now it takes 137 days 15 hours 47 minutes.
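
Here's a quick sketch of where that number comes from, again assuming lengths 1 through 8 are all counted. Note that the alphabet as listed contains "mM" twice, so it has 36 entries but only 34 distinct characters. The quoted figure corresponds to 36; using 34 only makes the attacker's job smaller.

```python
# The reduced alphabet exactly as listed above (the "mM" pair appears twice).
listed = "hHmMbBoOmMtT1234567890!@#$%^&*()-_+="
unique = set(listed)


def keyspace(alphabet_size: int, max_len: int) -> int:
    """Number of candidates of length 1..max_len over the alphabet."""
    return sum(alphabet_size ** n for n in range(1, max_len + 1))


print(len(listed), keyspace(len(listed), 8))   # 36 2901713047668
print(len(unique), keyspace(len(unique), 8))   # distinct characters only

# How much smaller than the full 76-character space?
print(round(keyspace(76, 8) / keyspace(len(listed), 8)))
```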

You can calculate this yourself (charset: custom). Also, all of this assumes you don't have a dedicated GPU cluster.

EDIT:

It's come to my attention that there is now evidence of custom hardware solutions dedicated to cracking bcrypt, one of which is a lot less expensive than a 25-GPU array, uses less power, and is vastly superior in every regard. Please read this amazing article if you want to learn more.


But shouldn't we simply increase password length?

Yeah, you could. Truthfully, it greatly increases entropy when you increase the password length.

However, then it becomes annoying to enter - especially for corporate environments that require you to log out every time you leave the computer. On top of that, it's very hard to remember this password.

You might eventually forget it after entering different passwords and being forced to change every few months. Even worse, you could forget it immediately, and be forced to visit the IT help desk to reset your password. This results in costs to the business, and lost productivity.

In fact, a better method would be xkcd's correct horse battery staple. You could use an upper case somewhere, and a number somewhere else, or you could make it even easier while increasing entropy: something like correct horse battery staple gasoline. It's very easy to remember, very easy to type, and it's very hard for computers to break. Also remember that this should be randomly generated from a 2048-word list.
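
To put rough numbers on that, here's a small sketch of the entropy involved. It assumes every word (or character) is picked uniformly at random, which is exactly why the random generation matters. The helper name is mine, not from any library.

```python
import math


def bits_random(choices: int, length: int) -> float:
    """Entropy in bits of `length` independent uniform picks from `choices` options."""
    return length * math.log2(choices)


print(bits_random(2048, 4))             # 44.0  (four words from a 2048-word list)
print(bits_random(2048, 5))             # 55.0  (add a fifth word)
print(round(bits_random(76, 8), 1))     # 8 truly random characters from the 76-character set
```

The fifth word buys 11 more bits, and the result is still something you can type and remember.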

For websites, I would recommend a password manager such as KeePass. I would not use LastPass, as it's vulnerable to phishing attacks. Websites can know you have LastPass enabled, because your browser is sending this information to the website if requested! This is part of how browser-fingerprinting works.

For corporate and other logins which you aren't able to use a password manager with, I would recommend a variant of correct horse battery staple with an extra word. Maybe correct horse battery staple gasoline? Much easier to remember.


It's horrible :) To provide some numbers to back claims by other answers:

This provides some numbers of how many songs are popular per year. For the last decade it was as low as 300-400 Top 40 hits per year! Average word count for a song is 300-600, depending on the style, at 7-10 words per sentence (and I imagine that's the comfortable length of a password nowadays).

Tallying all this up, the corpus of password bases for people who listen to popular music will be about 40,000 per year, not including repetitions (and we all know popular songs don't have a single repeating line!).
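
A back-of-the-envelope version of that tally, as a sketch. The inputs are just the ranges quoted above, and the function name is made up. The ~40,000 figure sits a little above the top of what these ranges give, so treat it as a generous upper estimate.

```python
import math


def lines_per_year(hits: int, words_per_song: int, words_per_line: int) -> float:
    """Rough count of distinct lyric lines, ignoring repeated lines."""
    return hits * words_per_song / words_per_line


low = lines_per_year(300, 300, 10)
high = lines_per_year(400, 600, 7)
print(int(low), int(high))             # 9000 34285

# Even taking 40,000 lines, picking one is worth only about 15 bits.
print(round(math.log2(40_000), 1))     # 15.3
```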

As such, just picking a random common, everyday word and adding your favorite digit to the end is just as secure a password base (assuming your favorite song is less than a year old - very true for so many people!). Which, if you ask any IT or security personnel, is not secure at all. In fact, it's strictly worse than XKCD's famous Tr0ub4dor&3, due to a smaller corpus and smaller average word length, and that was discussed en masse.

To add insult to injury - none of the steps in the advice is any good really.

  • Most people tend to listen to the same music. Just go to a concert of a boy-band and look at the sheer size of the crowd (vs the number of words the singer is going to mutter).

  • I sincerely doubt most people will take the fiddly middle of a verse. I find it much more plausible that it's the catchy chorus that will be chosen (After all, you have to remember the line word-for-word, not just the general meaning or tempo!)

  • Taking the first letter of words is horrible. 7 letters cover 65% of the language that way. Of course this didn't analyze lyrics specifically, but I doubt it's better there (see footnote 1).

  • Case-sensitive is OK, but only if you make the uppercase positions truly random. Which you won't. It's too easy if the first one is capital. And you don't lump them together, nonono. And there has to be a decent number, but not too much, right?

  • l33t-ing a password is mostly meaningless. Of the most common letters, only a couple can be replaced, and the replacement is known beforehand. And you won't properly randomize which characters get replaced and which don't.


1 Trying to make an argument, I actually calculated the Shannon entropy from the article above. It turned out to be ~4.075 bits per letter, vs 4.7 for a random letter distribution. This is not as bad as I expected, although it does mean that a 10-character password is about 70 times easier to guess if it's made by first-lettering rather than from random letters.
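
For anyone who wants to redo the footnote's arithmetic, here's a minimal sketch. The 4.075 figure is taken as given from the footnote (I'm not recomputing it from the article's frequency table), and the function name is my own. With unrounded inputs the ratio comes out in the mid-70s, the same ballpark as the figure above.

```python
import math


def shannon_entropy(probs) -> float:
    """Entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Uniformly random letters: 26 equally likely outcomes.
h_uniform = shannon_entropy([1 / 26] * 26)

# Quoted in the footnote for first letters of words; assumed, not recomputed.
h_first = 4.075

length = 10
ratio = 2 ** (length * (h_uniform - h_first))
print(round(h_uniform, 2))   # 4.7
print(round(ratio))          # how many times easier a 10-letter guess becomes
```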


It is more secure than what most people are doing, which is to use a single dictionary word. The BBC's method starts with one or two sentences, instead of just a word. However, it is less secure than what it could have been.

First, if you're using a well known chorus, you're increasing the chance of other people having similar passwords to you.

Second, personally I think it's easier to type whole words than disconnected letters, even if it increases the number of keys that you need to press. Using just the first letters from a sentence throws away entropy.

My advice if you need a password that's intended to be remembered (i.e. can't be saved in a password manager) is to randomly generate a phrase using something like diceware.

Another alternative is to start by generating a random n-letter password, and then try to find a mnemonic for it. This difference in the order is crucial; if you start with the mnemonic and then passwordify the mnemonic, you're likely to be less random than if you generated the password first.
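
Both suggestions come down to letting a good random source do the choosing. A minimal sketch, assuming Python's `secrets` module as that source: the word list below is a tiny placeholder for illustration only (a real Diceware list has 7776 words), and the function names are made up.

```python
import secrets
import string

# PLACEHOLDER list for illustration; far too small for real use.
WORDS = ["correct", "horse", "battery", "staple",
         "gasoline", "window", "river", "candle"]


def random_passphrase(words: list[str], count: int) -> str:
    """Pick `count` words independently and uniformly at random."""
    return " ".join(secrets.choice(words) for _ in range(count))


def random_letters(n: int) -> str:
    """Generate the random n-letter password first; find the mnemonic afterwards."""
    return "".join(secrets.choice(string.ascii_lowercase) for _ in range(n))


print(random_passphrase(WORDS, 5))
print(random_letters(10))
```

The output changes on every run, which is the point: the randomness comes from the generator, not from your taste in music.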