Randomly selected words converted into a sentence. Did I lose passphrase strength or gain it?

Intro

Since you're using a diceware list, I'll refer to the one found here, but the reasoning applies to any list (apart from the average word length and some details of the instructions).

Properties of the diceware list

The instructions for using the diceware list make it clear that the overarching goal is to avoid bias when constructing a passphrase, from the use of dice and a fixed list down to very specific instructions that should be followed to the letter. For example, they specify:

If you do roll several dice at a time, read the dice from left to right.

So what do you gain if you follow these steps exactly?

  • a random passphrase with easy to evaluate entropy
  • no personal bias in the choice or order of words
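
As a rough illustration of why the dice procedure leaves no room for personal bias, here is a minimal sketch of how five dice, read left to right, pick one entry out of 6^5 = 7776. The function names `roll_key` and `key_to_index` are made up for this example, and the software random source stands in for physical dice; it assumes the list is sorted by its dice keys from 11111 to 66666.

```python
import secrets

DICE_PER_WORD = 5  # five six-sided dice select one of 6**5 = 7776 entries


def roll_key() -> str:
    """Simulate five dice read left to right, giving a key such as '16655'."""
    return "".join(str(secrets.randbelow(6) + 1) for _ in range(DICE_PER_WORD))


def key_to_index(key: str) -> int:
    """Map a dice key to a 0-based position in a list sorted by key."""
    index = 0
    for digit in key:
        # Treat the key as a base-6 number whose digits run from 1 to 6.
        index = index * 6 + (int(digit) - 1)
    return index


key = roll_key()
print(key, key_to_index(key))
```

Every key is equally likely, so every word is too, and the order of the words is simply the order in which they were rolled.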

Attacks

  • A blind brute-force attack has to crack a password whose length is roughly 4.2 characters (the average word length in the diceware list) times the number of words, plus the spaces between them. Even 4 words make this infeasible.
  • An attack that knows you used diceware will be in similar dire straits. Each word added to the passphrase adds another 12.9 bits of entropy (log2 of the 7776 words in the list), so 5 words give about 64.6 bits. This is already fairly strong.
  • A dictionary or phrase attack will not do any better than the above attack since the set of words to try is exactly the diceware list and the random order does not make full phrases a good attack vector.
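
To put rough numbers on the first two attacks, here is a small sketch comparing the two search spaces. It uses the 7776-word list size and the 4.2 average word length; the 27-symbol alphabet (lowercase letters plus space) for the blind attack is my own assumption, as are the function names.

```python
import math

WORDLIST_SIZE = 7776   # entries in the diceware list
AVG_WORD_LENGTH = 4.2  # average word length quoted above
BLIND_ALPHABET = 27    # assumption: lowercase letters plus space


def diceware_bits(words: int) -> float:
    """Entropy when the attacker knows the list and the number of words."""
    return words * math.log2(WORDLIST_SIZE)


def blind_bits(words: int) -> float:
    """Approximate search space when the attacker guesses character by character."""
    length = AVG_WORD_LENGTH * words + (words - 1)  # letters plus separating spaces
    return length * math.log2(BLIND_ALPHABET)


for n in (4, 5, 6):
    print(f"{n} words: diceware-aware {diceware_bits(n):.1f} bits, "
          f"blind {blind_bits(n):.1f} bits")
```

The diceware-aware figure is the one to rely on, since it assumes the attacker knows everything except the dice rolls.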

Modifying the passphrase

The simplest way to illustrate this is to use an (admittedly contrived) example.

Let's take the following six words obtained (in this order) by following the instructions: dog, quick, fox, lazy, jump, brown.

But my memory is not doing so well these days and I can quickly massage these words into something much more memorable:

The quick brown fox jumps over the lazy dog.

This presumes that I feel free to change the order and to insert words like "the" and "over", as well as punctuation.

Let's now look at our previous attacks:

  • The blind brute force attack now has to deal with a much longer password and will do considerably worse.
  • An attack that knows you used diceware is now in hot water: either some of the words do not exist in the list, or the number of words has increased massively, and the entropy along with it.
  • A dictionary or phrase attack is where we hit trouble. This is a fairly well known sentence, which is the reason it popped into my head.

For this exaggerated example, the dictionary attack with phrases might find my passphrase in very little time. This shows why bias in password selection can be a dangerous thing.

Summary

It comes down to two things: which attacks are being used, and how common your new passphrase is. You only control the latter.

My take on this would be to stick to the strict anti-bias rules of the diceware list. They are there precisely to avoid this kind of situation.

That said, your new passphrase may not be that much worse than the original but it's really hard to tell due to the very subjective nature of personal bias.

Additional notes

The instructions warn about this specific case:

You should also start over if your passphrase is a recognizable sentence or phrase in the language you are using

They also allow for additional punctuation, but again chosen randomly for both type and position. I didn't mention this earlier since it can be considered part of the diceware method, though the same caveat applies to intentionally placed punctuation.


The answer depends on how you are going to manipulate your passphrase, because some transformations will lower the entropy, while others will not. For example, are you changing the order of the words? Are you turning some verbs into adverbs, or nouns into adjectives, etc. to produce a better sentence? Then the resulting entropy is likely to be lower. Or are you only adding particles between the words? Then the resulting entropy will be higher, or remain the same at worst. Let's see why.

Entropy is related to the number of different combinations an attacker will have to try, and for a generic password the total combinations will be N^M, where N is the size of the alphabet and M is the length of the password. But you can use that formula only if each symbol is chosen at random independently of the other symbols! In a 5-word diceware passphrase based on a list of 7776 words, N is 7776 and M is 5, and you can apply the formula because each word is independent of the others, since you have rolled the dice for every word.
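
Worked out for the 5-word case, the formula looks like this (the symbol H for entropy in bits is my own notation, not something from the diceware instructions):

```latex
\[
  \text{combinations} = N^M, \qquad H = \log_2\left(N^M\right) = M \log_2 N
\]
\[
  N = 7776,\; M = 5: \quad 7776^5 \approx 2.84 \times 10^{19}, \qquad
  H = 5 \log_2 7776 \approx 5 \times 12.92 \approx 64.6 \text{ bits}
\]
```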

But what happens when you modify the diceware passphrase? To make sure you are not reducing the entropy, you need to make sure that each word is still totally independent of the others. Here's my advice:

  • Don't change the order of the diceware words. If you change the word order, the words might no longer be completely independent of each other after the transformation. For example, if you have the words "scientist" and "mad" available, you might be tempted to swap them and put them next to each other as "mad scientist", creating what I believe is called a "collocation" in English (words that are often seen together). Then "mad" and "scientist" will not be independent of each other anymore. Other examples where you lose some entropy will likely be more subtle and difficult to consciously avoid.
  • Avoid collisions. What I mean by this is that you should not turn "happy" into "happily" if both words are part of the diceware word list. Or think of "help" and "helpful", or "wear" and "worn", for example. These would be "collisions", and as a result you might end up replacing a word from the list with another one in the same list in a non-random way. Some word lists might be very susceptible to collisions, while for other word lists the problem might be non-existent or negligible.
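
If you want a rough idea of how exposed a given word list is to collisions, a crude check like the following can help. It is only a sketch: `find_collisions` and its suffix list are made up for this example, it catches simple suffixing (including the y-to-i spelling change), and it will miss irregular forms such as "wear" and "worn".

```python
def find_collisions(wordlist, suffixes=("ly", "ful", "s", "ed", "ing")):
    """Return (word, derived) pairs where both forms are in the same list."""
    words = set(wordlist)
    collisions = []
    for word in sorted(words):
        stems = [word]
        if word.endswith("y"):
            stems.append(word[:-1] + "i")  # happy -> happi(ly)
        for stem in stems:
            for suffix in suffixes:
                candidate = stem + suffix
                if candidate in words:
                    collisions.append((word, candidate))
    return collisions


# Tiny made-up sample; a real check would load the full word list.
sample = ["help", "helpful", "happy", "happily", "wear", "worn", "dog"]
print(find_collisions(sample))
```

On the sample above this reports the happy/happily and help/helpful pairs but not wear/worn, which shows why an automated check can only supplement looking at the words yourself.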

In conclusion, if you only add some particles to connect the words of your passphrase, without changing anything else (including the order), then you will be sure that the entropy is not lowered, because all the words are still independent. Other kinds of transformations will have potential issues and are likely to reduce the entropy, at least in theory. In practice though, it's difficult (maybe impossible) to say how much entropy you will lose, and if the entropy loss will be significant.
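
The "only add particles, keep the order" rule can be checked mechanically. Below is a minimal sketch, with a made-up helper `keeps_words_in_order`, that tests whether the original words still appear unmodified and in their original order inside the new sentence. It assumes the words consist of letters and digits only, so list entries containing punctuation would need a different tokenizer.

```python
import re


def keeps_words_in_order(original_words, sentence):
    """True if every original word appears unchanged, in order, in the sentence."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    remaining = iter(tokens)
    # `in` consumes the iterator, so each word must come after the previous one.
    return all(word.lower() in remaining for word in original_words)


words = ["dog", "quick", "fox", "lazy", "jump", "brown"]
print(keeps_words_in_order(words, "The quick brown fox jumps over the lazy dog."))
print(keeps_words_in_order(["dog", "quick", "fox"], "The dog is quick like a fox."))
```

The first call prints False because the sentence reorders the words (and turns "jump" into "jumps"); the second prints True because only filler words were added.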