Are texted 2FA security codes deliberately easy to remember?

I have noticed this too, and I think it is a result of the human brain's tendency to apply patterns to random noise. This seems to be more common when specifically trying to remember a string of numbers.


Roughly 85% of six digit random numbers will have at least one repeating digit and 40% will have a repeating sequential digit next to each other. (I am happy to be corrected on my math.)

These keys are generated using the standard TOTP algorithm. The article summarizes this implementation, showing there isn't any effort to generate a memorable number:

According to RFC 6238, the reference implementation is as follows:

  • Generate a key, K, which is an arbitrary byte string, and share it securely with the client.
  • Agree upon a T0, the Unix time to start counting time steps from, and an interval, TI, which will be used to calculate the value of the counter C (defaults are the Unix epoch as T0 and 30 seconds as TI)
  • Agree upon a cryptographic hash method (default is SHA-1)
  • Agree upon a token length, N (default is 6)

Although RFC 6238 allows different parameters to be used, the Google implementation of the authenticator app does not support T0, TI values, hash methods and token lengths different from the default. It also expects the K secret key to be entered (or supplied in a QR code) in base-32 encoding according to RFC 3548.

Once the parameters are agreed upon, token generation is as follows:

  1. Calculate C as the number of times TI has elapsed after T0.
  2. Compute the HMAC hash H with C as the message and K as the key (the HMAC algorithm is defined in the previous section, but also most cryptographical libraries support it). K should be passed as it is, C should be passed as a raw 64-bit unsigned integer.
  3. Take the least 4 significant bits of H and use it as an offset, O.
  4. Take 4 bytes from H starting at O bytes MSB, discard the most significant bit and store the rest as an (unsigned) 32-bit integer, I.
  5. The token is the lowest N digits of I in base 10. If the result has fewer digits than N, pad it with zeroes from the left.

Both the server and the client compute the token, then the server checks if the token supplied by the client matches the locally generated token. Some servers allow codes that should have been generated before or after the current time in order to account for slight clock skews, network latency and user delays.


On my phone I had around 90 verification codes from various companies. 62 of these were 6 digits long. Here's the count of each digit:

Possibly a slight skew towards 1,8 and 9? Almost certainly just noise in the data (62 is a small sample).

What about double digits?

enter image description here The first graph is only the double digits on the 2-digit boundaries (i.e. AABBCC) - so we'd expect each pair to appear around 1.86 times across the 186 possible digit placements. The second is any placement (i.e. XXX99X counts as a double digit). We'd expect each pair around 3.1 times across the 310 placements.

There doesn't seem to be any obvious skew with lots more double digits than non double - double digits are shown in orange. In the latter data, we would expect around 31 double digits, and we get 27. That seems reasonable.

Of course, this doesn't rule out other "non random" patterns - but to be honest humans are likely to be searching for patterns - look at these numbers, all taken from my 2FA app: 365 595, 111 216, 566 272, 468 694, 191 574, 833 043.

Tags:

Multi Factor