What makes Random Number Generators so fragile?

Hardware vs software RNGs

The first thing you mention is a hardware noise source. High-precision measurement of some metastable phenomenon is enough to generate unpredictable data. This can be done with a reverse-biased zener diode, with ring oscillators, with an ADC, or even with a Geiger counter. It can even be done by measuring nanosecond-level delays in the timing between keystrokes. These noise sources can fail if the hardware itself begins to fail. For example, a transistor can break down if it is not specifically designed to operate in reverse at high voltages. While these techniques have varying levels of fragility, they are not what is being discussed in the text you quoted.
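
As a rough sketch of the keystroke-timing idea - a toy illustration, not how any production generator does it - you can capture nanosecond timestamps around human input events, keep only their hard-to-predict low-order bits, and condition them with a hash:

```python
# Toy timing-based entropy gathering (illustrative only, not production code).
# For portability this waits for Enter presses rather than raw keystrokes.
import hashlib
import time

def gather_timing_entropy(samples: int = 16) -> bytes:
    measurements = []
    for i in range(samples):
        input(f"[{i + 1}/{samples}] press Enter: ")          # wait for a human
        measurements.append(time.perf_counter_ns() & 0xFF)   # low byte of the timestamp
    # Condition the raw, possibly biased measurements with a hash function.
    return hashlib.sha256(bytes(measurements)).digest()

print(gather_timing_entropy().hex())
```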

The second type of RNG you mention is a software RNG called a pseudorandom number generator (PRNG*). This is an algorithm which takes a seed, which acts like an encryption key, and expands it into an endless stream of data. It attempts to ensure that the data cannot be predicted, or distinguished from pure randomness, without knowledge of the secret random seed the algorithm started with. In this case the PRNG is implemented in pure software, so breaking it only takes introducing a bug into the code, and that is what the text you quoted is talking about. It is the code that is fragile: it risks complete failure if changes are made that deviate from the algorithm's intended behavior.

A PRNG can be thought of as a re-purposed encryption algorithm. In fact, you can create a cryptographically secure PRNG by using a cipher like AES to encrypt a counter. As long as the encryption key (seed) is secret, the output cannot be predicted and the seed cannot be discovered. When you think about it this way, it becomes easier to understand how a small, seemingly inconsequential change in the code can completely break the security of the algorithm.
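
A minimal sketch of that construction, using AES from the third-party Python `cryptography` package (the class and method names here are my own illustration, not a standard DRBG implementation):

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

class CounterPRNG:
    """Generates pseudorandom bytes by encrypting an incrementing counter with AES."""

    def __init__(self, seed: bytes):
        assert len(seed) == 32                  # the 256-bit seed acts as the AES key
        self._cipher = Cipher(algorithms.AES(seed), modes.ECB())
        self._counter = 0

    def generate(self, n: int) -> bytes:
        out = b""
        enc = self._cipher.encryptor()
        while len(out) < n:
            block = self._counter.to_bytes(16, "big")   # 128-bit counter block
            out += enc.update(block)                    # AES(seed, counter)
            self._counter += 1
        return out[:n]

seed = os.urandom(32)            # in practice the seed must come from real entropy
prng = CounterPRNG(seed)
print(prng.generate(48).hex())   # as much pseudorandom data as we like
```

As long as the seed stays secret and the counter never repeats, distinguishing this output from random amounts to breaking AES - and notice how a one-line slip, such as seeding from something guessable, silently destroys that guarantee without changing the output's appearance at all.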

Collecting randomness

So how do modern devices actually collect randomness? Let's take a server running quietly in a datacenter somewhere. In order to support things like TLS, it needs a large amount of completely unpredictable data that cannot be distinguished from a truly random stream. Without a dedicated hardware noise source, the randomness must come from within. Computers strive to be fully deterministic, but they have plenty of input from non-deterministic devices. Enter... interrupts!

In modern hardware, an interrupt is a signal emitted by a hardware device to alert the CPU to a status change. It allows the CPU to avoid rapidly polling every device for updates and instead trust that the device will asynchronously alert it when the time comes. When an interrupt occurs, an interrupt handler is called to process the signal. It turns out this handler is the perfect place to gather randomness! When you measure the nanosecond-level timing of interrupts, you can quickly accumulate a fair bit of randomness, because interrupts are triggered for all sorts of things, from packets arriving on the NIC to data being read from a hard drive. Some of these interrupt sources are highly non-deterministic, like a hard drive which relies on the physical motion of an actuator.
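
User-space code can't hook interrupts directly, but the mixing step can be sketched in miniature. The following is a simplification of the idea, not the Linux kernel's actual algorithm: fold each event's nanosecond timestamp and interrupt number into a hash-based pool, then squeeze a seed out once enough events have been mixed in.

```python
import hashlib

class EntropyPool:
    """Simplified model of an interrupt-driven entropy pool (illustrative only)."""

    def __init__(self):
        self._pool = hashlib.sha256()
        self._events = 0

    def add_interrupt(self, timestamp_ns: int, irq: int) -> None:
        # Fold the nanosecond timestamp and interrupt number into the pool.
        self._pool.update(timestamp_ns.to_bytes(8, "little"))
        self._pool.update(irq.to_bytes(4, "little"))
        self._events += 1

    def extract_seed(self) -> bytes:
        # A 256-bit digest of everything mixed in so far, used to seed a CSPRNG.
        return self._pool.copy().digest()
```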

Once sufficient random bits have been collected by the operating system, a small seed of at least 128 bits can be fed into a cryptographically secure PRNG to generate an unlimited stream of pseudorandom data. Unless someone can predict exactly when every past interrupt occurred, to nanosecond precision, they will not be able to derive the seed or predict future PRNG output. This makes the output entirely suitable for TLS keys.
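
From an application's point of view all of this happens behind the scenes: you normally just read from the operating system's already-seeded CSPRNG, for example via Python's `secrets` module or `os.urandom()`.

```python
import secrets

# 32 bytes = 256 bits from the OS CSPRNG, which the kernel seeded from interrupt
# timing and other entropy sources.
key = secrets.token_bytes(32)
print(key.hex())
```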

* A security-oriented PRNG is called a cryptographically secure PRNG, or CSPRNG. Using a regular PRNG when an application calls for a CSPRNG can result in security vulnerabilities.
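
To make that concrete in Python terms: the `random` module is a Mersenne Twister, whose future output can be reconstructed by anyone who sees enough past output, while `secrets` draws from the OS CSPRNG. Anything security-sensitive, such as a session token, must come from the latter.

```python
import random   # Mersenne Twister: fine for simulations, predictable to an attacker
import secrets  # CSPRNG backed by the operating system

token_bad  = "".join(random.choices("0123456789abcdef", k=32))  # looks random, isn't safe
token_good = secrets.token_hex(16)                              # 128 bits, safe for security use
```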


Historically, hardware RNGs suitable for cryptography weren't commonly available on the PC: for example, according to this question AMD only added support a few years ago, so even today a software vendor can't simply assume that it will be available. That's presumably why OpenSSL (as discussed in your quote) was using a software RNG, making it vulnerable to the bug found in the code.

(As extensively discussed in the comments, a standard PC does contain a number of "sources of entropy" that a software RNG can make use of - and I believe OpenSSL does, though I'm not terribly familiar with it - but obviously in that scenario a bug in the software can result in bad random numbers, as indeed happened.)

There are also concerns that hardware RNGs might have been backdoored, leading people to combine hardware RNGs with other sources of entropy rather than using them as-is. (Backdoored hardware is also mentioned in your linked article, a few paragraphs up from the bit you've quoted.)

It should also be mentioned that hardware RNGs aren't nearly as simple to implement as your question suggests. For one thing, naive implementations may be vulnerable to various physical attacks: if you're generating random bits based on vibrations, for example, what happens if someone aims an ultrasound source at the device? Even under ideal conditions, there is likely to be some sort of bias in the results that could make the generated bits unsafe for cryptographic use.
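
One classic (and on its own insufficient) way to deal with bias in a raw noise source is von Neumann's trick: read bits in pairs, output 0 for a 01 pair, 1 for a 10 pair, and discard 00 and 11. A sketch, with a deliberately biased stand-in for the noise source:

```python
import random

def get_raw_bit() -> int:
    # Stand-in for a biased hardware noise source: emits 1 about 70% of the time.
    return 1 if random.random() < 0.7 else 0

def debiased_bit() -> int:
    # Von Neumann debiasing: keep only disagreeing pairs, which are equally likely.
    while True:
        a, b = get_raw_bit(), get_raw_bit()
        if a != b:
            return a   # 01 -> 0, 10 -> 1

bits = [debiased_bit() for _ in range(10_000)]
print(sum(bits) / len(bits))   # close to 0.5 despite the biased source
```

This only works if the raw bits are independent; correlated samples (or the ultrasound attack above) defeat it, which is one reason real designs also condition the output cryptographically, as described next.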

That's why real-world implementations use hardware noise but also process it cryptographically. But at that point you're back to the question of whether the algorithm (or its implementation) has been deliberately sabotaged, or perhaps just isn't as robust as believed.


Because they are difficult to test

While it's easy to test that a random number generator produces output in the right format, determining whether that output is statistically random is much more involved and unlikely to be included in an automated test suite. With most other code, breaking it produces far more obvious symptoms.
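
A small illustration of why naive checks aren't enough: a generator that simply counts upward produces a perfectly flat byte histogram, so a basic frequency check happily passes output that is trivially predictable.

```python
from collections import Counter

def broken_generator(n: int) -> bytes:
    # Trivially predictable "generator": just counts upward, byte by byte.
    return bytes(i % 256 for i in range(n))

data = broken_generator(256_000)
counts = Counter(data)
# Every byte value appears exactly 1000 times - this naive test sees nothing wrong.
print(min(counts.values()), max(counts.values()))
```

Serious statistical batteries (NIST SP 800-22, Dieharder, TestU01) go much further, and even they can only reject a generator; they cannot certify that its output is unpredictable.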

Crypto needs to be right

In general, making sure that code is correct is difficult. A saving grace for most code is that only a small proportion of correctness errors result in security vulnerabilities. But with cryptographic code - including random number generators - many correctness errors do result in vulnerabilities. Crypto code needs to be correct in order to be secure, and ensuring that it is correct is hard.

The Debian maintainer made a major error

The code is not actually that fragile. Making it insecure required major failings by the maintainer: chopping out lines that produce warnings, with only a cursory check that doing so hasn't broken anything, is pretty shoddy.

Edit: it was not just the maintainer's fault; see Angel's comment.