Is "always use /dev/urandom" still good advice in an age of containers and isolation?

I wrote an answer which describes in detail how getrandom() blocks waiting for initial entropy.

However, I think that Pornin slightly oversells urandom by saying that "the only instant where /dev/urandom might imply a security issue due to low entropy is during the first moments of a fresh, automated OS install."

Your worries are well-founded. I have an open question about that very thing and its implications. The issue is that the persistent random seed takes quite some time to move from the input pool to the output pools (the blocking pool and the CRNG). As a result, /dev/urandom can output potentially predictable values for a few minutes after boot. The solution is, as you say, to use either the blocking /dev/random, or getrandom() set to block.

In fact, it is not uncommon to see lines like this in the kernel's log at early boot:

random: sn: uninitialized urandom read (4 bytes read, 7 bits of entropy available)
random: sn: uninitialized urandom read (4 bytes read, 15 bits of entropy available)
random: sn: uninitialized urandom read (4 bytes read, 16 bits of entropy available)
random: sn: uninitialized urandom read (4 bytes read, 16 bits of entropy available)
random: sn: uninitialized urandom read (4 bytes read, 20 bits of entropy available)

All of these are instances when the non-blocking pool was accessed even before enough entropy has been collected. The problem is that the amount of entropy is just too low to be sufficiently cryptographically secure at this point. There should be 2^32 possible 4-byte values, however with only 7 bits of entropy available, that means there are only 2^7, or 128, different possibilities.
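The arithmetic is easy to check. Here is a small sketch in Python; the function name `search_space` is made up for illustration and has nothing to do with the kernel. It assumes the simple model used above, where an output can carry no more uncertainty than the entropy behind it and no more than its own length allows.

```python
def search_space(entropy_bits: int, output_bytes: int) -> int:
    """Number of distinct outputs an attacker has to consider.

    Simplified model: the result is bounded both by the entropy that was
    available and by the length of the output itself.
    """
    return 2 ** min(entropy_bits, 8 * output_bytes)


if __name__ == "__main__":
    print(search_space(32, 4))  # properly seeded 4-byte read
    print(search_space(7, 4))   # same read with 7 bits of entropy available
```

With 7 bits available, an attacker who knows the generator's design has only 128 candidates to try for that 4-byte read, instead of roughly 4.3 billion.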

Halderman also seems to say that the entropy pool fills on each boot, and not, as Pornin says in his answer, at the very first OS install. Although it's not terribly important for my application, I'm wondering: which is it?

It's actually a matter of semantics. The actual entropy pool (the page of memory kept in the kernel that contains random values) is filled on each boot by the persistent entropy seed and by environmental noise. However, the entropy seed itself is a file that is created at install time and is updated with new random values each time the system shuts down. I imagine Pornin is considering the random seed to be a part of the entropy pool (as in, a part of the general entropy-distributing and collecting system), whereas Halderman considers it to be separate (because the entropy pool is technically a page of memory, nothing more). The truth is that the entropy seed is fed into the entropy pool at each boot, but it can take a few minutes to actually affect the pool.

A summary of the three sources of randomness:

  1. /dev/random - The blocking character device decrements an "entropy count" each time it is read (despite entropy not actually being depleted). However, it also blocks until sufficient entropy has been collected at boot, making it safe to use early on.

  2. /dev/urandom - The non-blocking character device will output random data whenever anyone reads from it. Once sufficient entropy has been collected, it will output a virtually unlimited stream indistinguishable from random data. Unfortunately, for compatibility reasons, it is readable even early on in boot before enough one-time entropy has been collected.

  3. getrandom() - A syscall that will output random data as long as the entropy pool has properly initialized with the minimum amount of entropy required. It defaults to reading from the non-blocking pool. If given the GRND_NONBLOCK flag, it will return an error if there is not enough entropy. If given the GRND_RANDOM flag, it will behave identically to /dev/random, simply blocking until there is entropy available.
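To make the differences concrete, here is a rough sketch of the interfaces as seen from Python. It assumes Linux 3.17 or later and Python 3.6 or later, where `os.getrandom()` and `os.GRND_NONBLOCK` are exposed; the three function names are mine, chosen only for illustration.

```python
import os


def read_urandom_device(n: int) -> bytes:
    """Read n bytes from /dev/urandom; this never blocks, even at early boot."""
    with open("/dev/urandom", "rb") as dev:
        return dev.read(n)


def read_getrandom(n: int) -> bytes:
    """Default getrandom(): blocks only until the pool is first initialized.

    Keep n small here; very large requests may return fewer bytes than asked.
    """
    return os.getrandom(n)


def pool_is_initialized() -> bool:
    """Probe with GRND_NONBLOCK: an error instead of a block means 'not ready'."""
    try:
        os.getrandom(1, os.GRND_NONBLOCK)
        return True
    except BlockingIOError:
        return False


if __name__ == "__main__":
    print("pool initialized:", pool_is_initialized())
    print(read_getrandom(16).hex())
```

On a system that has been up for a while, `pool_is_initialized()` should simply return True; the interesting case is a service started very early in boot.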

I suggest you use the third option, the getrandom() syscall. This will allow a process to read cryptographically-secure random data at high speeds, and will only block early on in boot when not enough entropy has been gathered. If Python's os.urandom() function acts as a wrapper to this syscall as you say, then it should be fine to use. It looks like there was actually much discussion on whether or not that should be the case, ending up with it blocking until enough entropy is available.

Thinking just a bit further down the road: what are the best practices for environments which are as fresh and naive as I've described above, but which run on devices of fairly abysmal prospects for initial entropy generation?

This is a common situation, and there are a few ways to deal with it:

  • Ensure you block at early boot, for example by using /dev/random or getrandom().

  • Keep a persistent random seed, if possible (i.e. if you can write to storage at each boot).

  • Most importantly, use a hardware RNG. This is the #1 most effective measure.
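The persistent seed from the second item might look roughly like the sketch below. The function names, the 512-byte size and the idea of passing the paths as parameters are my own choices for illustration; distributions normally ship their own boot-time service for this. Note also that, as far as I know, writing to /dev/urandom only mixes the bytes into the pool without crediting any entropy; crediting requires the RNDADDENTROPY ioctl and root privileges, which this sketch does not attempt.

```python
import os

SEED_BYTES = 512  # illustrative size, not something the kernel mandates


def save_seed(seed_path: str) -> None:
    """Store fresh random bytes on disk for use at the next boot."""
    seed = os.urandom(SEED_BYTES)
    # Create the file with owner-only permissions so other users cannot read it.
    fd = os.open(seed_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(seed)
        f.flush()
        os.fsync(f.fileno())


def load_seed(seed_path: str, device: str = "/dev/urandom") -> int:
    """At boot: mix the stored seed into the kernel pool, then replace it."""
    with open(seed_path, "rb") as f:
        seed = f.read()
    with open(device, "wb") as dev:
        dev.write(seed)
    # Never reuse a seed across boots: overwrite it immediately.
    save_seed(seed_path)
    return len(seed)
```

The important design point is the last step of `load_seed()`: if the machine crashes before a clean shutdown, the next boot must not see the same seed again.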

Using a hardware random number generator is very important. The Linux kernel will initialize its entropy pool with any supported HWRNG interface if one exists, completely eliminating the boot entropy hole. Many embedded devices have their own randomness generators.

This is especially important for many embedded devices, since they may not have a high-resolution timer that is required for the kernel to securely generate entropy from environmental noise. Some versions of MIPS processors, for example, have no cycle counter.
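If you are not sure whether your device exposes a hardware RNG to the kernel, one quick check is to look at the hw_random entries in sysfs. The sketch below assumes the usual location `/sys/class/misc/hw_random`; the helper name `hwrng_status` is mine. Generators built into the CPU (RDRAND and similar) are, as far as I know, used by the kernel directly and may not be listed there, so an empty result does not prove there is no hardware source.

```python
from pathlib import Path

# Assumed sysfs location of the kernel's hw_random framework.
HW_RANDOM_DIR = Path("/sys/class/misc/hw_random")


def hwrng_status(base: Path = HW_RANDOM_DIR) -> dict:
    """Report which hardware RNG drivers the kernel lists, if any."""

    def read(name: str) -> str:
        try:
            return (base / name).read_text().strip()
        except OSError:
            # Missing directory or file: treat as "nothing reported".
            return ""

    return {
        "available": read("rng_available").split(),
        "current": read("rng_current"),
    }


if __name__ == "__main__":
    print(hwrng_status())
```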

How and why do you suggest using urandom to seed a (I guess userland?) CSPRNG? How does this beat getrandom?

The non-blocking randomness device is not designed for high performance. Until recently, the device was obscenely slow due to using SHA-1 for randomness rather than a stream cipher as it does now. Using a kernel interface for randomness can be less efficient than a local, userspace CSPRNG because each call to the kernel requires an expensive context switch. The kernel has been designed to account for applications that want to draw heavily from it, but the comments in the source code make it clear that they do not see this as the right thing to do:

 * Hack to deal with crazy userspace progams when they are all trying
 * to access /dev/urandom in parallel.  The programs are almost
 * certainly doing something terribly wrong, but we'll work around
 * their brain damage.

Popular crypto libraries such as OpenSSL support generating random data. They can be seeded once or reseeded occasionally, and are able to benefit more from parallelization. Using such a library additionally makes it possible to write portable code that does not rely on the behavior of any particular operating system or version of operating system.

If you do not need huge amounts of randomness, it is completely fine to use the kernel's interface. If you are developing a crypto application that will need a lot of randomness throughout its lifetime, you may want to use a library like OpenSSL to deal with that for you.
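As a minimal illustration of the library route, Python's standard `ssl` module exposes the generator of the TLS library it was built against through `ssl.RAND_bytes()` and `ssl.RAND_status()`. This sketch assumes that library is OpenSSL and that it has seeded itself from the operating system; the wrapper name `openssl_random_bytes` is hypothetical.

```python
import ssl


def openssl_random_bytes(n: int) -> bytes:
    """Draw n bytes from the TLS library's userspace generator."""
    # RAND_status() reports whether the library considers itself seeded.
    if not ssl.RAND_status():
        raise RuntimeError("the library reports its generator is not seeded")
    return ssl.RAND_bytes(n)


if __name__ == "__main__":
    print(openssl_random_bytes(32).hex())
```

The point is that the library handles seeding and reseeding for you, which is a much safer position than maintaining a generator of your own.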

There are three states the system can be in:

  1. Hasn't collected enough entropy to safely initialize a CSPRNG.
  2. Has collected enough entropy to safely initialize a CSPRNG, and:

    2a. Has given out more entropy than it's collected.

    2b. Has given out less entropy than it's collected.

Historically, people thought the distinction between (2a) and (2b) was important. This caused two problems. First, it's wrong: the distinction is meaningless for a properly designed CSPRNG. And second, the emphasis on the (2a)-vs-(2b) distinction caused people to miss the distinction between (1) and (2), which actually is really important. People just sort of collapsed (1) into being a special case of (2a).

What you really want is something that blocks in state (1), and doesn't block in states (2a) or (2b).

Unfortunately, in the old days, the confusion between (1) and (2a) meant that this wasn't an option. Your only two options were /dev/random, which blocked in cases (1) and (2a), and /dev/urandom, which never blocked. But since state (1) almost never happens (and doesn't happen at all in well-configured systems, see below), /dev/urandom is better for almost all systems, almost all the time. That's where all those blog posts about "always use urandom" came from: they were trying to convince people to stop making a meaningless and harmful distinction between the (2a) and (2b) states.

But, yeah, neither of these is what you actually want. Thus, the newer getrandom syscall, which by default blocks in state (1), and doesn't block in states (2a) or (2b). So on modern Linux, the orthodoxy should be updated to: always use getrandom with default settings.
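The argument so far can be written down as a small table, followed by what "getrandom with default settings" might look like from Python. This is only a sketch: the table just restates the behavior described above, the names `BLOCKS_IN`, `wanted_behaviour` and `random_bytes` are mine, and the fallback to `os.urandom()` is an assumption for platforms where `os.getrandom()` is missing.

```python
import os

# Which interface blocks in which state, restating the text above.
# "1": not yet seeded; "2a"/"2b": seeded, having given out more/less than collected.
BLOCKS_IN = {
    "/dev/random (historical)": {"1", "2a"},
    "/dev/urandom": set(),
    "getrandom() default": {"1"},
}


def wanted_behaviour(interface: str) -> bool:
    """True if the interface blocks in state 1 and in no other state."""
    return BLOCKS_IN[interface] == {"1"}


def random_bytes(n: int) -> bytes:
    """getrandom() with default flags; os.urandom() where it is unavailable."""
    if not hasattr(os, "getrandom"):
        return os.urandom(n)
    buf = b""
    # Large requests may come back short, so keep asking until we have n bytes.
    while len(buf) < n:
        buf += os.getrandom(n - len(buf))
    return buf


if __name__ == "__main__":
    for name in BLOCKS_IN:
        print(name, "->", wanted_behaviour(name))
    print(random_bytes(32).hex())
```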

Extra wrinkles:

  • getrandom also supports a non-default mode where it acts like /dev/random, which can be requested via the GRND_RANDOM flag. AFAIK this flag is never actually useful, for all the same reasons those old blog posts described. Don't use it.

  • getrandom also has some extra bonus benefits over /dev/urandom: it works regardless of your filesystem layout, and doesn't require opening a file descriptor, both of which are problematic for generic libraries that want to make minimal assumptions about the environment they'll be used in. This doesn't affect cryptographic security, but it's nice operationally.

  • A well-configured system will always have entropy available, even in early boot (i.e., you should really never get into state (1), ever). There are a lot of ways to manage this: save some entropy from the previous boot to use on the next one. Install a hardware RNG. Docker containers use the host's kernel, and thus get access to its entropy pool. High-quality virtualization setups have ways to let the guest system fetch entropy from the host system via hypervisor interfaces (e.g. search for "virtio rng"). But of course, not all systems are well-configured. If you have a poorly-configured system, you should see if you can make it well-configured instead. In principle this should be cheap and easy, but in reality people don't prioritize security so... it might require doing things like switching cloud providers, or switching to a different embedded platform. And unfortunately, you may find that this is more expensive than you (or your boss) are willing to pay, so you're stuck dealing with a poorly-configured system. My sympathies if so.

  • As @forest notes, if you need a lot of CSPRNG values, then if you're very careful you can speed this up by running your own CSPRNG in userspace, while using getrandom for (re)seeding. This is very much an "experts only" thing though, just like any situation where you find yourself implementing your own crypto primitives. You should only do it if you've measured and found that using getrandom directly is too slow for your needs and you have significant cryptographic expertise. It's very easy to screw up a CSPRNG implementation in such a way that your security is totally broken, but the output still "looks" random so you don't notice.