How does sleep() work?

We can look at sleeping from a more abstract point of view: it is an operation that lets you wait for an event.
The event in question is triggered when the time elapsed since sleep was invoked exceeds the sleep parameter.

When a process is active (i.e. it owns a CPU) it can wait for an event in an active or in a passive way:

  • An active wait is when a process actively/explicitly waits for the event:

    sleep( t ):
        while not [event: elapsedTime > t ]:
            NOP // no operation - do nothing
    

    This is a trivial algorithm and can be implemented portably almost anywhere, but it has a drawback: while your process is actively waiting it still owns the CPU and wastes it (your process doesn't really need the CPU, while other tasks might).

    Usually this should be done only by processes that cannot wait passively (see the point below).

  • A passive wait instead is done by asking something else to wake you up when the event happens, and then suspending yourself (i.e. releasing the CPU):

    sleep( t ):
        system.wakeMeUpWhen( [event: elapsedTime > t ] )
        release CPU
    

    In order to implement a passive wait you need some external support: you must be able to release your CPU and to ask somebody else to wake you up when the event happens.

    This may not be possible on single-task devices (like many embedded devices) unless the hardware provides a wakeMeUpWhen operation, since there is nobody to release the CPU to and nobody to ask to wake you up.

    x86 processors (and most others) offer an HLT instruction that halts the CPU until an external interrupt is triggered. This way operating system kernels can sleep too, which keeps the CPU cool.
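
The two kinds of wait above can be compared with a small Python sketch. It is only a user-space analogy, not how a kernel implements sleep(): `busy_sleep`, `passive_sleep` and `measure` are names made up for this example, a `threading.Timer` stands in for wakeMeUpWhen, and blocking on a `threading.Event` stands in for releasing the CPU.

```python
import threading
import time


def busy_sleep(t):
    """Active wait: keep polling the clock until t seconds have passed."""
    start = time.monotonic()
    while time.monotonic() - start < t:
        pass  # NOP: the thread keeps the CPU the whole time


def passive_sleep(t):
    """Passive wait: ask to be woken up later, then block."""
    woke = threading.Event()
    # Stand-in for wakeMeUpWhen: a timer that sets the event after t seconds.
    timer = threading.Timer(t, woke.set)
    timer.start()
    # Stand-in for "release CPU": block until the event is set.
    woke.wait()


def measure(sleep_fn, t):
    """Return (wall-clock seconds, CPU seconds) spent inside sleep_fn(t)."""
    wall_start = time.monotonic()
    cpu_start = time.process_time()
    sleep_fn(t)
    return time.monotonic() - wall_start, time.process_time() - cpu_start


if __name__ == "__main__":
    for fn in (busy_sleep, passive_sleep):
        wall, cpu = measure(fn, 0.2)
        print(f"{fn.__name__}: wall={wall:.3f}s cpu={cpu:.3f}s")
```

Both functions take about the same wall-clock time, but the busy version should report CPU time close to the full wait, while the passive version should report almost none.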


Modern operating systems are multitasking, which means they appear to run multiple programs simultaneously. In fact, your computer (traditionally, at least) has only one CPU, so it can execute only one instruction from one program at a time.

The OS makes it appear that several things (you're browsing the web, listening to music and downloading files) are happening at once by executing each task for a very short time (let's say 10 ms) before switching to the next one. This fast switching makes it appear that everything is happening simultaneously when it is in fact happening sequentially (with obvious differences for multi-core systems).
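
The switching described above can be imitated with a toy round-robin loop. This is a simplified sketch, not a real scheduler: the names `task` and `round_robin` are invented here, each `yield` plays the role of the end of a time slice, and the tasks hand control back voluntarily, whereas a real OS takes the CPU away using a timer interrupt.

```python
from collections import deque


def task(name, steps):
    """A pretend program: each yield marks the end of one time slice."""
    for i in range(steps):
        yield f"{name}:{i}"


def round_robin(tasks):
    """Run each task for one slice in turn until all of them have finished."""
    queue = deque(tasks)
    trace = []
    while queue:
        current = queue.popleft()
        try:
            trace.append(next(current))  # run one slice
        except StopIteration:
            continue  # task finished: do not put it back in the queue
        queue.append(current)  # back of the line until its next turn
    return trace


if __name__ == "__main__":
    print(round_robin([task("web", 2), task("music", 3), task("download", 1)]))
    # ['web:0', 'music:0', 'download:0', 'web:1', 'music:1', 'music:2']
```

Only one task runs at any moment, yet the interleaved trace looks as if all three were making progress together.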

As for the answer to the question: with sleep, wait or synchronous I/O, the program is basically telling the OS "execute other tasks, and do not run me again until X ms have elapsed, the event has been signaled, or the data is ready".
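
A toy model of the timed case might look like the sketch below. Everything in it is an assumption made for illustration: `worker` and `run` are invented names, time is a simulated tick counter rather than a real clock, running a task costs no ticks, and only the "X ms have elapsed" case is modelled (not signaled events or I/O). A task yields the number of ticks it wants to sleep, and the scheduler keeps it out of the ready queue until then.

```python
import heapq
from collections import deque


def worker(naps):
    """A pretend program that asks to sleep for each duration in naps."""
    for nap in naps:
        yield nap  # "do not run me again until nap ticks have elapsed"


def run(tasks):
    """tasks: list of (name, generator). Returns [(tick, name), ...] in run order."""
    ready = deque(tasks)
    sleeping = []  # min-heap of (wake_tick, seq, name, gen)
    now = 0
    seq = 0  # tie-breaker so the heap never has to compare generators
    log = []
    while ready or sleeping:
        if not ready:
            # Nothing is runnable: idle until the earliest wake-up time.
            now = max(now, sleeping[0][0])
        # Move every task whose wake-up time has arrived to the ready queue.
        while sleeping and sleeping[0][0] <= now:
            _, _, name, gen = heapq.heappop(sleeping)
            ready.append((name, gen))
        name, gen = ready.popleft()
        log.append((now, name))  # this task gets the CPU at tick `now`
        try:
            nap = next(gen)
        except StopIteration:
            continue  # task finished
        heapq.heappush(sleeping, (now + nap, seq, name, gen))
        seq += 1
    return log


if __name__ == "__main__":
    print(run([("A", worker([3])), ("B", worker([1]))]))
    # [(0, 'A'), (0, 'B'), (1, 'B'), (3, 'A')]
```

While A is asleep it costs nothing: the scheduler simply runs B, and when nobody is runnable it jumps ahead to the next wake-up time.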