how is select() alerted to an fd becoming "ready"?

It reports that it's ready by returning.

select waits for events that are typically outside your program's control. In essence, by calling select, your program says "I have nothing to do until ..., please suspend my process".

The condition you specify is a set of events, any of which will wake you up.

For example, if you are downloading something, your loop would have to wait on new data to arrive, a timeout to occur if the transfer is stuck, or the user to interrupt, which is precisely what select does.

When you have multiple downloads, data arriving on any of the connections triggers activity in your program (you need to write the data to disk), so you'd give a list of all download connections to select in the list of file descriptors to watch for "read".

When you upload data to somewhere at the same time, you again use select to see whether the connection currently accepts data. If the other side is on dialup, it will acknowledge data only slowly, so your local send buffer is always full, and any attempt to write more data would block until buffer space is available, or fail. By passing the file descriptor we are sending to to select as a "write" descriptor, we get notified as soon as buffer space is available for sending.

The general idea is that your program becomes event-driven, i.e. it reacts to external events from a common message loop rather than performing sequential operations. You tell the kernel "this is the set of events for which I want to do something", and the kernel gives you a set of events that have occured. It is fairly common for two events occuring simultaneously; for example, a TCP acknowledge was included in a data packet, this can make the same fd both readable (data is available) and writeable (acknowledged data has been removed from send buffer), so you should be prepared to handle all of the events before calling select again.

One of the finer points is that select basically gives you a promise that one invocation of read or write will not block, without making any guarantee about the call itself. For example, if one byte of buffer space is available, you can attempt to write 10 bytes, and the kernel will come back and say "I have written 1 byte", so you should be prepared to handle this case as well. A typical approach is to have a buffer "data to be written to this fd", and as long as it is non-empty, the fd is added to the write set, and the "writeable" event is handled by attempting to write all the data currently in the buffer. If the buffer is empty afterwards, fine, if not, just wait on "writeable" again.

The "exceptional" set is seldom used -- it is used for protocols that have out-of-band data where it is possible for the data transfer to block, while other data needs to go through. If your program cannot currently accept data from a "readable" file descriptor (for example, you are downloading, and the disk is full), you do not want to include the descriptor in the "readable" set, because you cannot handle the event and select would immediately return if invoked again. If the receiver includes the fd in the "exceptional" set, and the sender asks its IP stack to send a packet with "urgent" data, the receiver is then woken up, and can decide to discard the unhandled data and resynchronize with the sender. The telnet protocol uses this, for example, for Ctrl-C handling. Unless you are designing a protocol that requires such a feature, you can easily leave this out with no harm.

Obligatory code example:

#include <sys/types.h>
#include <sys/select.h>

#include <unistd.h>

#include <stdbool.h>

static inline int max(int lhs, int rhs) {
    if(lhs > rhs)
        return lhs;
    else
        return rhs;
}

void copy(int from, int to) {
    char buffer[10];
    int readp = 0;
    int writep = 0;
    bool eof = false;
    for(;;) {
        fd_set readfds, writefds;
        FD_ZERO(&readfds);
        FD_ZERO(&writefds);

        int ravail, wavail;
        if(readp < writep) {
            ravail = writep - readp - 1;
            wavail = sizeof buffer - writep;
        }
        else {
            ravail = sizeof buffer - readp;
            wavail = readp - writep;
        }

        if(!eof && ravail)
            FD_SET(from, &readfds);
        if(wavail)
            FD_SET(to, &writefds);
        else if(eof)
            break;
        int rc = select(max(from,to)+1, &readfds, &writefds, NULL, NULL);
        if(rc == -1)
            break;
        if(FD_ISSET(from, &readfds))
        {
            ssize_t nread = read(from, &buffer[readp], ravail);
            if(nread < 1)
                eof = true;
            readp = readp + nread;
        }
        if(FD_ISSET(to, &writefds))
        {
            ssize_t nwritten = write(to, &buffer[writep], wavail);
            if(nwritten < 1)
                break;
            writep = writep + nwritten;
        }
        if(readp == sizeof buffer && writep != 0)
            readp = 0;
        if(writep == sizeof buffer)
            writep = 0;
    }
}

We attempt to read if we have buffer space available and there was no end-of-file or error on the read side, and we attempt to write if we have data in the buffer; if end-of-file is reached and the buffer is empty, then we are done.

This code will behave clearly suboptimal (it's example code), but you should be able to see that it is acceptable for the kernel to do less than we asked for both on reads and writes, in which case we just go back and say "whenever you're ready", and that we never read or write without asking whether it will block.


From the same man page:

On exit, the sets are modified in place to indicate which file descriptors actually changed status.

So use FD_ISSET() on the sets passed to select to determine which FDs have become ready.