What is an open file description?

    file descriptor → open file description → directory entry
               dup                open                    cp

There are several levels of indirection when going from an open file in a process all the way to the file content. Implementation-wise, these levels generally translate into data structures in the kernel pointing to the next level. I'm going to describe a straightforward implementation; real implementations are likely to have a lot more complications.

An open file in a process is designated by a file descriptor, which is a small nonnegative integer. The numbers 0, 1 and 2 have conventional meanings: processes are supposed to read normal input from 0 (standard input), write normal output to 1 (standard output), and write error messages to 2 (standard error). This is only a convention: the kernel doesn't care. The kernel keeps a table of open file descriptors for each process, mapping these small integers to a file descriptor structure. In the Linux kernel, the table is struct fdtable (reached through the process's struct files_struct), and struct fd is the small structure the kernel uses to hold the result of looking up a descriptor.
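
To see that a descriptor is nothing more than a small integer handed out from the per-process table, you can open a file, close it, and open it again. This is only a sketch: it assumes a POSIX system and a single-threaded process, and the helper name `reopen_reuses_number` is made up for illustration.

```python
import os
import tempfile


def reopen_reuses_number():
    """Open a scratch file twice in a row and return both descriptor numbers."""
    with tempfile.NamedTemporaryFile() as tmp:
        fd1 = os.open(tmp.name, os.O_RDONLY)
        os.close(fd1)  # the slot in the per-process table is free again
        fd2 = os.open(tmp.name, os.O_RDONLY)
        os.close(fd2)
        return fd1, fd2


if __name__ == "__main__":
    print(reopen_reuses_number())
```

On POSIX, open returns the lowest descriptor number that is not currently in use, so as long as nothing else opens a file in between, both calls should hand back the same number.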

The file descriptor structure contains a pointer to an open file description. Multiple file descriptors can point to the same open file description, possibly from different processes: for example after a process has called dup (or a relative such as dup2), or after a process has forked. File descriptors that stem from the same original open (or similar) system call share the same open file description, even when they are in different processes. The open file description contains information about the way the file is open, including the mode (read-only vs read-write, append, etc.), the position in the file, etc. Under Linux, the open file description structure is struct file.
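
Since the file position lives in the open file description, descriptors that share a description also share the position. Here is a minimal sketch of that, assuming a POSIX system (os.fork is not available elsewhere); the function names are made up for illustration.

```python
import os
import tempfile


def dup_shares_offset():
    """Read through one descriptor, then ask its duplicate for the position."""
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"0123456789")
        tmp.flush()
        fd = os.open(tmp.name, os.O_RDONLY)
        dup_fd = os.dup(fd)  # second descriptor, same open file description
        try:
            os.read(fd, 4)  # moves the shared position forward by 4 bytes
            return os.lseek(dup_fd, 0, os.SEEK_CUR)
        finally:
            os.close(fd)
            os.close(dup_fd)


def fork_shares_offset():
    """Let a child process read, then check the position in the parent."""
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"0123456789")
        tmp.flush()
        fd = os.open(tmp.name, os.O_RDONLY)
        pid = os.fork()
        if pid == 0:
            # Child: its descriptor refers to the parent's open file description.
            try:
                os.read(fd, 6)
            finally:
                os._exit(0)  # leave without running the parent's cleanup code
        os.waitpid(pid, 0)
        try:
            return os.lseek(fd, 0, os.SEEK_CUR)
        finally:
            os.close(fd)


if __name__ == "__main__":
    print(dup_shares_offset(), fork_shares_offset())
```

In both cases the position reported afterwards reflects a read that was made through the other descriptor.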

The open file description lives at the level of the file API. The next level is in the filesystem API. The distinction is that the file API also covers files such as anonymous pipes and sockets that do not live in the filesystem tree. If the file is a file in the directory tree, then the open file description contains a pointer to a directory entry. There can be multiple open file descriptions pointing to the same directory entry, if the same file was opened more than once. The directory entry contains information about what the file is, including a pointer to its parent directory, and information as to where the file is located. In the Linux kernel, the directory entry is split into two levels: struct inode, which contains file metadata, and struct dentry, which keeps track of where the file is in the directory tree.
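
For contrast with dup, here is a sketch of what happens when the same file is opened twice: two open file descriptions, each with its own position, both leading to the same underlying file. It assumes a POSIX system, and `separate_opens` is a made-up name.

```python
import os
import tempfile


def separate_opens():
    """Open one file twice and compare positions and file identity."""
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"0123456789")
        tmp.flush()
        a = os.open(tmp.name, os.O_RDONLY)
        b = os.open(tmp.name, os.O_RDONLY)  # a second, independent open
        try:
            os.read(a, 4)  # only moves the position belonging to a
            pos_a = os.lseek(a, 0, os.SEEK_CUR)
            pos_b = os.lseek(b, 0, os.SEEK_CUR)
            # samestat compares device and inode numbers.
            same_file = os.path.samestat(os.fstat(a), os.fstat(b))
            return pos_a, pos_b, same_file
        finally:
            os.close(a)
            os.close(b)


if __name__ == "__main__":
    print(separate_opens())
```

The positions differ because they are stored per open file description, while the device and inode numbers match because both descriptions refer to the same file.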


I'm interpreting the question as mainly about terminology, specifically the "file table".

If you look at early implementations, the set of all open file descriptions in the system was an array. When a process needed a new open file description, the array was scanned for an unused slot and a pointer to that slot was returned. See for example falloc at the bottom of http://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/sys/sys/fio.c

In that system, "file table" is a natural name for the system-wide array of struct file.
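
As a rough illustration of that allocation scheme, here is a toy model in Python. It is not a transcription of the V7 code: the names `FileSlot`, `make_file_table`, `falloc_sketch` and `release`, and the table size, are assumptions made for the example.

```python
NFILE = 4  # hypothetical table size, kept tiny on purpose


class FileSlot:
    """One entry of the fixed-size table; count == 0 marks it as unused."""

    def __init__(self):
        self.count = 0
        self.offset = 0


def make_file_table(size=NFILE):
    """Build the system-wide table once, with every slot unused."""
    return [FileSlot() for _ in range(size)]


def falloc_sketch(table):
    """Scan for an unused slot, claim it, and return a reference to it."""
    for slot in table:
        if slot.count == 0:
            slot.count = 1
            slot.offset = 0
            return slot
    return None  # every slot is in use: the table is full


def release(slot):
    """Drop one reference; the slot becomes reusable when count reaches 0."""
    slot.count -= 1


if __name__ == "__main__":
    table = make_file_table()
    first = falloc_sketch(table)
    print(first is table[0])
```

With a layout like this, the whole set of open file descriptions really is one array that you can point at, which is why the word "table" fits.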

Nowadays, open file descriptions are allocated dynamically with a more flexible mechanism than just choosing an unused slot in a fixed-size array. The set of all open file descriptions in the system is not required to be arranged in a contiguous array-like setup. So there really isn't a "file table" anymore, unless you consider every dynamic memory allocation pool to be a "table".

The "file table" in the diagram on Wikipedia is a set of open file descriptions. A file descriptor is an index into an array of pointers to open file descriptions. Since the open file descriptions are always accessed through those pointers, never by numerical index in some array, drawing them as a contiguous column of boxes is a little misleading. And calling it a "table" reinforces that misleading image.
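
Here is a toy model of that arrangement, in case it helps to see it spelled out. It is a sketch, not kernel code: `ToyProcess` and `OpenFileDescription` are made-up classes, the paths are placeholders, and nothing touches the real filesystem. The only array is the per-process one, and the descriptions themselves are separately allocated objects reached through references.

```python
class OpenFileDescription:
    """State of one open: how the file was opened and the current position."""

    def __init__(self, path, mode):
        self.path = path
        self.mode = mode
        self.offset = 0


class ToyProcess:
    """A process whose descriptor table is a list of references."""

    def __init__(self):
        self.fd_table = []  # index = file descriptor, value = description or None

    def _install(self, description):
        # Use the lowest free index, growing the table only when needed.
        for fd, entry in enumerate(self.fd_table):
            if entry is None:
                self.fd_table[fd] = description
                return fd
        self.fd_table.append(description)
        return len(self.fd_table) - 1

    def open(self, path, mode="r"):
        # Each open allocates a fresh description; no global array is involved.
        return self._install(OpenFileDescription(path, mode))

    def dup(self, fd):
        # dup copies the reference, so both indexes share one description.
        return self._install(self.fd_table[fd])

    def close(self, fd):
        self.fd_table[fd] = None


if __name__ == "__main__":
    proc = ToyProcess()
    a = proc.open("/tmp/example")
    b = proc.dup(a)
    print(proc.fd_table[a] is proc.fd_table[b])
```

Nothing in this model needs the descriptions to sit next to each other in memory, which is the point: the numerical index only exists on the file descriptor side.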

But it's a fairly common usage so I don't expect it to die out soon.