What really is EOF for binary files? Condition? Character?

This should clear it up nicely for you.

Basically, EOF is just a macro with a pre-defined value representing the error code from I/O functions indicating that there is no more data to be read.


The file doesn't actually contain an EOF. EOF isn't a character of sorts - remember a byte can be between 0 and 255, so it wouldn't make sense if a file could contain a -1. The EOF is a signal from the operating system that you're using, which indicates the end of the file has been reached. Notice how getc() returns an int - that is so it can return that -1 to tell you the stream has reached the end of the file.

The EOF signal is treated the same for binary and text files - the actual definition of binary and text stream varies between the OSes (for example on *nix binary and text mode are the same thing.) Either way, as stated above, it is not part of the file itself. The OS passes it to getc() to tell the program that the end of the stream has been reached.

From From the GNU C library:

This macro is an integer value that is returned by a number of narrow stream functions to indicate an end-of-file condition, or some other error situation. With the GNU C Library, EOF is -1. In other libraries, its value may be some other negative number.


The various EOF indicators that C provides to you do not necessarily have anything to do with how the file system marks the end of a file.

Most modern file systems know the length of a file because they record it somewhere, separately from the contents of the file. The routines that read the file keep track of where you are reading and they stop when you reach the end. The C library routines generate an EOF value to return to you; they are not returning a value that is actually in the file.

Note that the EOF returned by C library routines is not actually a character. The C library routines generally return an int, and that int is either a character value or an EOF. E.g., in one implementation, the characters might have values from 0 to 255, and EOF might have the value −1. When the library routine encountered the end of the file, it did not actually see a −1 character, because there is no such character. Instead, it was told by the underlying system routine that the end of file had been reached, and it responded by returning −1 to you.

Old and crude file systems might have a value in the file that marks the end of file. For various reasons, this is usually undesirable. In its simplest implementation, it makes it impossible to store arbitrary data in the file, because you cannot store the end-of-file marker as data. One could, however, have an implementation in which the raw data in the file contains something that indicates the end of file, but data is transformed when reading or writing so that arbitrary data can be stored. (E.g., by “quoting” the end-of-file marker.)

In certain cases, things like end-of-file markers also appear in streams. This is common when reading from the terminal (or a pseudo-terminal or terminal-like device). On Windows, pressing control-Z is an indication that the user is done entering input, and it is treated similarly to reach an end-of-file. This does not mean that control-Z is an EOF. The software reading from the terminal sees control-Z, treats it as end-of-file, and returns end-of-file indications, which are likely different from control-Z. On Unix, control-D is commonly a similar sentinel marking the end of input.

Tags:

C

Binaryfiles

Eof