C file pointer changing after fork and (failed) exec

Credit to Jonathan Leffler for pointing us in the right direction.

Although your program does not produce the same unexpected behavior for me on CentOS 7 / GCC 4.8.5 / GLIBC 2.17, it is plausible that you observe different behavior. Your program's behavior is in fact undefined according to POSIX (on which you rely for fork). Here are some excerpts from the relevant section (emphasis added):

An open file description may be accessed through a file descriptor, which is created using functions such as open() or pipe(), or through a stream, which is created using functions such as fopen() or popen(). Either a file descriptor or a stream is called a "handle" on the open file description to which it refers; an open file description may have several handles.

[...]

The result of function calls involving any one handle (the "active handle") is defined elsewhere in this volume of POSIX.1-2017, but if two or more handles are used, and any one of them is a stream, the application shall ensure that their actions are coordinated as described below. If this is not done, the result is undefined.

[...]

For a handle to become the active handle, the application shall ensure that the actions below are performed between the last use of the handle (the current active handle) and the first use of the second handle (the future active handle). The second handle then becomes the active handle. [...]

The handles need not be in the same process for these rules to apply.

Note that after a fork(), two handles exist where one existed before. The application shall ensure that, if both handles can ever be accessed, they are both in a state where the other could become the active handle first. [Where subject to the preceding qualification, the] application shall prepare for a fork() exactly as if it were a change of active handle. (If the only action performed by one of the processes is one of the exec functions or _exit() (not exit()), the handle is never accessed in that process.)

For the first handle, the first applicable condition below applies. [An impressively long list of alternatives that do not apply to your situation ...]

  • If the stream is open with a mode that allows reading and the underlying open file description refers to a device that is capable of seeking, the application shall either perform an fflush(), or the stream shall be closed.

For the second handle:

  • If any previous active handle has been used by a function that explicitly changed the file offset, except as required above for the first handle, the application shall perform an lseek() or fseek() (as appropriate to the type of handle) to an appropriate location.

Thus, for your program to access the same stream in both parent and child, POSIX demands that the parent fflush() stdin before forking, and that the child fseek() it after starting. Then, after waiting for the child to terminate, the parent must fseek() the stream as well. Given that we know the child's exec will fail, however, all of that flushing and seeking can be avoided by having the child call _exit() (which does not access the stream) instead of exit().
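As a rough illustration of the _exit() approach, here is a minimal sketch. It is not your original program: the helper names run_child() and echo_lines(), the 256-byte line buffer, and the exit status 127 are all assumptions made for the example. The point is only that the child calls _exit() after a failed exec, so it never touches the inherited stream and the parent remains the only process using that handle.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that tries to exec `cmd`.
 * Returns the child's exit status, or -1 if fork()/waitpid() fails
 * or the child did not terminate normally. */
int run_child(const char *cmd)
{
    assert(cmd != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        execlp(cmd, cmd, (char *)NULL);
        /* Only reached if the exec failed. _exit() skips stdio cleanup,
         * so the child never accesses the streams it inherited. */
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical helper: read `in` line by line, run one child per line,
 * and echo the line to stdout. Returns the number of lines read
 * (assuming every line fits in the buffer). */
int echo_lines(FILE *in, const char *cmd)
{
    char line[256];
    int count = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        run_child(cmd);
        fputs(line, stdout);
        count++;
    }
    return count;
}
```

Calling echo_lines(stdin, "some-missing-command") from main() with stdin redirected from a regular file should then see each line once, because only the parent ever uses the stream. If the child had to call exit() instead, the fflush() and fseek() coordination described above would be needed.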

Complying with POSIX's provisions yields the following:

When these rules are followed, regardless of the sequence of handles used, implementations shall ensure that an application, even one consisting of several processes, shall yield correct results: no data shall be lost or duplicated when writing, and all data shall be written in order, except as requested by seeks.

It is worth noting, however, that

It is implementation-defined whether, and under what conditions, all input is seen exactly once.


I appreciate that it may be somewhat unsatisfying to hear merely that your expectations for program behavior are not justified by the relevant standards, but that's really all there is. The parent and child processes do have some relevant shared data in the form of a common open file description (with which they have separate handles associated), and that seems likely to be the vehicle for the unexpected (and undefined) behavior, but there's no basis for predicting the specific behavior you see, nor the different behavior I see for the same program.
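To make the "common open file description" point concrete, here is a small sketch that uses plain file descriptors only, so no stream is involved and the behavior is well defined. The function name offset_after_child_seek() is hypothetical. It shows that a seek performed in the child is visible in the parent, which is the kind of sharing that can interact badly with stdio buffering. It does not attempt to reproduce the specific behavior you observed.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that moves the file offset of `fd`
 * to `child_offset`, then report the offset the parent sees afterwards.
 * Returns (off_t)-1 on any failure. */
off_t offset_after_child_seek(int fd, off_t child_offset)
{
    assert(fd >= 0);

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return (off_t)-1;
    }

    if (pid == 0) {
        /* Child: the descriptor is a copy, but it refers to the same
         * open file description, so this moves the shared offset. */
        off_t where = lseek(fd, child_offset, SEEK_SET);
        _exit(where == (off_t)-1 ? 1 : 0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return (off_t)-1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return (off_t)-1;

    /* Parent: query the current offset without moving it. */
    return lseek(fd, 0, SEEK_CUR);
}
```

For a descriptor open on a regular file and positioned at offset 0, offset_after_child_seek(fd, 7) returns 7 in the parent, even though only the child called lseek() with a new position.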