Why can the waitpid system call only be used with child processes?

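Because of how waitpid works. On a POSIX system, the kernel sends a signal (SIGCHLD) to a parent process when one of its child processes terminates, and it keeps the child's exit status around until the parent collects it. At a high level, waitpid blocks until that notification arrives for the process (or one of the processes) specified, then returns the status. You can't wait on arbitrary processes, because neither the SIGCHLD signal nor the exit status is ever delivered to anyone but the parent; calling waitpid on a process that is not your child fails with errno set to ECHILD.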
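To make the parent/child restriction concrete, here is a minimal sketch. The helper names spawn_and_reap and wait_on_non_child are made up for this illustration; only fork, waitpid, getpid and the W* macros are standard POSIX.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code`, reap it with
 * waitpid(), and return the exit status the parent observes (-1 on error). */
int spawn_and_reap(int code)
{
    pid_t child = fork();
    if (child < 0)
        return -1;               /* fork failed */
    if (child == 0)
        _exit(code);             /* child: terminate immediately */

    int status;
    pid_t r;
    do {
        r = waitpid(child, &status, 0);   /* blocks until the child exits */
    } while (r == -1 && errno == EINTR);  /* retry if a signal interrupted us */

    if (r != child)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical helper: try to wait on a process that is not our child.
 * A process is never its own child, so our own pid serves as the example.
 * Returns the errno reported by waitpid(), or 0 if it did not fail. */
int wait_on_non_child(void)
{
    int status;
    if (waitpid(getpid(), &status, WNOHANG) == -1)
        return errno;
    return 0;
}
```

Under these assumptions, spawn_and_reap(7) should return 7, while wait_on_non_child() should return ECHILD, which is the kernel's way of saying "that is not one of your children".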

godlygeek's answer is good for understanding how the system works, but the question that inevitably follows is:

How to determine if a process has gone away?

The correct way to wait on a process that is not your child (for example, one in another process group or session) is to use kill(). That is an unintuitive answer. You can't use the wait family of functions, because the SIGCHLD signal is never delivered to your process and you can't retrieve the exit status. kill(), however, can tell you whether a specific process still exists if you pass 0 as the signal: nothing is actually sent, and the call only performs its existence and permission checks. The return value boils down to this: 0 means the process exists and would accept signals from your process; -1 with errno set to EPERM means the process exists but you lack permission to signal it; -1 with errno set to ESRCH means no such process exists, so it has gone away. Two caveats: a terminated process that its parent has not yet reaped (a zombie) still counts as existing, and PIDs can be reused, so a long polling interval can end up observing an unrelated process.
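The three outcomes described above can be folded into one small helper. This is only a sketch: the enum proc_state and the function probe_process are names invented here, not part of any library.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Hypothetical result codes for this sketch. */
enum proc_state { PROC_GONE = 0, PROC_ALIVE = 1, PROC_UNKNOWN = -1 };

/* Hypothetical helper: classify the result of kill(pid, 0). */
enum proc_state probe_process(pid_t pid)
{
    if (pid <= 0)
        return PROC_UNKNOWN;    /* 0 and negative pids address process groups */

    if (kill(pid, 0) == 0)
        return PROC_ALIVE;      /* exists and we are allowed to signal it */
    if (errno == EPERM)
        return PROC_ALIVE;      /* exists, but we lack permission */
    if (errno == ESRCH)
        return PROC_GONE;       /* no such process */
    return PROC_UNKNOWN;        /* any other error is unexpected for signal 0 */
}
```

Note that a zombie is reported as PROC_ALIVE by this helper, since kill() still finds it in the process table.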

Some sample C code that checks once per second to see if an arbitrary process is gone:

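#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* pid is the pid_t of the process to watch. Signal 0 sends nothing;
   the loop ends once kill() fails with anything but EPERM (normally ESRCH). */
int res = kill(pid, 0);
while (res == 0 || (res < 0 && errno == EPERM))
{
    sleep(1);

    res = kill(pid, 0);
}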
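In practice you may not want to poll forever. The following variation on the loop above gives up after a deadline; the function name wait_for_exit, the 100 ms polling step and the return codes are all assumptions made for this sketch.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper: poll roughly every 100 ms until `pid` is gone or
 * about `timeout_ms` milliseconds have passed. Returns 0 if the process
 * disappeared, 1 on timeout, -1 on an invalid pid or unexpected error.
 * Elapsed time is estimated by counting steps, so it is approximate. */
int wait_for_exit(pid_t pid, long timeout_ms)
{
    const long step_ms = 100;
    struct timespec step = { 0, step_ms * 1000000L };

    if (pid <= 0)
        return -1;                 /* 0 and negative pids address process groups */

    for (long waited = 0; ; waited += step_ms) {
        if (kill(pid, 0) == -1) {
            if (errno == ESRCH)
                return 0;          /* no such process: it has gone away */
            if (errno != EPERM)
                return -1;         /* unexpected failure */
            /* EPERM: still alive, just not ours to signal */
        }
        if (waited >= timeout_ms)
            return 1;              /* still alive after the deadline */
        nanosleep(&step, NULL);
    }
}
```

A caller could then write wait_for_exit(pid, 5000) and treat a return value of 1 as "still running after about five seconds".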

You can similarly experiment with the kill command:

kill -0 <pid>

That passes pid and 0 to kill(); the command's exit status is 0 if the signal could have been sent and nonzero otherwise. Many shells have a built-in kill, so this is much cheaper than starting a new process (e.g. ps).
