How do I recover a semaphore when the process that decremented it to zero crashes?

It turns out there is no reliable way to recover the semaphore. Sure, anyone can call sem_post() on the named semaphore to bring the count back above zero, but how would you know when such a recovery is needed? The POSIX semaphore API is too limited: it gives no indication that the process holding the semaphore has died.

Beware of the IPC command-line tools as well: the common ipcmk, ipcrm, and ipcs utilities only handle the older SysV semaphores. They do not work with POSIX semaphores.

However, there are other things that can serve as locks and that the operating system releases automatically when an application dies, even when it dies in a way that cannot be caught in a signal handler (SIGKILL, for example). Two examples: a listening socket bound to a particular port, or a lock on a specific file.
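For the first of those two options, here is a minimal sketch of using a bound listening socket as a lock. The helper name port_lock_try_acquire() and the choice of the loopback address are assumptions made for illustration; they are not from any library. Unlike a semaphore wait, bind() does not block: it fails (typically with EADDRINUSE) while another process holds the port, so a caller that wants to wait has to retry.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: try to take the "lock" by binding a listening
 * socket to 127.0.0.1:port. Returns the socket descriptor on success,
 * or -1 if the port is already held (or on any other error).
 * The kernel closes the socket when the process dies, freeing the port. */
int port_lock_try_acquire(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1) {
        perror("socket");
        return -1;
    }

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    /* SO_REUSEADDR is deliberately NOT set, so a second bind fails
     * while the first socket is still listening. */
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == -1 ||
        listen(fd, 1) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Releasing the lock is just close(fd) on the returned descriptor. The port number itself has to be agreed on by all cooperating processes.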

I decided the lock on a file is the solution I needed. So instead of a sem_wait() and sem_post() call, I'm using:

lockf( fd, F_LOCK, 0 )

and

lockf( fd, F_ULOCK, 0 )

When the application exits for any reason, the kernel closes the file, which also releases the file lock. Other client apps waiting for the "semaphore" are then free to proceed as expected.

Thanks for the help, guys.


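Use a lock file instead of a semaphore, much like @Stéphane's solution but without a separate locking call. You can simply open the file with an exclusive lock. Note that O_EXLOCK is a BSD extension (available on macOS and the BSDs, for example); Linux does not provide it.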

//call to open() will block until it can obtain an exclusive lock on the file.
errno = 0;
int fd = open("/tmp/.lockfile", 
    O_CREAT | //create the file if it's not present.
    O_WRONLY | //only need write access for the internal locking semantics.
    O_EXLOCK, //use an exclusive lock when opening the file.
    S_IRUSR | S_IWUSR); //permissions on the file, 600 here.

if (fd == -1) {
    perror("open() failed");
    exit(EXIT_FAILURE);
}

printf("Entered critical section.\n");
//Do "critical" stuff here.

//exit the critical section
errno = 0;
if (close(fd) == -1) {
    perror("close() failed");
    exit(EXIT_FAILURE);
}

printf("Exited critical section.\n");

This is a typical problem when managing semaphores. Some programs dedicate a single process to creating and deleting the semaphore; usually that process does just this and nothing else, and your other applications wait until the semaphore is available. I've seen this done with the SysV API, but not with POSIX. With SysV semaphores you can also do what 'Duck' mentioned and pass the SEM_UNDO flag in your semop() call, so that the kernel reverts the process's semaphore adjustments when the process exits.
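A minimal sketch of the SEM_UNDO approach with the SysV API, in case it helps. The helper names and the sem_arg union are hypothetical, and using IPC_PRIVATE (a semaphore only shared with child processes) is an assumption to keep the example short; unrelated processes would agree on a key, for example via ftok().

```c
#include <stdio.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

/* semctl() takes a union argument that Linux leaves to the caller to
 * define; it is named sem_arg here to avoid clashing with systems that
 * already define union semun. */
union sem_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Hypothetical helper: create a one-semaphore set initialized to 1. */
int sysv_sem_create(void)
{
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid == -1) {
        perror("semget");
        return -1;
    }
    union sem_arg arg;
    arg.val = 1;
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        perror("semctl(SETVAL)");
        semctl(semid, 0, IPC_RMID);
        return -1;
    }
    return semid;
}

/* Apply `delta` to semaphore 0 with SEM_UNDO, so the kernel reverses
 * the adjustment if this process exits without undoing it itself. */
static int sysv_sem_adjust(int semid, short delta)
{
    struct sembuf op = { .sem_num = 0, .sem_op = delta, .sem_flg = SEM_UNDO };
    return semop(semid, &op, 1);
}

int sysv_sem_wait(int semid) { return sysv_sem_adjust(semid, -1); }
int sysv_sem_post(int semid) { return sysv_sem_adjust(semid, +1); }

/* Remove the semaphore set from the system. */
int sysv_sem_destroy(int semid)
{
    return semctl(semid, 0, IPC_RMID);
}
```

If a process is killed between sysv_sem_wait() and sysv_sem_post(), the kernel adds the outstanding adjustment back, so the count returns to 1 and other waiters can proceed. This sketch does not retry semop() when it is interrupted by a signal (EINTR).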


But with the information you've provided, I would suggest that you not use semaphores at all, especially if your process is in danger of being killed or crashing. Try to use something that the OS will clean up automagically for you.