Simple way of restarting crashed processes?

Solution 1:

I'd look into daemontools (http://cr.yp.to/daemontools.html).

Its supervise program was built for exactly this purpose: it starts processes, watches them, and restarts them immediately if they ever terminate.

You could still use monit if you need anything more complicated than a simple "is it still running?" check; when the process needs to be restarted, have that done through supervise.
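As a rough sketch of what this looks like: supervise runs a script named run inside a service directory and runs it again whenever it exits. Assuming a hypothetical server binary /usr/local/bin/myserver and the usual /service directory that svscan watches, the run file can be as small as this:

```sh
#!/bin/sh
# /service/myserver/run (hypothetical service name and path)
# Send stderr to the same place as stdout.
exec 2>&1
# exec replaces the shell, so supervise watches myserver itself.
# myserver must stay in the foreground; if it daemonizes, supervise
# sees it exit and keeps starting it again.
exec /usr/local/bin/myserver
```

Make the run file executable; once svscan picks up the directory, "svstat /service/myserver" shows its status, and "svc -d" / "svc -u" take it down and bring it back up.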

Solution 2:

You could also use /etc/inittab to restart dead processes using the respawn action.

See the inittab section of http://aplawrence.com/Unixart/startup.html
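For reference, an inittab entry has the form id:runlevels:action:process. A minimal sketch, assuming a SysV-style init that reads /etc/inittab, with a hypothetical id "ms" and path /usr/local/bin/myserver:

```
# Restart myserver whenever it exits, in runlevels 2 through 5.
ms:2345:respawn:/usr/local/bin/myserver
```

After editing the file, "telinit q" tells init to re-read it. As with supervise, the process has to stay in the foreground, since init only notices when the process it started exits.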


Solution 3:

If you already have Nagios in place, you can use its event handler scripts to restart services.
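A minimal sketch of such a handler, assuming Nagios is configured to pass the $SERVICESTATE$, $SERVICESTATETYPE$ and $SERVICEATTEMPT$ macros as its three arguments. The script name, the restart command and the "third soft attempt" threshold are assumptions for illustration:

```sh
#!/bin/sh
# Hypothetical Nagios event handler for varnish (e.g. saved as
# .../eventhandlers/restart-varnish). Nagios is assumed to pass
# $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ as $1 $2 $3.

# Command used to start varnish; an assumption, adjust for your system.
# Left unquoted below on purpose so it splits into command + arguments.
RESTART_CMD=${RESTART_CMD:-"sudo /etc/init.d/varnish start"}

# Succeeds (returns 0) when a restart should be attempted:
# a CRITICAL state that is HARD, or SOFT on the third check attempt.
should_restart() {
    state=$1 state_type=$2 attempt=$3
    [ "$state" = "CRITICAL" ] || return 1
    case "$state_type" in
        HARD) return 0 ;;
        SOFT) [ "$attempt" = "3" ] ;;
        *) return 1 ;;
    esac
}

if should_restart "${1:-}" "${2:-}" "${3:-}"; then
    echo "Restarting varnish with: $RESTART_CMD" >&2
    $RESTART_CMD
fi
```

In Nagios the script would be wired up through a command definition whose command_line passes those three macros, referenced from the service's event_handler directive. It runs as the Nagios user, so the sudo concerns in the next paragraph apply to it as well.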

If varnish requires root permission to start (init.d scripts usually do), change "/etc/init.d/varnish start" to "sudo /etc/init.d/varnish start". That probably won't be quite enough, though: you likely don't want to give whatever user monit runs as passwordless sudo for every command, and giving it sudo on a shell script would be basically just as bad. So you will need to figure out which commands in that init script need root, grant only those commands to the monit user in /etc/sudoers, and then edit the init script accordingly. Or, instead of all this, maybe varnish can be run as a non-root user?
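A sketch of what such a restricted rule could look like, assuming monit runs as a user named monit and that the command the init script needs root for is /usr/sbin/varnishd with these arguments (all hypothetical; check your own init script). sudoers should be edited with visudo:

```
# /etc/sudoers fragment (edit with visudo); user, path and arguments are hypothetical
monit ALL=(root) NOPASSWD: /usr/sbin/varnishd -f /etc/varnish/default.vcl -a :80
```

With arguments listed like this, sudo only allows that exact invocation; a bare command path with no arguments would let the user pass any arguments.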

Finally, I'm sure you know this, but I'm going to say it anyway. You are clearly putting a lot of effort into this; I hope you are putting as much effort into figuring out why varnish is crashing and actually fixing it (or hounding the developers to figure out why) :-)

Update:
This might not be as clean, but an easy way to do this as root might be to set up a script that checks whether the process is okay and starts it if not, then run that script every couple of minutes as a cron job.
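A minimal sketch of that check script, assuming varnish writes its PID to a pidfile. The pidfile path /var/run/varnishd.pid and the start command are hypothetical; use whatever your varnish setup actually uses:

```sh
#!/bin/sh
# Hypothetical cron job (run as root): start varnish again if the process
# recorded in its pidfile is no longer alive.
# PIDFILE and START_CMD are assumptions; adjust them for your system.
PIDFILE=${PIDFILE:-/var/run/varnishd.pid}
START_CMD=${START_CMD:-"/etc/init.d/varnish start"}

# Returns 0 if the pidfile is readable and names a live process.
# "kill -0" sends no signal; it only checks that the PID exists.
is_running() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

if ! is_running "$PIDFILE"; then
    echo "$(date): varnish is not running, starting it" >&2
    # START_CMD is left unquoted so it splits into command + arguments.
    $START_CMD || echo "$(date): start command failed" >&2
fi
```

Saved as, say, /usr/local/bin/check-varnish (hypothetical path) and made executable, a line like "*/2 * * * * /usr/local/bin/check-varnish" in root's crontab would run it every two minutes. A pidfile check can be fooled if the old PID has been reused by another process, so treat it as a rough liveness test.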


Solution 4:

Another great method taken from StackOverflow:

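#!/bin/sh
# Re-run myserver whenever it exits with a non-zero status;
# a clean exit (status 0) ends the loop.
until myserver; do
    echo "Server 'myserver' crashed with exit code $?.  Respawning.." >&2
    sleep 1
done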
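To see why the loop stops on a clean exit, here is a self-contained demonstration. fake_server is a hypothetical stand-in for myserver that "crashes" twice and then exits with status 0; until keeps re-running it only while it returns non-zero:

```sh
#!/bin/sh
# Demonstration only: fake_server is a hypothetical stand-in for myserver.
# It returns 1 on its first two runs and 0 on the third.
runs=0
fake_server() {
    runs=$((runs + 1))
    [ "$runs" -ge 3 ]
}

# Same pattern as the monitor loop: respawn until a zero exit status.
until fake_server; do
    echo "Server 'fake_server' crashed with exit code $?.  Respawning.." >&2
    sleep 1
done
echo "fake_server exited cleanly after $runs runs; the loop stopped."
```

So a deliberate, clean shutdown of the server is not respawned. The flip side is that a server which forks into the background and exits 0 immediately would end the loop at once, so it should be run in the foreground.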

This loop could be saved as an executable script (for example /usr/local/bin/myservermonitor) and started from the crontab:

crontab -e

Then add a rule that starts your monitor script at boot:

@reboot /usr/local/bin/myservermonitor

Or it could be started from a script in /etc/init.d.

See the StackOverflow answer for a detailed explanation of why this is a good approach.