Shell command/script to see if a host is alive?
ping is the way to test whether a host is alive and connected. (If a host is alive but disconnected or slow to respond, you can't distinguish that from its being dead.)
Options supported by the ping command vary from system to system. You'll want to ensure that it doesn't loop forever but returns after a few seconds if it doesn't receive a reply.
With Linux iputils, ping -c 1 -W 1 "$hostname_or_ip_address" >/dev/null sends a single ping and waits up to 1 second for a reply. You don't need to parse the output: the command returns 0 if it received a reply and nonzero otherwise (unknown host name, no route to host, no reply). Other implementations may need different flags or units (e.g. on FreeBSD, -W is given in milliseconds and -t sets an overall timeout in seconds); check the manual on your system.
    if ping -c 1 -W 1 "$hostname_or_ip_address"; then
        echo "$hostname_or_ip_address is alive"
    else
        echo "$hostname_or_ip_address is pining for the fjords"
    fi
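If the host might still be booting, a single ping can give a false negative. Here is a minimal sketch of a retry loop, assuming the Linux iputils flags discussed above; the function name wait_for_host and its parameters are made up for illustration.

```shell
# wait_for_host HOST [ATTEMPTS] [DELAY]
# Hypothetical helper: ping HOST up to ATTEMPTS times, sleeping DELAY seconds
# between tries. Returns 0 as soon as one reply arrives, 1 otherwise.
# Assumes Linux iputils flags (-c count, -W per-reply timeout in seconds).
# Note: plain POSIX sh has no local variables, so these names are global.
wait_for_host() {
    host=$1
    attempts=${2:-5}
    delay=${3:-1}
    i=1
    while [ "$i" -le "$attempts" ]; do
        if ping -c 1 -W 1 "$host" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        # Do not sleep after the final failed attempt.
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

You could then write something like: if wait_for_host "$hostname_or_ip_address" 10 2; then ...; fi, which gives the host roughly half a minute to answer before giving up.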
Ping is great to get a quick response about whether the host is connected to the network, but it often won't tell you whether the host is alive or not, or whether it's still operating as expected. This is because ping responses are usually handled by the kernel, so even if every application on the system has crashed (e.g. due to a disk failure or running out of memory), you'll often still get ping responses and may assume the machine is operating normally when the situation is quite the opposite.
Usually you don't really care whether a host is still online; what you really care about is whether the machine is still performing some task. So if you can check the task directly, you'll know both that the host is up and that the task is still running.
For a remote host that runs a web server for example, you can do something like this:
    # Add the -f option to curl if HTTP error responses such as 404 should fail too
    if curl -I "http://$TARGET"; then
        echo "$TARGET alive and web site is up"
    else
        echo "$TARGET offline or web server problem"
    fi
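In a script you probably also want curl to be quiet and to give up after a while rather than wait on an unresponsive server. A minimal sketch using standard curl options; the function name web_alive and the 5-second limit are arbitrary choices for illustration.

```shell
# web_alive URL
# Hypothetical helper: succeed only if URL answers a HEAD request with a
# non-error HTTP status within 5 seconds (an arbitrary limit).
web_alive() {
    # -s: no progress meter
    # -f: exit nonzero on HTTP status 400 and above
    # -I: send a HEAD request instead of GET
    # -o /dev/null: discard the response headers
    # --max-time 5: give up after 5 seconds in total
    curl -s -f -I -o /dev/null --max-time 5 "$1"
}
```

It drops into the same kind of test: if web_alive "http://$TARGET"; then ...; fi. Some servers answer HEAD differently from GET, so remove -I if that matters for your check.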
If it runs SSH and you have keys set up for passwordless login, then you have a few more options, for example:
    if ssh "$TARGET" true; then
        echo "$TARGET alive and accessible via SSH"
    else
        echo "$TARGET offline or not accepting SSH logins"
    fi
This works by SSH'ing into the host and running the true command and then closing the connection. The ssh command will only return success if that command could be run successfully.
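In an unattended script, ssh can stall on a password prompt or on a host that never answers. A minimal sketch that guards against both with the OpenSSH client options BatchMode and ConnectTimeout; the function name ssh_alive and the 5-second value are assumptions for illustration.

```shell
# ssh_alive HOST
# Hypothetical helper: succeed only if a non-interactive SSH login to HOST
# works and the remote `true` command runs.
ssh_alive() {
    # BatchMode=yes: fail instead of prompting for a password or passphrase.
    # ConnectTimeout=5: stop waiting for the connection after 5 seconds.
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}
```

Use it the same way as the plain ssh test: if ssh_alive "$TARGET"; then ...; fi.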
Remote tests via SSH
You can extend this to check for specific processes, such as ensuring that mysqld is running on the machine:
    # pgrep -x matches the process name exactly and, unlike ps | grep,
    # cannot match its own command line
    if ssh "$TARGET" 'pgrep -x mysqld >/dev/null'; then
        echo "$TARGET alive and running MySQL"
    else
        echo "$TARGET offline or MySQL crashed"
    fi
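ssh exits with the status of the remote command, or with 255 if ssh itself failed, so the two failure cases can be told apart. A minimal sketch, assuming pgrep is installed on the target; the function name remote_process_state is made up for illustration.

```shell
# remote_process_state HOST PROCESS
# Hypothetical helper: prints "running", "stopped", or "unreachable".
# Assumes pgrep exists on HOST and PROCESS contains no shell metacharacters.
remote_process_state() {
    status=0
    ssh -o BatchMode=yes "$1" "pgrep -x $2 >/dev/null" || status=$?
    case $status in
        0)   echo running ;;
        255) echo unreachable ;;  # ssh itself failed (network, auth, ...)
        *)   echo stopped ;;      # pgrep ran but reported no match
    esac
}
```

For example, state=$(remote_process_state "$TARGET" mysqld) lets the script report an offline host and a crashed service as separate conditions.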
Of course, in this case you'd be better off running something like monit on the target to ensure the service is kept running, but a check like this is useful in scripts where you only want to perform some task on machine A as long as machine B is ready for it.
This could be something like checking that the target machine has a certain filesystem mounted before performing an rsync to it, so that you don't accidentally fill up its main disk if a secondary filesystem didn't mount for some reason. For example, this makes sure that /mnt/raid is mounted on the target machine before continuing:
    if ssh "$TARGET" 'mount | grep -q " on /mnt/raid "'; then
        echo "$TARGET alive and filesystem ready to receive data"
    else
        echo "$TARGET offline or filesystem not mounted"
    fi
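Putting the check and the copy together might look like the sketch below. The function name sync_if_mounted and the source directory argument are hypothetical, and the grep pattern assumes the usual "device on /mountpoint ..." layout of mount output.

```shell
# sync_if_mounted HOST SRC_DIR
# Hypothetical helper: copy SRC_DIR to /mnt/raid on HOST, but only when that
# filesystem is mounted there. Returns 1 without copying otherwise.
sync_if_mounted() {
    if ssh -o BatchMode=yes "$1" 'mount | grep -q " on /mnt/raid "'; then
        # -a: archive mode (recursive, preserves permissions and times)
        rsync -a "$2" "$1:/mnt/raid/"
    else
        echo "$1 offline or /mnt/raid not mounted; skipping rsync" >&2
        return 1
    fi
}
```

A call such as sync_if_mounted "$TARGET" /srv/data then never touches the target's main disk when the secondary filesystem is missing.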
Services with no client
Sometimes there is no easy way to connect to the service, and you just want to see whether it accepts incoming TCP connections. But when you telnet to the target on the port in question, it just sits there and doesn't disconnect you, so doing that in a script would cause it to hang.
While not quite so clean, you can still do this with the help of the netcat program (nc). For example, this checks whether the machine accepts SMB/CIFS connections on TCP port 445, so you can see whether it is running Windows file sharing even if you don't have a password to log in or the CIFS client tools aren't installed:
    # Connect to TCP port 445, waiting up to 1 second for the connection (-w 1).
    # timeout caps the total time (DNS lookups + connect time + waiting on the
    # remote host) at 5 seconds so the script cannot hang. With --preserve-status,
    # a run cut off by timeout reports nc's killed-by-signal status (nonzero),
    # so it counts as unavailable.
    if echo 'x' | timeout --preserve-status 5 nc -w 1 "$TARGET" 445; then
        echo "$TARGET alive and CIFS service available"
    else
        echo "$TARGET offline or CIFS unavailable"
    fi
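Many netcat variants also have a -z option, which only tests whether the port accepts a connection and sends no data, so nothing is left waiting on the remote end. A minimal sketch, assuming your nc supports -z together with -w (check its man page); the function name port_open is made up for illustration.

```shell
# port_open HOST PORT
# Hypothetical helper: succeed if a TCP connection to HOST:PORT can be
# established within 1 second. Assumes an nc that supports -z and -w.
port_open() {
    # -z: connect and close without sending any data
    # -w 1: time out after 1 second
    nc -z -w 1 "$1" "$2" >/dev/null 2>&1
}
```

The CIFS check then becomes: if port_open "$TARGET" 445; then ...; fi.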