Why do I get different exit status for ps | grep in a script?

In general, it's a bad idea to use the simple ps and grep approach to determine whether a given process is running.

You would be much better off using pgrep for this:

if pgrep "varnish" >/dev/null; then
  echo "Varnish is running"
else
  echo "Varnish is not running"
fi

See the manual for pgrep. On some systems (probably not on Linux), you get a -q flag that corresponds to the same flag for grep which gets rid of the need to redirect to /dev/null. There's also a -f flag that performs the match on the full command line rather than on only the process name. One may also limit the match to processes belonging to a specific user using -u.
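As a rough sketch of how the -f and -u flags mentioned above might be wrapped, consider the following. The function names are hypothetical and only for illustration; check your system's pgrep manual for the exact flag behaviour.

```shell
# Hypothetical helpers wrapping pgrep; the names are illustrative only.

# Succeeds if any process has a name matching the pattern "$1".
proc_running() {
  pgrep "$1" >/dev/null
}

# Succeeds if any process owned by user "$1" has a full command line
# matching the pattern "$2".
# -u restricts the match to that user, -f matches the whole command line.
proc_running_for_user() {
  pgrep -u "$1" -f "$2" >/dev/null
}
```

A call might look like proc_running_for_user varnish varnishd, assuming the daemon runs under a user named varnish, which may not be the case on your system.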

Installing pgrep also gives you access to pkill which allows you to signal processes based on their names.

Also, if this is a service daemon, and if your Unix system has a way of querying it for information (e.g., whether it's up and running or not), then that is the proper way of checking on it.

On Linux systems that use systemd, you have systemctl (systemctl is-active --quiet varnish will return 0 if it's running and a non-zero status, typically 3, otherwise), on OpenBSD you have rcctl, etc.
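A minimal sketch of the systemctl variant, assuming a systemd-based system. The function name unit_status is made up for this example, and the unit name is whatever your distribution calls the service.

```shell
# Hypothetical helper: report whether the systemd unit "$1" is active.
# Relies only on the exit status of "systemctl is-active --quiet".
unit_status() {
  if systemctl is-active --quiet "$1"; then
    echo "$1 is running"
  else
    echo "$1 is not running"
  fi
}
```

Calling unit_status varnish would then print one of the two messages without parsing any process listing.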

Now to your script:

In your script, you parse the output from ps ax. This output will contain the name of the script itself, check_varnish_pro.sh, which obviously contains the string varnish. This gives you a false positive. You would have spotted this if you had run it without the -q flag for grep while testing.

ps ax | grep '[v]arnish'

Running it:

$ ./check_varnish_pro.sh
31004 p1  SN+     0:00.04 /bin/bash ./check_varnish_pro.sh

Another issue is that, although you try to "hide" the grep process from being detected by grep itself by using [v] in the pattern, that approach will fail if you happen to run the script or the command line in a directory that has a file or directory named varnish in it (in which case you will get a false positive, again). This is because the pattern is unquoted and the shell will perform filename globbing with it.


bash-4.4$ set -x
bash-4.4$ ps ax | grep [v]arnish
+ ps ax
+ grep '[v]arnish'
bash-4.4$ touch varnish
+ touch varnish
bash-4.4$ ps ax | grep [v]arnish
+ ps ax
+ grep varnish
91829 p2  SN+p    0:00.02 grep varnish

The presence of the file varnish will cause the shell to replace [v]arnish with the filename varnish and you get a hit on the pattern in the process table (the grep process).
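The globbing effect can be reproduced without ps at all. The sketch below works in a temporary directory (an assumption made so the demo doesn't touch real files) and compares what a command receives for the unquoted and the quoted pattern.

```shell
# Work in a scratch directory so the demo does not touch real files.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

touch varnish

# Unquoted: the shell matches the glob against the file and passes "varnish".
unquoted=$(printf '%s\n' [v]arnish)

# Quoted: the pattern reaches the command unchanged as "[v]arnish".
quoted=$(printf '%s\n' '[v]arnish')

printf 'unquoted: %s\n' "$unquoted"   # unquoted: varnish
printf 'quoted: %s\n' "$quoted"       # quoted: [v]arnish

# Clean up the scratch directory.
cd - >/dev/null || exit 1
rm -rf "$demo_dir"
```

Quoting the pattern, as in ps ax | grep '[v]arnish', keeps the shell from rewriting it.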

When you run a script named check_varnish_pro.sh, the test

ps ax | grep -q [v]arnish

succeeds because the process list includes the script itself, whose command line contains check_varnish_pro.sh and therefore matches the pattern.

@AlexP explains very succinctly what is actually happening, but I would strongly discourage using pgrep/pkill for a critical process, as @Kusalananda suggests. Better solutions include:

  • Asking the service whether it's running. systemctl status varnishd should take care of that on a modern Linux installation that uses systemd.
  • If by some unfortunate circumstance you don't have a service available you can simply change the startup script to report the problem as soon as the process exits:

    varnish || true
  • Alternatively change the script that starts the service to record the PID, and then check the state periodically with kill -0 "$pid".
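
The kill -0 approach from the last item could look roughly like this. It is a sketch that assumes the start script has written the PID to a file; the function names and the PID file location are hypothetical.

```shell
# Succeeds if a process with PID "$1" exists and the caller is allowed
# to signal it. Signal 0 performs the checks without sending anything.
pid_alive() {
  kill -0 "$1" 2>/dev/null
}

# Hypothetical helper: read the PID recorded in the file "$1" and check it.
# Fails if the file is missing, unreadable, or empty.
check_pidfile() {
  recorded_pid=$(cat "$1" 2>/dev/null) || return 1
  [ -n "$recorded_pid" ] && pid_alive "$recorded_pid"
}
```

A periodic check would then be something like check_pidfile /path/to/pidfile || echo "service is down". Note that kill -0 also fails when the process exists but belongs to a user you may not signal, so run the check as the same user as the service or as root.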