Why does set -e not work inside subshells with parenthesis () followed by an OR list ||?

According to this thread, it's the behavior POSIX specifies for using "set -e" in a subshell.

(I was surprised as well.)

First, the behavior:

The -e setting shall be ignored when executing the compound list following the while, until, if, or elif reserved word, a pipeline beginning with the ! reserved word, or any command of an AND-OR list other than the last.

The second post notes,

In summary, shouldn't set -e in (subshell code) operate independently of the surrounding context?

No. The POSIX description is clear that surrounding context affects whether set -e is ignored in a subshell.

There's a little more in the fourth post, also by Eric Blake,

Point 3 is not requiring subshells to override the contexts where set -e is ignored. That is, once you are in a context where -e is ignored, there is nothing you can do to get -e obeyed again, not even a subshell.

$ bash -c 'set -e; if (set -e; false; echo hi); then :; fi; echo $?' 
hi 
0 

Even though we called set -e twice (both in the parent and in the subshell), the fact that the subshell exists in a context where -e is ignored (the condition of an if statement), there is nothing we can do in the subshell to re-enable -e.

This behavior is definitely surprising. It is counter-intuitive: one would expect the re-enabling of set -e to have an effect, and that the surrounding context would not take precedent; further, the wording of the POSIX standard does not make this particularly clear. If you read it in the context where the command is failing, the rule does not apply: it only applies in the surrounding context, however, it applies to it completely.


Indeed, set -e has no effect inside subshells if you use || operator after them; e.g., this wouldn't work:

#!/bin/sh

# prints:
#
# --> outer
# --> inner
# ./so_1.sh: line 16: some_failed_command: command not found
# <-- inner
# <-- outer

set -e

outer() {
  echo '--> outer'
  (inner) || {
    exit_code=$?
    echo '--> cleanup'
    return $exit_code
  }
  echo '<-- outer'
}

inner() {
  set -e
  echo '--> inner'
  some_failed_command
  echo '<-- inner'
}

outer

Aaron D. Marasco in his answer does a great job of explaining why it behaves this way.

Here is a little trick that can be used to fix this: run the inner command in background, and then immediately wait for it. The wait builtin will return the exit code of the inner command, and now you're using || after wait, not the inner function, so set -e works properly inside the latter:

#!/bin/sh

# prints:
#
# --> outer
# --> inner
# ./so_2.sh: line 27: some_failed_command: command not found
# --> cleanup

set -e

outer() {
  echo '--> outer'
  inner &
  wait $! || {
    exit_code=$?
    echo '--> cleanup'
    return $exit_code
  }
  echo '<-- outer'
}

inner() {
  set -e
  echo '--> inner'
  some_failed_command
  echo '<-- inner'
}

outer

Here is the generic function that builds upon this idea. It should work in all POSIX-compatible shells if you remove local keywords, i.e. replace all local x=y with just x=y:

# [CLEANUP=cleanup_cmd] run cmd [args...]
#
# `cmd` and `args...` A command to run and its arguments.
#
# `cleanup_cmd` A command that is called after cmd has exited,
# and gets passed the same arguments as cmd. Additionally, the
# following environment variables are available to that command:
#
# - `RUN_CMD` contains the `cmd` that was passed to `run`;
# - `RUN_EXIT_CODE` contains the exit code of the command.
#
# If `cleanup_cmd` is set, `run` will return the exit code of that
# command. Otherwise, it will return the exit code of `cmd`.
#
run() {
  local cmd="$1"; shift
  local exit_code=0

  local e_was_set=1; if ! is_shell_attribute_set e; then
    set -e
    e_was_set=0
  fi

  "$cmd" "$@" &

  wait $! || {
    exit_code=$?
  }

  if [ "$e_was_set" = 0 ] && is_shell_attribute_set e; then
    set +e
  fi

  if [ -n "$CLEANUP" ]; then
    RUN_CMD="$cmd" RUN_EXIT_CODE="$exit_code" "$CLEANUP" "$@"
    return $?
  fi

  return $exit_code
}


is_shell_attribute_set() { # attribute, like "x"
  case "$-" in
    *"$1"*) return 0 ;;
    *)    return 1 ;;
  esac
}

Example of usage:

#!/bin/sh
set -e

# Source the file with the definition of `run` (previous code snippet).
# Alternatively, you may paste that code directly here and comment the next line.
. ./utils.sh


main() {
  echo "--> main: $@"
  CLEANUP=cleanup run inner "$@"
  echo "<-- main"
}


inner() {
  echo "--> inner: $@"
  sleep 0.5; if [ "$1" = 'fail' ]; then
    oh_my_god_look_at_this
  fi
  echo "<-- inner"
}


cleanup() {
  echo "--> cleanup: $@"
  echo "    RUN_CMD = '$RUN_CMD'"
  echo "    RUN_EXIT_CODE = $RUN_EXIT_CODE"
  sleep 0.3
  echo '<-- cleanup'
  return $RUN_EXIT_CODE
}

main "$@"

Running the example:

$ ./so_3 fail; echo "exit code: $?"

--> main: fail
--> inner: fail
./so_3: line 15: oh_my_god_look_at_this: command not found
--> cleanup: fail
    RUN_CMD = 'inner'
    RUN_EXIT_CODE = 127
<-- cleanup
exit code: 127

$ ./so_3 pass; echo "exit code: $?"

--> main: pass
--> inner: pass
<-- inner
--> cleanup: pass
    RUN_CMD = 'inner'
    RUN_EXIT_CODE = 0
<-- cleanup
<-- main
exit code: 0

The only thing that you need to be aware of when using this method is that all modifications of Shell variables done from the command you pass to run will not propagate to the calling function, because the command runs in a subshell.


Workaround when usint toplevel set -e

I came to this question because I was using set -e as an error detection method:

/usr/bin/env bash
set -e
do_stuff
( take_best_sub_action_1; take_best_sub_action_2 ) || do_worse_fallback
do_more_stuff

and without ||, the script would stop running and never reach do_more_stuff.

Since there seems to be no clean solution, I think I will be just doing a simple set +e on my scripts:

/usr/bin/env bash
set -e
do_stuff
set +e
( take_best_sub_action_1; take_best_sub_action_2 )
exit_status=$?
set -e
if [ "$exit_status" -ne 0 ]; then
  do_worse_fallback
fi
do_more_stuff