How to continue cascaded pipeline commands after a failure

GNU tee (the tee on most Linux systems) has an option, --output-error, that makes it keep running after a write to a pipe fails.

a-command | tee --output-error=warn logfile.txt | myscript

When myscript fails or is killed, a-command continues to run and the log continues to grow.
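A minimal way to watch this happen, assuming GNU tee and a sleep that accepts fractional seconds. The loop and head are only stand-ins for a-command and for a myscript that dies after one line, and the temporary log name is arbitrary:

```shell
#!/bin/bash
log=$(mktemp)

# Producer: five lines, one every 0.2 seconds.
# Consumer: head exits after the first line, which breaks the pipe.
# 2>/dev/null hides the "Broken pipe" warning tee prints once head has gone.
for i in 1 2 3 4 5; do
    echo "line $i"
    sleep 0.2
done |
    tee --output-error=warn "$log" 2>/dev/null |
    head -n 1

# Count what reached the log after the consumer went away.
wc -l < "$log"
```

The count should cover all five lines, even though head only ever saw the first one.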

You can rerun your script, and have it exit when it catches up with the last complete block of the log:

myscript < logfile.txt

You can rerun your script, and have it wait for additions when it catches up:

tail -n +1 -f logfile.txt | myscript
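To try the follow mode without waiting forever, the follower can be given a deadline. This is a sketch that assumes GNU coreutils timeout. The awk one-liner is a hypothetical stand-in for myscript, and tail -n +1 -f means "start at the first line, then keep following":

```shell
#!/bin/bash
log=$(mktemp)
printf '%s\n' one two > "$log"

# Simulate a-command still appending to the log a little later.
( sleep 1; echo three >> "$log" ) &

# Follow the log from its first line; timeout stops tail after 3 seconds,
# which gives the consumer end-of-input so it can finish.
seen=$(timeout 3 tail -n +1 -f "$log" | awk '{ print NR ": " $0 }')
wait

printf '%s\n' "$seen"
```

The consumer should report the two lines that were already in the log and the one appended while it was waiting.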

A more complex example, contained in a Bash script.

logger represents your a-command. It generates 36 permutations of a short string, one per second. All the output is teed to 593580.log.

awk represents your "myscript". It prints a subset of the input.

wdog is my watchdog utility. -d 5 makes it debug its actions. -t 25 makes it time out the process under control (the awk) after 25 seconds, with a SIGUSR1. This just saves me manually running a kill to simulate your script failure -- I like repeatable tests.

When the awk goes away, the cat in the same compound command gets to read the pipe, and copies the remaining data to a duplicate log. So you can re-run your script against the full log, or the unprocessed data only, and you can compare the two logs to find exactly where you crashed.

Alternatively, you can cat >/dev/null, just to keep the pipe alive so logger continues to run.

Both the logfile copies seem to be line-buffered: tail -f shows them in real time.

The example script:

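#! /bin/bash

# Stand-in for a-command: emits 36 short strings, one per second.
logger () {
    for Q in {0..1}{A..C}{A..F}; do
        printf '%s\n' "${Q}"
        sleep 1
    done
}

# Stand-in for myscript: prints only the lines that contain a C.
AWK='
/C/ { printf ("awk %d %s\n", NR, $0); }
'

logger | tee 593580.log |
    {
        date
        # wdog kills awk with SIGUSR1 after 25 seconds.
        wdog -d 5 -t 25 awk "${AWK}"
        date
        # awk is gone: cat drains the rest of the pipe into a second log.
        cat > 593580.add
        date
    }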

The test run:

paul $ ./593580
Thu 18 Jun 15:35:24 BST 2020
wdog       25.000| Thu Jun 18 15:35:49.574 2020
wdog 15:35:24.574| Started awk as 14035
awk 3 0AC
wdog 15:35:29.579| Tick
awk 9 0BC
wdog 15:35:34.583| Tick
awk 13 0CA
awk 14 0CB
awk 15 0CC
wdog 15:35:39.586| Tick
awk 16 0CD
awk 17 0CE
awk 18 0CF
wdog 15:35:44.591| Tick
awk 21 1AC
wdog 15:35:49.579| Tick
wdog 15:35:49.579| Timed out child 14035 with signal 10
wdog 15:35:49.580| Child 14035 terminated with signal 10
Thu 18 Jun 15:35:49 BST 2020
Thu 18 Jun 15:36:00 BST 2020
paul $ 
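wdog is not a standard tool, so a shortened variant of the script may be easier to try. The sketch below swaps in GNU coreutils timeout -s USR1 for wdog; that substitution is an assumption and not part of the test run above. The producer, the timings, and the temporary file names are likewise made up for illustration:

```shell
#!/bin/bash
log=$(mktemp)
add=$(mktemp)

# Stand-in for a-command: six lines, one every 0.4 seconds.
producer () {
    for Q in {A..F}; do
        printf '%s\n' "$Q"
        sleep 0.4
    done
}

producer | tee "$log" |
    {
        # Kill the consumer with SIGUSR1 after 1 second, much as wdog -t does.
        timeout -s USR1 1 awk '{ print "awk", NR, $0 }'
        # The brace group still holds the pipe open, so tee never sees a
        # broken pipe; cat copies whatever the consumer did not read.
        cat > "$add"
    }

echo "full log:";    cat "$log"
echo "unprocessed:"; cat "$add"
```

The full log should hold every line, while the second file holds only the tail end that arrived after the consumer was killed. If awk's output is not going to a terminal it may be lost when awk is killed, because it is still sitting in awk's buffer.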

a-command | tee logfile.txt | { myscript; cat >/dev/null; }

This would run your pipeline as usual at first, until myscript terminates (for whatever reason). At that point, cat would take over reading from tee until there is no more data arriving. The data read by cat is discarded by redirecting it to /dev/null.
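The effect can be checked with a consumer that quits early. In this sketch seq and head are stand-ins for a-command and myscript, and it assumes the default Linux pipe buffer, which is far smaller than the data being sent:

```shell
#!/bin/bash
short=$(mktemp)
full=$(mktemp)

# Plain pipeline: once head exits, tee is killed by SIGPIPE mid-stream.
seq 1 100000 | tee "$short" | head -n 1 >/dev/null

# Same pipeline with cat taking over after head exits.
seq 1 100000 | tee "$full" | { head -n 1 >/dev/null; cat >/dev/null; }

echo "without cat: $(wc -l < "$short") lines logged"
echo "with cat:    $(wc -l < "$full") lines logged"
```

Only the second log should be complete. This form needs no GNU-specific tee option, since tee never sees the pipe break.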

If a-command finishes before myscript ends or fails, myscript reaches the end of its input and would presumably terminate. At the point when myscript terminates, cat is started, but as there is no more data to read, it would immediately terminate and the pipeline would be done.


Addressing TooTea's comment about making sure that we still get the correct exit status for the pipeline:

a-command | tee logfile.txt | ( myscript; err=$?; cat >/dev/null; exit "$err" )
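A quick check that the status really survives. fail_early is a hypothetical stand-in for a myscript that reads one line and then fails with status 3, and seq stands in for a-command:

```shell
#!/bin/bash
log=$(mktemp)

# Hypothetical failing consumer: read one line, then give up.
fail_early () {
    read -r first
    return 3
}

seq 1 100000 | tee "$log" | ( fail_early; err=$?; cat >/dev/null; exit "$err" )
status=$?

echo "pipeline status: $status"
echo "lines logged:    $(wc -l < "$log")"
```

The pipeline's status is that of its last element, the subshell, which exits with the value saved in err rather than with cat's status.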