Emptying a file without disrupting the pipe writing to it

Another form of this problem occurs with long-running applications whose logs are periodically rotated. Even if you move the original log (e.g., mv log.txt log.1) and immediately replace it with a file of the same name before any further logging occurs, a process holding the file open will either end up writing to log.1 (because that is still the open inode) or, if the file was deleted, to nothing at all.

A common way to deal with this (the system logger itself works this way) is to implement a signal handler in the process which closes and reopens its logs. Then, whenever you want to move or clear (by deleting) the log, send that signal to the process immediately afterward.
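
For instance, a traditional rotation script for a BSD-style syslogd does something like the following (the pid file path varies between systems, so treat it as an assumption; the key part is the SIGHUP, which tells syslogd to close and reopen its log files):

mv /var/log/messages /var/log/messages.1
kill -HUP "$(cat /var/run/syslogd.pid)"    # syslogd reopens its logs on SIGHUP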

Here's a simple demonstration for bash -- forgive my cruddy shell skills (but if you are going to edit this for best practices, etc., please make sure you understand the functionality first and test your revision before you edit):

#!/bin/bash

# On SIGINT, recreate log.txt and point stdout/stderr at the new file.
trap sighandler INT

function sighandler () {
    touch log.txt
    exec &> log.txt
}

echo $BASHPID
exec &> log.txt

count=0
while [ $count -lt 60 ]; do
    echo "$BASHPID Count is now $count"
    sleep 2
    ((count++))
done

Start this by forking into the background:

> ./test.sh &
12356

Notice it reports its PID to the terminal and then begins logging to log.txt. You now have 2 minutes to play around. Wait a few seconds and try:

> mv log.txt log.1 && kill -s 2 12356
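
In case you don't have the signal numbers memorized, bash's kill builtin can translate between numbers and names:

> kill -l 2
INT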

Just plain kill -2 12356 may work for you here too. Signal 2 is SIGINT (it's also what Ctrl-C sends, so you could run this in the foreground and move or remove the logfile from another terminal), which the trap should catch. To check:

> cat log.1
12356 Count is now 0
12356 Count is now 1
12356 Count is now 2
12356 Count is now 3
12356 Count is now 4
12356 Count is now 5
12356 Count is now 6
12356 Count is now 7
12356 Count is now 8
12356 Count is now 9
12356 Count is now 10
12356 Count is now 11
12356 Count is now 12
12356 Count is now 13
12356 Count is now 14

Now let's see if it is still writing to a log.txt even though we moved it:

> cat log.txt
12356 Count is now 15
12356 Count is now 16
12356 Count is now 17
12356 Count is now 18
12356 Count is now 19
12356 Count is now 20
12356 Count is now 21

Notice it kept going right where it left off. If you don't want to keep the record, simply clear the log by deleting it:

> rm -f log.txt && kill -s 2 12356

Check:

> cat log.txt
12356 Count is now 29
12356 Count is now 30
12356 Count is now 31
12356 Count is now 32
12356 Count is now 33
12356 Count is now 34
12356 Count is now 35
12356 Count is now 36

Still going.

You can't do this in a shell script for an executed subprocess, unfortunately: while the subprocess runs in the foreground, bash's own signal handlers (traps) are deferred, and if you fork it into the background, you can't reassign its output. That is, this is something you have to implement in the application itself.
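
You can see the foreground half of that with a minimal sketch (send SIGINT to the script's PID from another terminal; pressing Ctrl-C instead would signal the whole foreground process group, sleep included):

#!/bin/bash

# While sleep runs in the foreground, a trapped SIGINT is not
# handled until sleep exits.
trap 'echo "trap ran after $SECONDS seconds"' INT

echo $BASHPID
sleep 30    # from another terminal: kill -2 <the PID printed above>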

However...

If you can't modify the application (e.g., because you did not write it), I have a CLI utility you can use as an intermediary. You could also implement a simple version of this in a script that serves as a pipe to the log:

#!/bin/bash

# On SIGINT, recreate log.txt and point stdout at the new file
# (stderr stays on the terminal here).
trap sighandler INT

function sighandler () {
    touch log.txt
    exec 1> log.txt
}

echo "$0 $BASHPID"
exec 1> log.txt

# Relay stdin to the log, line by line.
while IFS= read -r; do
    echo "$REPLY"
done

Let's call this pipetrap.sh. Now we need a separate program to test with, mimicking the application you want to log:

#!/bin/bash

# Mimic an application that logs a line every 2 seconds for 2 minutes.
count=0
while [ $count -lt 60 ]; do
    echo "$BASHPID Count is now $count"
    sleep 2
    ((count++))
done

That will be test.sh:

> (./test.sh | ./pipetrap.sh) &
./pipetrap.sh 15859

These are two separate processes with separate PIDs. To clear test.sh's output, which is being funnelled through pipetrap.sh:

> rm -f log.txt && kill -s 2 15859

Check:

> cat log.txt
15858 Count is now 6
15858 Count is now 7
15858 Count is now 8

15858, test.sh, is still running and its output is being logged. In this case, no modifications to the application are needed.


TL;DR

Open your log file in append mode:

cmd >> log

Then, you can safely truncate it with:

: > log
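
The : is just the shell's do-nothing builtin; the redirection alone does the truncating. If you prefer an external command, GNU coreutils' truncate does the same:

truncate -s 0 log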

Details

With a Bourne-like shell, there are three main ways a file can be opened for writing: in write-only (>), read+write (<>), or append (also write-only, >>) mode.
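
In terms of the flags passed to the open() system call, those correspond to the following (a sketch using exec to open numbered file descriptors; O_CREAT applies when the file doesn't already exist):

exec 3>  log   # O_WRONLY|O_CREAT|O_TRUNC
exec 4<> log   # O_RDWR|O_CREAT
exec 5>> log   # O_WRONLY|O_CREAT|O_APPEND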

In the first two, the kernel remembers your current position in the file (and by "you", I really mean the open file description, shared by all the file descriptors that have duplicated or inherited it, by forking, from the one the file was opened on).

When you do:

cmd > log

log is opened in write-only mode by the shell for the stdout of cmd.

When cmd (the initial process spawned by the shell and all its possible children) writes to its stdout, it writes at the current cursor position held by the open file description they all share on that file.

For instance, if cmd initially writes zzz, the position will be at byte offset 3 in the file, and the next time cmd or its children write to the file, that's where the data will be written, regardless of whether the file has grown or shrunk in the interval.

If the file has shrunk, for instance if it has been truncated with a

: > log

and cmd then writes xx, those xx will be written at offset 3, and the first 3 bytes of the now-empty file will be NUL characters.

$ exec 3> log # open file on fd 3.
$ printf zzz >&3
$ od -c log
0000000   z   z   z
0000003
$ printf aaaa >> log # other open file description -> different cursor
$ od -c log
0000000   z   z   z   a   a   a   a
0000007
$ printf bb >&3 # still write at the original position
$ od -c log
0000000   z   z   z   b   b   a   a
0000007
$ : > log
$ wc log
0 0 0 log
$ printf x >&3
$ od -c log
0000000  \0  \0  \0  \0  \0   x
0000006

That means you cannot truncate a file that has been opened in write-only mode (and the same goes for read+write): if you do, the processes that had file descriptors open on the file will leave NUL characters at the beginning of it (except on OS X, those usually don't take space on disk though; the file becomes a sparse file).
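
You can see the sparseness by comparing apparent size with actual disk usage (a sketch using GNU dd, ls, and du; the exact block count depends on the filesystem):

$ dd if=/dev/zero of=sparse bs=1 count=0 seek=1M 2> /dev/null   # extend to 1 MiB without writing data
$ ls -l sparse   # apparent size: 1048576 bytes
$ du -k sparse   # blocks actually allocated: typically 0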

Instead (and you'll notice most applications do that when they write to log files), you should open the file in append mode:

cmd >> log

or

: > log && cmd >> log

if you want to start on an empty file.

In append mode, all writes are made at the end of the file, regardless of where the last write was:

$ exec 4>> log
$ printf aa >&4
$ printf x >> log
$ printf bb >&4
$ od -c log
0000000   a   a   x   b   b
0000005
$ : > log
$ printf cc >&4
$ od -c log
0000000   c   c
0000002

That's also safer: if two processes have opened the file that way by mistake (for instance because you've started two instances of the same daemon), their output will not overwrite each other's.
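
A quick way to convince yourself (a sketch; the exact interleaving will vary with timing):

$ : > log
$ for i in 1 2 3; do echo "one $i"; sleep 1; done >> log &
$ for i in 1 2 3; do echo "two $i"; sleep 1; done >> log &
$ wait; cat log

Each echo lands at the then-current end of the file, so the two sets of lines interleave rather than overwrite one another; with > instead of >>, both loops would start writing at offset 0 and clobber each other.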

On recent versions of Linux, you can check the current position and whether a file descriptor has been opened in append mode by looking at /proc/<pid>/fdinfo/<fd>:

$ cat /proc/self/fdinfo/4
pos:        2
flags:      0102001

Or with:

$ lsof +f G -p "$$" -ad 4
COMMAND  PID USER   FD   TYPE  FILE-FLAG DEVICE SIZE/OFF     NODE NAME
zsh     4870 root    4w   REG 0x8401;0x0 252,18        2 59431479 /home/chazelas/log
$ lsof +f g -p "$$" -ad 4
COMMAND  PID USER   FD   TYPE FILE-FLAG DEVICE SIZE/OFF     NODE NAME
zsh     4870 root    4w   REG   W,AP,LG 252,18        2 59431479 /home/chazelas/log

Those flags correspond to the O_* flags passed to the open() system call.

$ gcc -E - <<< $'#include <fcntl.h>\nO_APPEND O_WRONLY' | tail -n1
02000 01

(O_APPEND is 0x400 or octal 02000)

So the shell's >> opens the file with O_WRONLY|O_APPEND (the 0100000 here is O_LARGEFILE, which is not relevant to this question), while > is O_WRONLY only (and <> is O_RDWR).

If you do a:

sudo lsof -nP +f g | grep ,AP

to search for files open with O_APPEND, you'll find most log files currently open for writing on your system.