How to copy a file transactionally?

rsync does this job. By default it writes to a temporary file created with O_EXCL (this is only disabled if you use --inplace) and then renames it over the target file. Use --ignore-existing so that B is not overwritten if it already exists.
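A minimal sketch (A and B as in the question; check your rsync version's man page for the exact semantics of the flag):

rsync --ignore-existing A B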

In practice, I have never experienced any problems with this on ext4, zfs or even NFS mounts.


Don't worry: noclobber is a standard shell feature (set -C, or equivalently set -o noclobber, is specified by POSIX).
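For illustration, a minimal sketch of how it might be used here; note that noclobber only affects shell redirections, so the copy goes through cat and a redirection rather than cp:

set -C            # same as: set -o noclobber
cat A > B.part    # the redirection fails if B.part already exists
mv B.part B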


You asked about NFS. This kind of code is likely to break under NFS, since the check for noclobber involves two separate NFS operations (check whether the file exists, then create the new file), and two processes on two separate NFS clients can get into a race where both succeed: both verify that B.part does not exist yet, then both proceed to create it, and as a result they end up overwriting each other.

There's not really a way to do a generic check for whether the filesystem you're writing to will support something like noclobber atomically. You could check the filesystem type to see whether it's NFS, but that would be a heuristic and not a guarantee. Filesystems like SMB/CIFS (Samba) are likely to suffer from the same problems. Filesystems exposed through FUSE may or may not behave correctly, depending mostly on the implementation.
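For instance, a heuristic check with GNU coreutils' stat (type names vary between implementations, and this says nothing about mount options or server behavior):

# Print the type of the filesystem holding the current directory,
# e.g. "nfs", "ext2/ext3" or "zfs".
stat -f -c %T .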


A possibly better approach is to avoid the collision at the B.part step by using a unique filename (through cooperation among the agents) so that you don't need to depend on noclobber at all. For instance, you could include, as part of the filename, your hostname, PID and a timestamp (plus possibly a random number). Since there should be only a single process running under a specific PID on a given host at any given time, this should guarantee uniqueness.

So either one of:

test -f B && continue  # skip if B already exists
unique=$(hostname).$$.$(date +%s).$RANDOM
cp A B.part."$unique"
# Maybe check for the existence of B again here, and if it
# now exists, remove the temporary file and bail out.
mv B.part."$unique" B
# mv (rename) should always succeed, overwriting a
# previously copied B if one exists.

Or:

test -f B && continue  # skip if B already exists
unique=$(hostname).$$.$(date +%s).$RANDOM
cp A B.part."$unique"
if ln B.part."$unique" B ; then
    echo "Success creating B"
else
    echo "Failed creating B, it already existed"
fi
# Both branches require cleanup of the temporary file.
rm B.part."$unique"

So if there's a race between two agents, both will proceed with the operation, but the final step is atomic, so either B exists with a full copy of A, or B doesn't exist at all.

You can shrink the race window by checking for B again after the copy and before the mv or ln operation, but a small race still remains. Regardless of the race, though, the contents of B should be consistent, assuming both processes are copying from A (or from a copy of a valid file as the origin).
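A sketch of that narrowing re-check, continuing the first example above:

cp A B.part."$unique"
if test -f B ; then
    rm B.part."$unique"    # lost the race since the first check; clean up
else
    mv B.part."$unique" B  # a small window still remains right here
fi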

Note that in the first variant, with mv, when a race happens the last process is the one that wins, since rename(2) will atomically replace an existing file:

If newpath already exists, it will be atomically replaced, so that there is no point at which another process attempting to access newpath will find it missing. [...]

If newpath exists but the operation fails for some reason, rename() guarantees to leave an instance of newpath in place.

So it's quite possible that processes consuming B at the time will see different versions of it (different inodes). If the writers are all just copying the same contents and the readers are simply consuming the contents of the file, that may be fine: if they get different inodes for files with the same contents, they'll be happy just the same.

The second approach, using a hard link, looks better, but I recall experimenting with hard links in a tight loop on NFS, from many concurrent clients, counting successes, and there still seemed to be race conditions: when two clients issued a hardlink operation at the same time with the same destination, both appeared to succeed. (It's possible this behavior was specific to the particular NFS server implementation; YMMV.) In any case, that's probably the same kind of race condition, where you might end up with two separate inodes for the same file when there's enough concurrency between writers to trigger it. If your writers are consistent (all copying A to B) and your readers only consume the contents, that might be enough.

Finally, you mentioned locking. Unfortunately, locking support is severely lacking, at least on NFSv3 (I'm not sure about NFSv4, but I'd bet it's not great either). If you're considering locking, look into dedicated protocols for distributed locking, possibly out of band with the actual file copies, but that's disruptive, complex and prone to issues such as deadlocks, so I'd say it's better avoided.


For more background on the subject of atomicity on NFS, you might want to read up on the Maildir mailbox format, which was created to avoid locks and to work reliably even on NFS. It does so by using unique filenames everywhere (so you never even get a final B at the end).

Perhaps more interesting for your particular case, the Maildir++ format extends Maildir to add support for mailbox quotas, and it does so by atomically updating a file with a fixed name inside the mailbox (so that might be closer to your B). I believe Maildir++ normally appends to that file, which is not really safe on NFS, but there's a quota recalculation procedure that uses an approach similar to the one above and is valid as an atomic replace.
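As a sketch of that recalculation-style update (maildirsize is the fixed name Maildir++ uses; recalculate_quota here is a hypothetical placeholder for the actual recalculation):

unique=$(hostname).$$.$(date +%s).$RANDOM
recalculate_quota > maildirsize."$unique"  # hypothetical recalculation step
mv maildirsize."$unique" maildirsize       # atomic replace via rename(2)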

Hopefully all these pointers will be useful!

Tags: linux, bash, cp