rsync directory so all changes appear atomically

You can use the --link-dest= option. Basically, you rsync into a new folder with --link-dest pointing at the current one, so every unchanged file is hard-linked into the new folder instead of being copied again. When everything is done, you can just swap the folder names and remove the old one.

With plain rename() it is impossible to do this 100% atomically on Linux, because swapping two directory names takes 2 syscalls. In practice the swap should take far less than 1 second to complete. Newer Linux kernels (3.15 and later) do offer renameat2() with the RENAME_EXCHANGE flag, which swaps two paths in a single atomic step on filesystems that support it. Darwin (macOS) has the exchangedata system call on HFS filesystems.
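
The two-rename swap described above can be sketched in Python. This is only an illustration under stated assumptions: the function name swap_into_place and the ".old" suffix are hypothetical, and both directories are assumed to sit on the same filesystem so that each os.rename is a single rename syscall. Between the two renames there is a short window in which the live name does not exist.

```python
import os
import shutil
import tempfile


def swap_into_place(live, staged):
    """Replace directory `live` with `staged` using two renames.

    Assumes both paths are on the same filesystem. The ".old" suffix
    is an arbitrary choice made for this sketch.
    """
    old = live + ".old"
    os.rename(live, old)      # rename 1: move the current tree aside
    os.rename(staged, live)   # rename 2: put the new tree in place
    shutil.rmtree(old)        # remove the previous tree afterwards


if __name__ == "__main__":
    # Small demo in a scratch directory (all names are placeholders).
    top = tempfile.mkdtemp()
    live = os.path.join(top, "site")
    staged = os.path.join(top, "site.new")
    os.mkdir(live)
    os.mkdir(staged)
    with open(os.path.join(staged, "index.txt"), "w") as fh:
        fh.write("new")
    swap_into_place(live, staged)
    print(sorted(os.listdir(live)))
    shutil.rmtree(top)
```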


I do something similar with rsync backups [to disk] and I've encountered the same problem due to a daemon updating files while the backup is running.

Unlike many programs, rsync has many distinct exit codes [see the bottom of the man page]. Two are of interest here:

23 -- partial transfer due to error
24 -- partial transfer due to vanished source files

When rsync is doing a transfer and encounters one of these situations, it doesn't just stop immediately. It skips over and continues with the files it can transfer. At the end, it presents the return code.

So, if you get error 23/24, just rerun the rsync. The subsequent runs will go much faster, usually just transferring the missing files from the previous run. Eventually, you'll get [or should get] a clean run.
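
The rerun-until-clean idea can be sketched as a small Python loop. The names run_until_clean, max_retries and runner are assumptions of this sketch, not part of rsync. The runner parameter exists only so the loop can be tried without a real rsync, and the retry cap is a precaution in case the error never goes away.

```python
import subprocess

RETRYABLE = {23, 24}  # the two partial-transfer codes listed above


def run_until_clean(cmd, max_retries=5, runner=subprocess.call):
    """Run `cmd` until it exits 0, retrying only on codes 23 and 24.

    Returns the number of attempts that were needed. Any other
    non-zero exit code is treated as fatal.
    """
    for attempt in range(1, max_retries + 1):
        code = runner(cmd)
        if code == 0:
            return attempt
        if code not in RETRYABLE:
            raise RuntimeError(f"rsync failed with exit code {code}")
    raise RuntimeError(f"still not clean after {max_retries} attempts")
```

A call might look like run_until_clean(["rsync", "-a", "src/", "dest/"]), where the options and paths are placeholders.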

As for being atomic, I use a "tmp" dir during the transfer. Then, when the rsync run is clean, I rename it [atomically] to <date>.

I also use the --link-dest option, but I use it to keep delta backups (e.g. --link-dest=yesterday for the daily backup).
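
As a rough sketch of the "tmp" directory plus dated rename plus --link-dest combination, assuming backups to a local disk: the function names, the -a option and the timestamp format below are illustrative choices, not the exact setup described above. The --link-dest path is made absolute because rsync interprets a relative one against the destination directory.

```python
import os
from datetime import datetime


def latest_snapshot(backups_dir):
    """Return the newest snapshot name, or None if there are none.

    Relies on names like 20240131_235959 sorting chronologically.
    """
    names = sorted(os.listdir(backups_dir))
    return names[-1] if names else None


def rsync_command(src, tmp_dir, backups_dir):
    """Build an rsync argument list that transfers into `tmp_dir`."""
    cmd = ["rsync", "-a"]
    prev = latest_snapshot(backups_dir)
    if prev is not None:
        # Unchanged files become hard links into the previous snapshot.
        link_dest = os.path.join(os.path.abspath(backups_dir), prev)
        cmd.append("--link-dest=" + link_dest)
    return cmd + [src, tmp_dir]


def install_snapshot(tmp_dir, backups_dir, now=None):
    """Rename the finished `tmp_dir` to a dated name; return the new path."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    final = os.path.join(backups_dir, stamp)
    os.rename(tmp_dir, final)  # one rename if both are on one filesystem
    return final
```

The intended order is: build the command, run it until it exits cleanly, and only then call install_snapshot, so a half-finished transfer never shows up under a dated name.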

Although I've not used it myself, --partial-dir=DIR may keep the hidden partial files from cluttering up the backup directory. Be sure that DIR is on the same filesystem as your backup directory so renames will be atomic.
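
One common way to check the same-filesystem condition is to compare device IDs. The helper below is a hypothetical sketch, not something rsync provides. It compares st_dev from os.stat() for two existing paths, and should be treated as a heuristic, since unusual mount setups can complicate the picture.

```python
import os


def same_filesystem(path_a, path_b):
    """True if both existing paths report the same device ID.

    A rename between different devices fails instead of being
    atomic, so this is worth checking before relying on it.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

For example, same_filesystem("/path_to_backup_top/partial", "/path_to_backup_top/backups") should be true before you rely on atomic renames between the two (the paths are placeholders).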

While I do this in perl, I've written a script that summarizes what I've been saying, with a bit more detail/precision for your particular situation. It's in tcsh-like syntax [untested and a bit rough], so treat it as pseudo-code for writing your own bash, perl, or python script, as you choose. Note that it has no limit on retries, but you can add that easily enough, according to your wishes.

#!/bin/tcsh -f
# repo_backup -- backup repos even if they change
#
# use_tmp -- use temporary destination directory
# use_partial -- use partial directory
# use_delta -- make delta backup

# set each flag to 1 (on) or 0 (off) as needed
set use_tmp=1
set use_partial=1
set use_delta=1

# set remote server name ...
set remote_server="..."

# directory on server for backups
set backup_top="/path_to_backup_top"
set backup_backups="$backup_top/backups"

# set your rsync options ...
set rsync_opts=(...)

# keep partial files from cluttering backup
# (--partial-dir is a path on the receiving side, so no host: prefix)
set server_partial=$backup_top/partial
if ($use_partial) then
    set rsync_opts=($rsync_opts --partial-dir=$server_partial)
endif

# do delta backups
if ($use_delta) then
    set latest=(`ssh ${remote_server} ls $backup_backups | tail -1`)

    # get latest
    set delta_dir="$backup_backups/$latest"

    if ($#latest > 0) then
        # --link-dest is a path on the receiving side, so no host: prefix
        set rsync_opts=($rsync_opts --link-dest=$delta_dir)
    endif
endif

while (1)
    # get list of everything to backup
    # set this to whatever you need
    cd /local_top_directory
    set transfer_list=(.)

    # use whatever format you'd like
    set date=`date +%Y%m%d_%H%M%S`

    set server_tmp=${remote_server}:$backup_top/tmp
    set server_final=${remote_server}:$backup_backups/$date

    if ($use_tmp) then
        set server_transfer=$server_tmp
    else
        set server_transfer=$server_final
    endif

    # do the transfer
    rsync $rsync_opts $transfer_list $server_transfer
    set code=$status

    # run was clean
    if ($code == 0) then
        # atomically install backup
        if ($use_tmp) then
            ssh ${remote_server} mv $backup_top/tmp $backup_backups/$date
        endif
        break
    endif

    # partial -- some error
    if ($code == 23) then
        continue
    endif

    # partial -- some files disappeared
    if ($code == 24) then
        continue
    endif

    echo "fatal error ..."
    exit(1)
end
