What's an effective offsite backup strategy for a ZFS mirrored pool?

After much tinkering and experimentation I've found a solution, albeit with a fairly large tradeoff.

First off, the options I had to rule out:

  • Having a second offsite ZFS server with a mirrored pool wasn't an option due to cost. Had it been, this would have been by far the best approach, using zfs send / zfs receive to ship snapshots to the remote pool.

  • Having a second onsite ZFS mirrored pool, which I could remove disks from to take home. This is more feasible than the first option, but I would need the second pool to always have two disks onsite (or to use two data-copies on a single onsite disk). At present I have four disks, and no more space for a fifth in the server. This would be a fair approach but still not ideal.

  • Using zpool attach and zpool detach to rotate the backup disk into and out of the mirrored pool. This works well, but it has to perform a full resilver every time the disk is added. That takes unacceptably long, so I couldn't rely on it.

My solution is similar to using attach and detach, but it uses zpool online and zpool offline instead. This has the advantage of performing a delta resilver rather than a full resilver, with the drawback that the pool always reports a DEGRADED state (the pool always has two disks; the rotating offsite disks are marked offline while they are in remote storage, then resilver and come online when they are back onsite).
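
For comparison, the two rotation styles map onto the zpool subcommands below. This is only a sketch that prints the commands rather than running them; the pool name `tank` and the device names `sdb`, `sdc` and `sdd` are made-up placeholders, not the actual setup.

```shell
#!/usr/bin/env bash
# Print (not run) the zpool commands for each rotation style.
# Pool and device names are hypothetical placeholders.

# attach/detach: the outgoing disk leaves the mirror entirely, and the
# incoming disk is attached next to an existing member.
rotate_by_attach() {
    local pool=$1 existing=$2 outgoing=$3 incoming=$4
    echo "zpool detach $pool $outgoing"
    echo "zpool attach $pool $existing $incoming"
}

# offline/online: both rotating disks stay members of the mirror, so the
# pool reports DEGRADED while one of them is away.
rotate_by_offline() {
    local pool=$1 outgoing=$2 incoming=$3
    echo "zpool offline $pool $outgoing"
    echo "zpool online $pool $incoming"
}

rotate_by_attach tank sdb sdc sdd
rotate_by_offline tank sdc sdd
```

The structural difference is that an offlined disk is still listed in the pool, which is what makes an incremental resilver possible in principle.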

So, a quick recap and overview of my setup:

I have one ZFS server and four identical disks. ZFS is set up to use a mirrored pool. Two of the four disks are permanent members of this pool. The other two disks rotate; one is always in offsite storage, the other is part of the pool to act as a ready-to-go backup.

When it comes time to rotate the backups:

  • I wait for a zpool scrub to complete to reasonably assure the backup disk is error free

  • I zpool offline the disk which will be taken offsite. Once it's offline I run hdparm -Y /dev/id to spin it down. After a minute I partially remove the disk sled (just enough to ensure it has lost power), then give it another minute before fully pulling the drive, to guarantee it has stopped spinning. The disk goes in a static bag and then a protective case and goes offsite.

  • I bring in the other offsite disk. It gets installed in the hotswap tray and spins up. I use zpool online to restore the disk to the pool and kick off a partial resilver to bring it up to date.
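
The rotation steps above could be wrapped up roughly as follows. Treat this as a sketch under assumptions: `tank`, `sdc`, `sdd` and `/dev/sdc` are placeholder names, the `run`, `retire_disk` and `restore_disk` helpers are made up for illustration, and `DRY_RUN` defaults to 1 so that nothing is executed unless you opt in. The physical steps (waiting, pulling the sled) still happen by hand.

```shell
#!/usr/bin/env bash
# Sketch of the rotation steps. With DRY_RUN=1 (the default) commands
# are printed instead of executed. All names are placeholders.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Take the outgoing disk out of service and put it to sleep.
retire_disk() {
    local pool=$1 vdev=$2 devnode=$3
    run zpool offline "$pool" "$vdev"
    run hdparm -Y "$devnode"    # spin the drive down before pulling it
}

# Bring the returning disk back into the mirror and show pool status,
# which reports the resilver while it is running.
restore_disk() {
    local pool=$1 vdev=$2
    run zpool online "$pool" "$vdev"
    run zpool status "$pool"
}

retire_disk tank sdc /dev/sdc
restore_disk tank sdd
```

Running it with `DRY_RUN=0` would execute the commands for real, so check the printed output against your own pool layout first.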

This system guarantees that at any given time I have two ONLINE mirror disks and one OFFLINE remote disk (which has been scrubbed). The fourth disk is either being resilvered or online, which has the benefit that if a running drive fails, the pool will probably still consist of two online disks.

It's worked well for the past couple weeks, but I'd still consider this a hackish approach. I'll follow up if I run into any major issues.


Update: After running with this for a couple of months I've found that in my real-world use the resilver takes the same time with detach/attach as with offline/online. In my testing I don't think I was running a scrub; my hunch is that if a drive is offline during a scrub, it then requires a full resilver.
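
One way to check whether a given resilver was full or incremental is to look at the "scan:" line of zpool status, which reports how much data was resilvered and how long it took. A small sketch, assuming the summary sits on a line starting with "scan:" (the exact wording after that varies between ZFS versions); `scan_summary` is a made-up helper name and `tank` is a placeholder pool.

```shell
#!/usr/bin/env bash
# Read `zpool status` output on stdin and print the text that follows
# the first "scan:" label. Only the first matching line is returned.
scan_summary() {
    sed -n 's/^[[:space:]]*scan:[[:space:]]*//p' | head -n 1
}

# Typical use (pool name is a placeholder):
#   zpool status tank | scan_summary
```

Comparing the resilvered amount after an offline/online cycle with the pool's used space should show whether the whole disk was rewritten or only the changes.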


Why not zfs send your snapshots to a remote ZFS machine? I use a simple bash script for this:

#!/usr/local/bin/bash
# ZFS Snapshot BASH script by Shawn Westerhoff
# Updated 1/14/2014

### DATE VARIABLES
# D   = today's date
# D1  = yesterday's date
# D10 = date 10 days ago (not used below)
# D20 = date 20 days ago
# Note: date -v is the BSD/FreeBSD syntax for date arithmetic
D=$(date +%m-%d-%Y)
D1=$(date -v-1d '+%m-%d-%Y')
D10=$(date -v-10d '+%m-%d-%Y')
D20=$(date -v-20d '+%m-%d-%Y')

# Step 1: Make the snapshots

for i in $( zfs list -H -o name ); do
    if [ "$i" = tier1 ]; then
        echo "$i found, skipping"
    else
        zfs snapshot "$i@$D"
    fi
done

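# Step 2: Send the snapshots to backup ZFS server
# (tier1 is skipped here too, since step 1 never snapshots it)

for i in $( zfs list -H -o name ); do
    if [ "$i" != tier1 ]; then
        zfs send -i "$i@$D1" "$i@$D" | ssh -c arcfour [email protected] zfs recv "$i"
    fi
done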

# Step 3: Destroy snapshots that are 20 days old

for i in $( zfs list -H -o name ); do
    if [ "$i" = tier1 ]; then
        echo "$i found, skipping"
    else
        zfs destroy "$i@$D20"
    fi
done
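
The incremental send in step 2 only works if yesterday's snapshot exists. A possible guard, sketched here with a made-up helper `send_cmd`, checks for the previous snapshot and falls back to a full send when it is missing. It only prints the command it would use. Whether a full send is actually appropriate depends on what already exists on the receiving side, so this is an illustration rather than a drop-in replacement.

```shell
#!/usr/bin/env bash
# Print the send command for one dataset: incremental if the previous
# snapshot exists locally, otherwise a full send.
# Arguments: dataset name, previous snapshot label, current snapshot label.
send_cmd() {
    local ds=$1 prev=$2 cur=$3
    if zfs list -H -t snapshot -o name "$ds@$prev" >/dev/null 2>&1; then
        echo "zfs send -i $ds@$prev $ds@$cur"
    else
        echo "zfs send $ds@$cur"
    fi
}

# Typical use inside the loop (dataset name is a placeholder):
#   send_cmd tier1/data "$D1" "$D"
```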
