Linux BTRFS - convert to single with failed drive

Alright, I figured it out with the help of this Trello link. In case anyone else wants to do this, here's the procedure.

Procedure

Starting from a RAID1 array of two disks, /dev/sda (faulty) and /dev/sdc (known-good):

  1. Disable auto-mounting of this array in /etc/fstab, then reboot. Essentially, we want btrfs to forget this array exists, as there's a bug where it will still try to use one of the drives even after it's been unplugged.
  2. Now that your array is unmounted, execute:

    echo 1 | sudo tee /sys/block/sda/device/delete

    replacing sda with the faulty device name. This causes the disk to spin down (you should verify this in dmesg) and become inaccessible to the kernel.

    Alternatively: just take the drive out of the computer before booting! I didn't opt for this method, as the above works fine for me.

  3. Mount your array in degraded mode with -o degraded (see the sketch after this list).
  4. Begin a rebalance with sudo btrfs balance start -f -mconvert=single -dconvert=single /mountpoint. This reorganises the extents on the known-good drive, converting them to the single (non-RAID) profile. It can take almost a day to complete, depending on the speed of your drive and the size of your array (mine had ~700 GiB, and rebalanced at a rate of one 1 GiB chunk per minute). Luckily, this operation can be paused, and the array stays online while it runs.
  5. Once this is done, you can issue sudo btrfs device remove missing /mountpoint to remove the 'missing' faulty device.
  6. Begin a second rebalance with sudo btrfs balance start -mconvert=dup /mountpoint to restore metadata redundancy. This took a few minutes on my system.
  7. You're done! Your array is now a single-device filesystem: data is in single mode with the RAID1 redundancy removed, and metadata is duplicated on the remaining drive.
  8. Take your faulty drive outside, and beat it with a hammer.
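
Here's a minimal sketch of steps 1-3 as shell commands, assuming the setup above (/dev/sda faulty, /dev/sdc known-good, array mounted at /mountpoint); the fstab line and UUID are placeholders, so substitute your own.

    # Step 1: comment out the array's entry in /etc/fstab, then reboot
    # so btrfs forgets the array. The commented-out line looks like:
    #   #UUID=<array-uuid>  /mountpoint  btrfs  defaults  0  0

    # Step 2: detach the faulty disk from the kernel
    echo 1 | sudo tee /sys/block/sda/device/delete
    sudo dmesg | tail    # verify the disk spun down

    # Step 3: mount the surviving device in degraded mode
    sudo mount -o degraded /dev/sdc /mountpoint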

Troubleshooting

  • Help, btrfs tried to write to my faulty disk, errored out, and forced it readonly!
    • Did you follow step 1, and reboot before continuing? It's likely that btrfs still thinks the drive you spun down is present. Rebooting will cause btrfs to forget any errors, and will let you continue.
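
If you want to confirm whether btrfs has already recorded write errors before retrying, the kernel log and the per-device error counters are worth a look (the /mountpoint path is an assumption, as above):

    sudo dmesg | tail
    sudo btrfs device stats /mountpoint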

Thanks for your post. I had this idea that I could test out RAID, pop the drive out of my hotswap bay, use another drive, and then pop the RAID drive back in. In retrospect, this was a bad idea, and now I need my hotswap bay.

Here's what I found. As root:

# sudo btrfs fi show
Label: 'disk'  uuid: 12817aeb-d303-4815-8bba-a3440e36c62c
    Total devices 2 FS bytes used 803.10GiB
    devid    1 size 931.51GiB used 805.03GiB path /dev/sda1
    devid    2 size 931.51GiB used 805.03GiB path /dev/sdb1

Note the devid listed for each drive. The man page for btrfs balance led me to the devid option; it took a couple of tries to figure out how the filters work (I initially tried devid=/dev/sdb1). So your first attempt is going to look something like this:

# btrfs balance start -dconvert=single,devid=2 -mconvert=single,devid=2 /mnt

This gave me an error:

ERROR: error during balancing '/media/.media': Invalid argument
There may be more info in syslog - try dmesg | tail    

Here's the error from dmesg:

BTRFS error (device sdb1): balance will reduce metadata integrity, use force if you want this

So this is the final command that worked:

# btrfs balance start -f -dconvert=single,devid=2 -mconvert=single,devid=2 /mnt

Hopefully this helps someone else out.
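
To double-check that the conversion took, btrfs can report the block group profiles and per-device usage; after the balances above, the data and metadata lines should read single (assuming the same /mnt mountpoint):

# btrfs filesystem df /mnt
# btrfs filesystem show /mnt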

Tags: Linux, Raid, Btrfs