How to safely replace a not-yet-failed disk in a Linux RAID5 array?

Using mdadm 3.3

Since mdadm 3.3 (released September 3, 2013), and provided you run a 3.2+ kernel, you can proceed as follows:

# mdadm /dev/md0 --add /dev/sdc1
# mdadm /dev/md0 --replace /dev/sdd1 --with /dev/sdc1

Here, sdd1 is the device you want to replace, and sdc1 is the preferred replacement; it must already be declared as a spare on your array.

The --with option is optional; if it is not specified, any available spare will be used.
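While the replacement runs, you can watch its progress. A sketch, assuming the md0/sdc1 names from the example above:

```shell
# Overall view: the replacement shows up as a recovery-style progress bar
cat /proc/mdstat

# Per-array view: sdc1 should appear as "spare rebuilding"
# until the copy has finished
mdadm --detail /dev/md0
```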

Older mdadm version

Note: You still need a 3.2+ kernel.

First, add a new drive as a spare (replace md0 and sdc1 with your RAID and disk device, respectively):

# mdadm /dev/md0 --add /dev/sdc1

Then, initiate a copy-replace operation like this (sdd1 being the failing device):

# echo want_replacement > /sys/block/md0/md/dev-sdd1/state 

Result

The system will copy all readable blocks from sdd1 to sdc1. If it comes to an unreadable block, it will reconstruct it from parity. Once the operation is complete, the former spare (here: sdc1) will become active, and the failing drive will be marked as failed (F) so you can remove it.
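Once the old drive is marked failed (F), it can be taken out of the array. A sketch, using the device names from the example above:

```shell
# Remove the now-failed drive from the array
mdadm /dev/md0 --remove /dev/sdd1

# Optional: wipe its md superblock so it is not picked up again later
# (only once you are sure you no longer need anything on it)
mdadm --zero-superblock /dev/sdd1
```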

Note: credit goes to frostschutz and Ansgar Esztermann who found the original solution (see the duplicate question).

Older kernels

Other answers suggest:

  • Johnny's approach: convert array to RAID6, "replace" the disk, then back to RAID5,
  • Hauke Laging's approach: briefly remove the disk from the RAID5 array, make it part of a RAID1 (mirror) with the new disk and add that mirror drive back to the RAID5 array (theoretical)...

If you don't mind running RAID-6 (2 parity disks rather than 1), and if you're running mdadm 3.1.x or higher, you could convert your RAID-5 array to RAID-6 to add an additional parity disk. This will place the array under stress during the rebuild, however. And it has some performance implications, since there are more parity disks to update during writes.

But if it completes successfully, then you can keep your failing disk in place, and when it ultimately fails, you've still got parity protection for the array. I think you can convert the array from RAID6 back to RAID5 if you don't want to keep it as RAID6.
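A sketch of that round trip, assuming a 4-disk RAID-5 on /dev/md0 and a fresh disk /dev/sde1 (hypothetical names); a reshape between levels may need a backup file to survive interruption:

```shell
# Add the new disk, then grow the 4-disk RAID-5 into a 5-disk RAID-6
mdadm /dev/md0 --add /dev/sde1
mdadm --grow /dev/md0 --level=6 --raid-devices=5 \
      --backup-file=/root/md0-grow.backup

# Later, after the failing disk has been failed and removed,
# convert back to a 4-disk RAID-5
mdadm --grow /dev/md0 --level=5 --raid-devices=4 \
      --backup-file=/root/md0-shrink.backup
```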

I don't know of an online way to keep the array as RAID-5 and replace the disk without putting the array in degraded mode, as I think you have to mark it as failed to replace it. Your dd copy idea might be the way to do that.


This may be possible while meeting the requirements:

  1. online
  2. don't stress any disk except for the one which is to be replaced

But even if the following works, you will probably not find a recommendation of this kind "in the books"...

Idea:

  1. Take disk OLD out of the array (for a short moment): mdadm --manage /dev/raid5 --fail /dev/OLD
  2. Create a new md device (RAID-1) from disks OLD and NEW: mdadm --build /dev/md42 --level=mirror --raid-devices=2 /dev/OLD /dev/NEW
  3. Put the RAID-1 back in the array (instead of /dev/OLD): mdadm --manage /dev/raid5 --re-add /dev/md42

What should :-) happen:

  1. The RAID-5 gets /dev/md42 in sync. This should not take long.
  2. The RAID-5 is normally operational again (but slower).
  3. /dev/NEW is synced with /dev/OLD.

Watch the sync progress (cat /proc/mdstat or mdadm --monitor). Once the sync has finished, take the RAID-1 out of the RAID-5, stop the RAID-1, and re-add /dev/NEW to the RAID-5. If everything is fine, overwrite the mdraid superblock on /dev/OLD in order to avoid problems: mdadm --zero-superblock /dev/OLD
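The teardown just described, sketched with the placeholder names from the idea (raid5, md42, OLD, NEW):

```shell
# After the RAID-1 mirror is fully synced:
mdadm --manage /dev/raid5 --fail /dev/md42      # take the mirror out
mdadm --manage /dev/raid5 --remove /dev/md42
mdadm --stop /dev/md42                          # dissolve the RAID-1

# NEW now carries a byte-for-byte copy of OLD (the RAID-1 was built
# without its own superblock), so it can be re-added directly
mdadm --manage /dev/raid5 --re-add /dev/NEW

mdadm --zero-superblock /dev/OLD                # forget the old disk
```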

Warning: The fast RAID-5 re-sync may only work if the array has a write-intent bitmap. If it doesn't, better make a test with a dummy RAID-5 (without a bitmap) first, or add one; at least adding an external bitmap should be possible. Otherwise it may be necessary to stop the RAID-5 before swapping the devices. If you boot from the RAID-5, this would become a bit complicated, though.
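If the array has no bitmap yet, one can be added (and later removed) online. A sketch, assuming the array device is /dev/raid5 as above:

```shell
# Add an internal write-intent bitmap; this is what enables the quick
# re-sync after --fail / --re-add instead of a full rebuild
mdadm --grow --bitmap=internal /dev/raid5

# Remove it again once the disk swap is done, if you prefer
mdadm --grow --bitmap=none /dev/raid5
```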