MD RAID sector repair

SHORT ANSWER: mirroring and parity-based RAID layouts support repairing a bad sector with presumed-good redundant data, both during normal reads and during scrubs. However, classical RAID (both hardware and software based) can do nothing against silent data corruption, which requires stronger protection in the form of data checksums (provided, for example, by BTRFS and ZFS).

LONG ANSWER: the question and the provided answers conflate different concepts about how disks, MD RAID and checksummed filesystems work. Let me explain them one by one; in any case, please consider that the exact behavior is somewhat firmware- and implementation-dependent:

  • the first line of defense is the disk's own internal ECC: when some bit goes bad, the embedded ECC recovery kicks in, correcting the affected error in real time. A low rate of ECC-corrected reads will generally not cause an automatic sector repair/reallocation; however, if ECC errors accumulate and grow, the disk's firmware will eventually reallocate the affected sector before it becomes unreadable (this is counted by the "Reallocated Event Count" SMART attribute). Some enterprise disks periodically read all sectors to discover problematic ones in a timely manner (see SAS/SATA surface scanning).
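
The principle behind the disk's internal ECC can be sketched with a toy single-error-correcting code. Real drives use far stronger codes (BCH/LDPC) in firmware; the minimal Hamming(7,4) code below is only an illustration of how a flipped bit can be located and fixed from parity alone:

```python
def hamming74_encode(nibble):
    """Encode 4 data bits [d1, d2, d3, d4] into 7 bits with 3 parity bits."""
    d1, d2, d3, d4 = nibble
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_correct(code):
    """Locate and flip a single corrupted bit using the parity syndrome."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 * 1 + s2 * 2 + s3 * 4   # 1-based position of the bad bit
    if syndrome:
        c[syndrome - 1] ^= 1
    return c

codeword = hamming74_encode([1, 0, 1, 1])
corrupted = list(codeword)
corrupted[4] ^= 1                         # flip one bit, as a weak sector might
assert hamming74_correct(corrupted) == codeword
```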

  • if the sector is only very rarely read and the disk does not "see" the gradual sector data corruption, a read can suddenly fail (see the "Current Pending Sector" SMART attribute) and the affected data are lost. The disk reports a read error to the operating system and moves on. If using a RAID 1/5/6 scheme, the system has sufficient redundancy to reconstruct the missing data, overwriting the failing sectors and, depending on the disk firmware, forcing a sector reallocation. Traditionally, both hardware RAID cards and MD RAID (Linux software RAID) worked in this manner, relying on the HDD's own remapping feature. Newer hardware RAID cards and mdadm releases additionally provide an internal remapping list, which kicks in if/when the HDD fails to remap the affected sector (i.e. because no spare sectors are available); you can read more in the md man page, especially the "RECOVERY" section. A disk that has run out of spare sectors should obviously be replaced immediately. To avoid discovering too many unreadable sectors too late, all RAID implementations support a "scrub" or "patrol read" operation, where the entire array is periodically read to test the underlying disks.
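
The reconstruction step above is, for a parity layout, plain XOR: the parity block is the XOR of the data blocks in the stripe, so any single missing block is the XOR of all surviving ones. A minimal sketch (disk and sector names are made up for illustration):

```python
def xor_blocks(blocks):
    """XOR same-sized byte blocks together, as RAID 5 parity does."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

# Three data "sectors" plus their parity, as on a 4-disk RAID 5 stripe.
d0 = b"sector-0"
d1 = b"sector-1"
d2 = b"sector-2"
parity = xor_blocks([d0, d1, d2])

# Disk holding d1 returns a read error: rebuild d1 from the survivors
# and rewrite it, which is what the RAID layer does on a failed read.
rebuilt = xor_blocks([d0, d2, parity])
assert rebuilt == d1
```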

  • the protection scheme described above only works when the read/write error is clearly reported to the RAID card and/or operating system. In the case of silent data corruption (i.e. a disk returning bad data instead of a clear error), this approach is useless. To protect yourself from silent data corruption (which, by definition, is not reported by any SMART attribute), you need an additional checksum to validate the correctness of the returned data. This additional protection can be hardware-based (i.e. the SAS T10 Protection Information extension), block-device software-based (i.e. dm-integrity) or provided by a fully integrated checksumming filesystem (BTRFS and ZFS). Speaking of ZFS and BTRFS, they support a "scrub" operation which is similar, but not identical, to its RAID counterpart (i.e. it scans only actually allocated space/data).
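
To see why a checksum is needed, consider a plain 2-way mirror whose copies silently disagree: both reads succeed, so the RAID layer has no way to tell which copy is right. An independently stored checksum (as BTRFS and ZFS keep in their metadata) arbitrates. A toy sketch, using SHA-256 just as an example hash:

```python
import hashlib

good = b"important data"
stored_checksum = hashlib.sha256(good).digest()  # kept apart, in fs metadata

copy_a = good
copy_b = b"importBnt data"   # silent bit flip: no I/O error is reported

# Classic RAID 1 only sees two readable, disagreeing copies: a tie.
assert copy_a != copy_b

# The checksum breaks the tie: return the matching copy, rewrite the bad one.
valid = [c for c in (copy_a, copy_b)
         if hashlib.sha256(c).digest() == stored_checksum]
assert valid == [copy_a]
```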

NOTE: RAID 6 or 3-way RAID 1 layouts can theoretically offer some added protection against bitrot compared to RAID 5 and 2-way RAID 1 by using some form of "majority vote". However, as it would impose a massive performance hit, I have never seen such behavior in common implementations. See here for more details.
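
For illustration only, the hypothetical byte-wise majority vote would look like this: with three copies, a single silently corrupted copy is outvoted. Again, this is the theoretical scheme described above, not something md actually implements:

```python
from collections import Counter

def majority_vote(copies):
    """Pick, byte by byte, the value held by the majority of the copies."""
    return bytes(Counter(col).most_common(1)[0][0] for col in zip(*copies))

c1 = b"hello world"
c2 = b"hellX world"   # one silently corrupted copy, no error reported
c3 = b"hello world"
assert majority_vote([c1, c2, c3]) == b"hello world"
```

Note the cost: every read must fetch and compare all three copies, which is exactly the massive performance hit that keeps this out of common implementations.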