RAID-5: Two disks failed simultaneously?

Solution 1:

You have a double disk failure. That means your data is gone, and you will have to restore from a backup. This is why RAID 5 is not recommended on large disks: you want to set up your RAID so it always has the ability to withstand two disk failures, especially with large, slow disks.

Solution 2:

Your options are:

  1. Restoring from backups.
    • You do have backups, don't you? RAID is not a backup.

  2. Professional data recovery
    • It's possible, though very expensive and not guaranteed, that a professional recovery service will be able to recover your data.

  3. Accepting your data loss and learning from the experience.
    • As noted in the comments, large SATA disks are not recommended for a RAID 5 configuration because the chance of a second failure during the rebuild is high enough to take down the whole array.
      • If it must be parity RAID, RAID 6 is better, and next time use a hot spare as well (see the sketch after this list).
      • SAS disks are better for a variety of reasons, including greater reliability, greater resilience, and lower rates of unrecoverable read errors (UREs).
    • As noted above, RAID is not a backup. If the data matters, make sure it's backed up, and that your backups are restore-tested.
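
As a concrete illustration of that recommendation, here is roughly what creating a RAID 6 array with a hot spare looks like under Linux software RAID (mdadm). This is a minimal sketch; the device names /dev/sdb through /dev/sdf are placeholders for your own disks:

    # Create a RAID 6 array from four members plus one hot spare
    # (placeholder device names -- substitute your own).
    mdadm --create /dev/md0 --level=6 --raid-devices=4 --spare-devices=1 \
          /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf

    # Verify the layout and confirm the spare is attached.
    mdadm --detail /dev/md0

With RAID 6 the array survives any two simultaneous disk failures, and the hot spare lets the rebuild start immediately after the first failure instead of waiting for a human.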

Solution 3:

I am sorry to offer a heretical opinion after you have already accepted an answer, but this heresy has saved such arrays multiple times already.

Your second failed disk probably has only a minor problem, perhaps a single bad block. That is likely why the poorly written resync tool in your poor RAID 5 firmware crashed on it.

You could make a sector-level copy of it with a low-level disk cloning tool (gddrescue, for example, is very useful here) and use the clone as your new disk3. In that case, your array survives with only minor data corruption.
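
A minimal sketch of that clone using GNU ddrescue (packaged as gddrescue on Debian-family distros). Here /dev/sdX is the failing member and /dev/sdY its replacement, both placeholders; the map file records progress so runs can be resumed safely:

    # Pass 1: grab everything that reads cleanly, skip the slow scraping phase.
    ddrescue -f -n /dev/sdX /dev/sdY rescue.map

    # Pass 2: go back to the bad areas and retry them a few times.
    ddrescue -f -r3 /dev/sdX /dev/sdY rescue.map

Always clone from the failing disk, never to it, and reassemble the array using the clone.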

I am sorry if it is already too late, because the essence of the orthodox answer in this case is: "multiple failures in a RAID 5, behold the apocalypse!"

If you want really good, redundant RAID, use Linux software RAID: its RAID superblock data layout, for example, is public and documented... Sorry again for this second heretical opinion.
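
As a small illustration of that openness, you can read the documented md superblock straight off any member disk (/dev/sdb1 and /dev/md0 below are placeholders):

    # Print the md superblock of a member device in human-readable form.
    mdadm --examine /dev/sdb1

    # Summarise the assembled array itself.
    mdadm --detail /dev/md0

Because the on-disk format is documented, such arrays can be examined and reassembled on any Linux machine, with no dependence on one vendor's controller firmware.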


Solution 4:

Simultaneous failure is possible, even probable, for the reasons others have given. The other possibility is that one of the disks had actually failed some time earlier, and you weren't actively checking for it.

Make sure your monitoring will promptly pick up a RAID volume running in degraded mode. Maybe you didn't have that option here, but it's never good to have to learn these things from the BIOS.
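
With Linux software RAID, for example, catching a degraded array promptly is straightforward (a sketch; the mail address is a placeholder):

    # Quick manual check: a degraded md array shows up here
    # as missing members, e.g. [U_] instead of [UU].
    cat /proc/mdstat

    # Continuous monitoring: run a daemon that mails on failure/degraded events.
    mdadm --monitor --scan --daemonise --mail=admin@example.com

Hardware controllers have equivalent vendor tools; the point is that something must be watching the array and alerting you, whatever the RAID flavour.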


Solution 5:

To answer "How could two hard drives fail simultaneously like that?" precisely, I'd like to quote from this article:

The crux of the argument is this. As disk drives have become larger and larger (approximately doubling every two years), the URE (unrecoverable read error) rate has not improved at the same pace. The URE rate measures how frequently unrecoverable read errors occur and is typically expressed in errors per bits read. For example, a URE rate of 1E-14 (10^-14) implies that, statistically, an unrecoverable read error will occur once in every 1E14 bits read (1E14 bits = 1.25E13 bytes, or approximately 12TB).

...

The argument is that as disk capacities grow and the URE rate does not improve at the same pace, the probability of a RAID 5 rebuild failure increases over time. Statistically, he shows that by 2009 disk capacities would have grown enough to make RAID 5 pointless for any array of meaningful size.
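
To make that concrete with the quoted figures: rebuilding an array whose surviving disks hold about 12TB means reading roughly 9.6E13 bits. At a URE rate of 1E-14 per bit, the expected number of unrecoverable errors during the rebuild is 9.6E13 × 1E-14 ≈ 0.96, so the probability of finishing without a single URE is about e^-0.96 ≈ 38%. Under those assumptions, the rebuild is more likely to hit a URE (and, on many controllers, fail outright) than to complete.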

So, RAID 5 was already unsafe in 2009, and RAID 6 soon will be too. As for RAID 1, I have started building mine out of 3 disks rather than 2. RAID 10 with 4 disks is also precarious.