Why is it not recommended to use RAID 5 for a log file?

Transaction log writes are synchronous operations: the activity that caused a log write must wait until the log I/O completes before it can continue. As a result, log writes are very sensitive to the write latency of the underlying storage.
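
A minimal sketch of what "synchronous" means here, using Python's `os.fsync` as a stand-in for whatever flush mechanism the database engine actually uses (the file name and log record are made up for illustration):

```python
import os
import time

def append_log_record(log_fd: int, record: bytes) -> float:
    """Append one log record and wait until it is durable on disk."""
    start = time.perf_counter()
    os.write(log_fd, record)
    os.fsync(log_fd)   # the caller cannot continue until this returns
    return time.perf_counter() - start

# Hypothetical usage: every commit pays the full round trip to storage,
# so per-write latency, not aggregate bandwidth, is what the caller feels.
fd = os.open("txn.log", os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
elapsed = append_log_record(fd, b"COMMIT txid=42\n")
print(f"log write took {elapsed * 1000:.2f} ms")
os.close(fd)
```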

As you have mentioned, every write to a RAID-5 device carries the overhead¹ of calculating and writing a parity block in addition to the data block(s). However small, that extra work on every write operation is the reason behind the recommendation not to use RAID-5 for log storage.


¹ More details in this Q&A


RAID-5 maintains redundancy by using N-1 disks for data and 1 disk for the XOR of that data. (It's not actually the same disk used for all the parity; that's RAID-4. RAID-5 distributes the parity across all the disks, changing at each "stripe" boundary.) https://en.wikipedia.org/wiki/Standard_RAID_levels#RAID_5

The biggest RAID-5 write overhead (read + rewrite of the parity disk for that block) only applies for short writes. A full-stripe write (e.g. as part of a large sequential I/O without sync/flush after small steps) can just calculate the parity stripe from the data stripes and write them all in parallel, without having to read anything from disk.
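
A sketch of the full-stripe case, assuming a 4-data-disk array and plain byte-wise XOR in Python (real implementations use SIMD or controller hardware, and the chunk size here is just an example value):

```python
def parity_of(chunks: list[bytes]) -> bytes:
    """XOR all data chunks of one stripe together to get the parity chunk."""
    parity = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            parity[i] ^= b
    return bytes(parity)

# Full-stripe write: every data chunk is already in memory, so parity is
# computed without reading anything back from disk; the N-1 data chunks
# plus the parity chunk are then written to the N disks in parallel.
chunk_size = 64 * 1024                                     # assumed 64 KiB chunk
data_chunks = [bytes([d]) * chunk_size for d in range(4)]  # 4 data disks
parity_chunk = parity_of(data_chunks)
```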

As mustaccio points out, transaction log writes have to hit disk (or at least the battery-backed memory of a RAID controller, i.e. become persistent) before later writes can be allowed to proceed. This typically means they can't be buffered up into one big contiguous full-stripe write.

In the optimal case, N-disk RAID-5 sequential write bandwidth in theory equals per-disk bandwidth times N-1. (Plus some CPU time, or not even that if the XOR parity computation is offloaded to a hardware RAID controller.)
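
For concreteness, a back-of-the-envelope version of that formula; the disk count and per-disk bandwidth are made-up example numbers, not measurements:

```python
def raid5_sequential_write_bw(n_disks: int, per_disk_mb_s: float) -> float:
    """Ideal full-stripe sequential write bandwidth of an N-disk RAID-5."""
    return (n_disks - 1) * per_disk_mb_s

# e.g. 6 disks at 200 MB/s each -> roughly 5 * 200 = 1000 MB/s of data,
# with the sixth disk's worth of bandwidth spent writing parity.
print(raid5_sequential_write_bw(6, 200.0))   # 1000.0
```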

In the pessimal case, yes, RAID-5 has to do extra disk I/O: read the old data and the old parity, then update the parity by XORing the old data into it (to remove it) and XORing in the new data.
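
A sketch of that read-modify-write path for a single small block, again with plain byte-wise XOR; `read_block` and `write_block` are hypothetical placeholders for the actual disk I/O:

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def small_write(read_block, write_block, data_disk, parity_disk, lba, new_data):
    """Update one data block plus its parity: 2 reads and 2 writes
    instead of the single write a non-parity array would need."""
    old_data   = read_block(data_disk, lba)        # extra read 1
    old_parity = read_block(parity_disk, lba)      # extra read 2
    # Remove the old data's contribution to parity, then add the new data's.
    new_parity = xor_bytes(xor_bytes(old_parity, old_data), new_data)
    write_block(data_disk, lba, new_data)
    write_block(parity_disk, lba, new_parity)      # extra write
```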


Notice that it's not calculating the parity itself that adds the big overhead. It's that, for small writes, the data you need to calculate the new parity may be sitting on disk rather than in memory.

RAID-5 is (very) bad at small writes, very good at large writes (almost as good as RAID-0), and good for reads in general.


Historically, some RAID controllers would read the full length of a stripe to update parity, but Linux software RAID, at least, only reads the sectors that correspond to the actual small write. This helps somewhat, but a small-ish stripe size like 32k or 64k (I think) is usually a good thing anyway, so that full-stripe writes are more common without having to buffer megabytes of data.
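
A quick way to see why a smaller per-disk chunk size makes full-stripe writes more attainable (the disk counts and chunk sizes here are just example values):

```python
def full_stripe_bytes(n_disks: int, chunk_size: int) -> int:
    """How much contiguous data a write must cover to avoid any parity reads."""
    return (n_disks - 1) * chunk_size

for chunk_kib in (32, 64, 256):
    stripe_kib = full_stripe_bytes(8, chunk_kib * 1024) // 1024
    print(f"8 disks, {chunk_kib} KiB chunks -> full stripe = {stripe_kib} KiB")

# Smaller chunks mean a writer needs to buffer less contiguous data
# before a write can bypass the read-modify-write path entirely.
```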

Still, that only takes it from "very, very bad" to "very bad" compared to RAID-10 or RAID-1, where small writes can simply happen on both disks that hold the blocks being written.