How does btrfs scrub work and what does it do?

I don't know if it does anything else, but I know that at minimum btrfs scrub does full-disk data scrubbing. Basically, it reads all data* on the disk, recomputes its checksum, and compares the recomputed checksum to the stored one. When the stored and recomputed checksums don't match, the system knows there's corruption.
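
To make the compare step concrete, here is a minimal sketch in Python. It is not how btrfs is implemented: the 4096-byte block size and the helper names are assumptions for illustration, and `zlib.crc32` (plain CRC-32) stands in for btrfs's default crc32c.

```python
import zlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def checksum(block: bytes) -> int:
    # Plain CRC-32 as a stand-in for the filesystem's real checksum algorithm.
    return zlib.crc32(block)


def write_blocks(data: bytes):
    """Split data into blocks and store a checksum for each at write time."""
    blocks = [bytearray(data[i:i + BLOCK_SIZE])
              for i in range(0, len(data), BLOCK_SIZE)]
    sums = [checksum(bytes(b)) for b in blocks]
    return blocks, sums


def scrub(blocks, sums):
    """Re-read every block and return the indices whose checksum mismatches."""
    return [i for i, (b, s) in enumerate(zip(blocks, sums))
            if checksum(bytes(b)) != s]


blocks, sums = write_blocks(bytes(range(256)) * 64)  # 16384 bytes -> 4 blocks
blocks[2][10] ^= 0x01        # flip one bit to simulate silent corruption
print(scrub(blocks, sums))   # [2]
```

Note that the scrub only knows which block is bad; it needs a second, valid copy to do anything about it.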

Once corruption is detected, behavior depends on your disk setup. For example, if you have RAID 1 (mirroring), then btrfs scrub can fix corrupted data by copying an uncorrupted version from another disk. If all copies of some data are corrupted (e.g., multi-disk damage or not having redundant copies in the first place), then there's not much btrfs scrub can do besides warn you.
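
The two outcomes described above (repairable versus warn-only) can be sketched like this. It is a toy model under my own assumptions: `scrub_mirrored` is a hypothetical helper, each "device" is just a list of blocks, and plain CRC-32 stands in for the real checksum.

```python
import zlib


def scrub_mirrored(copies, sums):
    """copies: one list of bytearray blocks per device; sums: stored checksums.

    Returns (repaired, unrecoverable): repaired is a list of (device, block)
    pairs that were rewritten from a good copy, unrecoverable is a list of
    block indices for which no device held a valid copy.
    """
    repaired, unrecoverable = [], []
    for i, expected in enumerate(sums):
        good = [d for d, dev in enumerate(copies)
                if zlib.crc32(bytes(dev[i])) == expected]
        bad = [d for d in range(len(copies)) if d not in good]
        if not bad:
            continue                      # every copy is fine
        if not good:
            unrecoverable.append(i)       # nothing to repair from: warn only
            continue
        for d in bad:
            copies[d][i][:] = copies[good[0]][i]   # overwrite with a good copy
            repaired.append((d, i))
    return repaired, unrecoverable


data = [b"alpha", b"bravo", b"charlie"]
sums = [zlib.crc32(b) for b in data]
disk_a = [bytearray(b) for b in data]
disk_b = [bytearray(b) for b in data]
disk_b[0][0] ^= 0xFF   # block 0 damaged on one device only
disk_a[2][1] ^= 0xFF   # block 2 damaged on both devices
disk_b[2][3] ^= 0xFF

repaired, unrecoverable = scrub_mirrored([disk_a, disk_b], sums)
print(repaired, unrecoverable)   # [(1, 0)] [2]
```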

The reason this is important is that hard drives are only about 99.999999999999% reliable at reading and writing bits, which works out to roughly one bad bit per 10^14 bits, or about 12.5 TB. So, every ten terabytes or so of data I/O, there is likely to be an error. Although errors can be and are detected (and fixed, assuming a redundant copy is still valid) during normal disk access, routine full-disk scrubbing finds and fixes errors before enough accumulate that all copies of the same data are corrupted.
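
To put a number on the reliability figure above, here is the arithmetic as a short Python sketch. It assumes the 99.999999999999% figure means an independent per-bit error probability of 10^-14, which is a simplification; real drives do not fail this uniformly, and the function name is mine.

```python
import math

P_BIT_ERROR = 1e-14   # assumed: 1 - 0.99999999999999, independent per bit


def p_at_least_one_error(terabytes: float) -> float:
    """Probability of at least one bit error while transferring this much data."""
    bits = terabytes * 1e12 * 8
    # 1 - (1 - p)^bits, computed in a numerically stable way
    return -math.expm1(bits * math.log1p(-P_BIT_ERROR))


tb_per_expected_error = (1 / P_BIT_ERROR) / 8 / 1e12
print(tb_per_expected_error)   # 12.5

for tb in (1, 4, 12.5, 50):
    print(tb, round(p_at_least_one_error(tb), 3))
```

Under these assumptions an error is expected about once per 12.5 TB transferred, and the chance of hitting at least one grows quickly as the amount of I/O approaches that size.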

* I'm using "data" instead of "file" to include metadata as well. Btrfs stores files and corresponding metadata (including checksums) in data blocks, all of which are checksummed and checked by btrfs scrub.

See also:

  • Btrfs -> Checksum tree and scrubbing at Wikipedia: Technical information about btrfs's data scrubbing.
  • Birthday problem -> Probability table at Wikipedia: Treating "hash space" as "number of data blocks" and "number of hashed elements" as "number of corrupted data blocks", this gives the probability of there being a data block with both copies corrupted in a RAID 1 setup.
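
The birthday-problem analogy from the second link can be evaluated directly. This is a rough sketch that follows the analogy as stated: every corrupted block is treated as a uniformly random draw from all block positions, and a "collision" means two corruptions share a position. The function name and the example sizes are my own assumptions.

```python
import math


def p_overlap(n_blocks: int, n_corrupted: int) -> float:
    """Birthday-problem probability that at least two of n_corrupted
    randomly placed corruptions land on the same block position."""
    # log of P(no collision) = sum of log(1 - i / n_blocks), i = 0..k-1
    log_no_collision = sum(math.log1p(-i / n_blocks)
                           for i in range(n_corrupted))
    return -math.expm1(log_no_collision)


# Example: 1 TiB of 4 KiB blocks (2**28 positions), assumed for illustration.
n_blocks = 2**28
for k in (10, 1_000, 20_000):
    print(k, p_overlap(n_blocks, k))
```

The probability stays tiny while only a handful of blocks are corrupted and climbs steeply as they pile up, which is the argument for scrubbing regularly rather than rarely.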

Expanding on Mark Haferkamp's excellent answer, btrfs scrub reading all data instead of all files is a critical property and is actually what makes it so useful. Remember, btrfs has built-in RAID support. Say you have a btrfs filesystem spanning two drives that you've configured to use RAID1. In this case, when you write to a file, that write is replicated to both disks. (More complex setups behave differently, but for this simple case, this is always what happens.) However, when you read from that file, the read will hit only one disk (because it is a waste to read the file twice unless the first copy is unusable for some reason).

Now say your second btrfs drive is degrading and starting to corrupt data in your filesystem. When you read blocks from this disk, btrfs will notice that the checksum does not match and will restore the block in-band from a known-good copy - the copy on the first drive. It'll return the data to the application calling read() (or whatever) as if nothing happened.

But what if btrfs doesn't decide to read from the second disk? Remember, there are two copies, so it can read from either the first or the second disk. If it reads from the first disk, it won't notice anything wrong. The only time it'll notice anything's wrong is when the first disk degrades, too. Now you're really hosed, as it's too late to recover the data: the second disk's copy has been corrupted for a while, and the first copy (which is what you would've used to restore the second disk) is now corrupted too!

This is where btrfs scrub comes in. It reads all data, not all files. This includes metadata, but also secondary copies of files that wouldn't normally be in the read path. When it reads these secondary copies, that creates an opportunity for btrfs's in-band error correction to kick in and restore the data from a redundant copy.
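
Here is a toy simulation of that difference between the normal read path and a scrub. It is only a sketch: the `Raid1` class is hypothetical, reads are assumed to always go to device 0 (real btrfs picks a mirror by its own policy), and plain CRC-32 stands in for the real checksum.

```python
import zlib


class Raid1:
    """Toy two-device mirror; not btrfs's actual layout or read policy."""

    def __init__(self, blocks):
        self.sums = [zlib.crc32(b) for b in blocks]
        self.devs = [[bytearray(b) for b in blocks] for _ in range(2)]

    def _ok(self, dev, i):
        return zlib.crc32(bytes(self.devs[dev][i])) == self.sums[i]

    def _repair(self, bad, i):
        good = 1 - bad
        if not self._ok(good, i):
            raise IOError(f"block {i}: no valid copy left")
        self.devs[bad][i][:] = self.devs[good][i]

    def read(self, i, dev=0):
        """Normal read: touches one device only, repairing it if needed."""
        if not self._ok(dev, i):
            self._repair(dev, i)
        return bytes(self.devs[dev][i])

    def scrub(self):
        """Verify every copy on every device; return (dev, block) repairs."""
        fixed = []
        for i in range(len(self.sums)):
            for dev in (0, 1):
                if not self._ok(dev, i):
                    self._repair(dev, i)
                    fixed.append((dev, i))
        return fixed


fs = Raid1([b"hello", b"world"])
fs.devs[1][1][0] ^= 0xFF           # silent corruption on the second device
assert fs.read(1) == b"world"      # served from device 0, nothing noticed
still_bad = not fs._ok(1, 1)
fixed = fs.scrub()
print(still_bad, fixed)            # True [(1, 1)]
```

The ordinary read succeeds and leaves the bad copy on the second device untouched; only the scrub, which visits both copies, finds and repairs it.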

Tags:

Btrfs