How to check integrity of a dd backup?

If the command terminated successfully, then the backup is correct, barring a hardware fault (which could equally affect any verification you might perform). It may later become incorrect if the hardware is faulty, but most storage hardware detects corruption.

There is one caveat here: the exit status of a pipeline is that of its last command, so the shell doesn't report errors from the left-hand side. (This is because of a fairly common scenario where the right-hand side doesn't need to read all the data, e.g. some_command | head, and the left-hand side dies because its output is no longer wanted.) So here a read error from dd would be ignored. In bash, set the pipefail option to report errors from all parts of the pipeline.
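
The difference is easy to see with a deliberately failing left-hand side. This is a minimal sketch, assuming bash; false stands in for a dd that hit a read error and cat for the receiving side:

```shell
#!/usr/bin/env bash
# 'false' stands in for a failing producer (e.g. dd with a read error),
# 'cat' for a consumer that succeeds.

# Default behaviour: the pipeline's status is that of the last command.
false | cat
status_default=$?

# With pipefail, the status is that of the rightmost command that failed.
set -o pipefail
false | cat
status_pipefail=$?

echo "default: $status_default, pipefail: $status_pipefail"
# prints "default: 0, pipefail: 1"
```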

Also, beware that dd bs=… ignores some errors, and dd is often slower than the alternatives. I recommend not using dd at all: it has no benefit when you just want to copy a whole file. Contrary to what you might have read somewhere, dd is not a low-level disk access command with special properties. There is absolutely no magic in dd; the magic is in /dev/hda.

set -o pipefail
set -e
</dev/hda buffer -s 64k -S 10m | ssh myuser@myhost "cat > ~/image.img"
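
As a quick illustration of the "no magic" point, the sketch below copies the same data once with dd and once with cat and compares the results. A regular temporary file stands in for /dev/hda (an assumption, so that the example runs without root), and the file names are hypothetical.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)

# 1 MiB of random data as a stand-in for the disk device.
head -c 1048576 /dev/urandom > "$tmp/source.bin"

# Copy once with dd, once with plain cat.
dd if="$tmp/source.bin" of="$tmp/copy_dd.img" 2>/dev/null
cat "$tmp/source.bin" > "$tmp/copy_cat.img"

# cmp -s is silent and exits 0 only if the files are byte-for-byte identical.
if cmp -s "$tmp/copy_dd.img" "$tmp/copy_cat.img"; then
    copies="identical"
else
    copies="different"
fi
echo "dd and cat copies are $copies"

rm -rf "$tmp"
```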

Nonetheless, if you wish to check the backup, the best way is to take a cryptographic checksum on each side and compare them. For example:

ssh myuser@myhost "sha1sum image.img" &
sudo sha1sum /dev/hda

Check that the two checksums are identical.
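
The comparison can also be scripted rather than done by eye. The following is a sketch only: same_checksum is a hypothetical helper, and two temporary files stand in for /dev/hda and image.img so that it runs anywhere.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if both files have the same SHA-1 digest,
# 1 if they differ, 2 if either file could not be read.
same_checksum() {
    local a b
    a=$(sha1sum < "$1") || return 2
    b=$(sha1sum < "$2") || return 2
    # sha1sum prints "<digest>  -" for stdin; keep only the digest.
    [ "${a%% *}" = "${b%% *}" ]
}

tmp=$(mktemp -d)
printf 'pretend disk contents\n' > "$tmp/original"
cp "$tmp/original" "$tmp/image.img"

if same_checksum "$tmp/original" "$tmp/image.img"; then
    result_copy="match"
else
    result_copy="differ"
fi

# Change the copy and check again.
printf 'x' >> "$tmp/image.img"
if same_checksum "$tmp/original" "$tmp/image.img"; then
    result_changed="match"
else
    result_changed="differ"
fi

echo "fresh copy: $result_copy, modified copy: $result_changed"
rm -rf "$tmp"
```

In the real setup the two digests would come from the sha1sum commands shown above, one of them run over ssh.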

Note that this tests whether the backup and the original are identical at the time of the check. Anything you change on /dev/hda, including mounting and unmounting a filesystem even without making any change (which will update a last mount date on many filesystems), will change the checksum. If you want to verify the integrity later, note down the checksum of the disk at the time of the backup somewhere.
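
One way to record the checksum for later is to store it in the format that sha1sum -c understands and re-check the image against it whenever needed. This is a sketch, assuming GNU coreutils; small temporary files stand in for /dev/hda and image.img, and the .sha1 file name is just an illustrative choice.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
printf 'pretend disk contents\n' > "$tmp/disk"   # stand-in for /dev/hda
cp "$tmp/disk" "$tmp/image.img"                  # stand-in for the backup

# At backup time: hash the source and store the digest in the
# "<digest>  <file name>" format that sha1sum -c expects.
digest=$(sha1sum < "$tmp/disk")
printf '%s  %s\n' "${digest%% *}" image.img > "$tmp/image.img.sha1"

# Later: verify the image against the recorded digest.
# The file name in the checksum file is relative, so run from that directory.
if (cd "$tmp" && sha1sum -c image.img.sha1); then
    verify="ok"
else
    verify="failed"
fi
echo "verification: $verify"

rm -rf "$tmp"
```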


As darnir & Giles mentioned, the best thing to do is to run cryptographic hashes immediately after the backup, before anything has been altered on your source disk. If, however, you've used the disk since then, the hashes will most likely not match. Changing even one byte on the disk will result in a completely different hash.

Although it's far from ideal, you can spot-check the image by mounting it. On the system where the disk image is, run the following (create /mnt/disk if it doesn't exist, or use an alternate location):

mount -o loop image.img /mnt/disk

You can then browse around in /mnt/disk and see all of the files. Check the sha1 hashes of critical files inside the image against the originals to verify their integrity.
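
Comparing a handful of files can be done in one go with sha1sum -c. This is a sketch under assumptions: two temporary directories stand in for the original filesystem and for /mnt/disk, and the file names and contents are made up.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
# Stand-ins for the original filesystem root and for /mnt/disk.
mkdir -p "$tmp/original/etc" "$tmp/mnt_disk/etc"
printf 'root:x:0:0\n' > "$tmp/original/etc/passwd"
printf 'myhost\n'     > "$tmp/original/etc/hostname"
cp "$tmp/original/etc/passwd" "$tmp/original/etc/hostname" "$tmp/mnt_disk/etc/"

# Hash the chosen files relative to the root of the original tree...
(cd "$tmp/original" && sha1sum etc/passwd etc/hostname) > "$tmp/critical.sha1"

# ...then check the same relative paths inside the mounted image.
if (cd "$tmp/mnt_disk" && sha1sum -c "$tmp/critical.sha1"); then
    spot_check="ok"
else
    spot_check="failed"
fi
echo "spot check: $spot_check"

rm -rf "$tmp"
```

Recording relative paths is what lets the same list be checked under a different mount point.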