Does Git prevent data degradation?

Git computes its hashes only when objects such as commits are created; from then on the hashes are used to identify them. This by itself does not keep the files intact: Git repositories can get corrupted and lose data. In fact, Git has a built-in command to detect this kind of damage, git fsck, but as the documentation says, you are responsible for restoring any corrupted data from backups.


Depends on what you mean by "prevent".

(First of all, bit-rot is a term with multiple definitions. This question is not about code becoming unrunnable due to lack of maintenance.)

If by "prevent" you mean that Git will likely detect corruption caused by decaying bits, then yes, that works. It will not, however, help you fix that corruption: the hashes provide only error detection, not error correction.
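To make the detection-only point concrete, here is a minimal Python sketch. It assumes a repository using the default SHA-1 object format, where a blob's ID is the SHA-1 of a small header plus the file content. The helper names `git_blob_id` and `is_intact` are made up for this illustration and are not part of Git.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID of a blob in a SHA-1 repository: sha1(b"blob <size>\\0" + content)."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def is_intact(content: bytes, expected_id: str) -> bool:
    """Re-hash the content and compare it with the ID it was stored under."""
    return git_blob_id(content) == expected_id


original = b"hello world\n"
object_id = git_blob_id(original)

# Simulate bit rot: flip a single bit in the first byte.
rotted = bytearray(original)
rotted[0] ^= 0x01
rotted = bytes(rotted)

print(is_intact(original, object_id))  # the untouched content still matches
print(is_intact(rotted, object_id))    # the flipped bit changes the hash
```

The mismatch tells you that something changed, but not which bit, and it gives you nothing to rebuild the original from. For that you still need a second copy of the data.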

This is generally what is meant by "integrity": the ability to detect unauthorized or unintended manipulation of data, not the ability to prevent or correct it.

You would generally still want RAID1 together with backups (possibly implemented with ZFS snapshots or similar; I am not familiar with the ZFS semantics of RAID1 plus snapshots), for several reasons:

  • if a disk fails fatally, you need either RAID1 or a recent backup to restore your data; no error correction can make up for a whole disk failing unless a full copy of the data exists elsewhere (RAID1). To keep downtime short, you essentially must have RAID1.

  • if you accidentally delete part or all of the repository, you need a backup (RAID1 doesn’t protect you, since it immediately propagates the change to all devices)

Block-level RAID1 (e.g. via LVM or similar) with only two disks will not by itself protect you against silent decay of data, though: when the two copies differ, the RAID controller cannot know which disk holds the correct data. You need additional information for that, such as a checksum over the files. This is where the ZFS and btrfs checksums come in: they can be used to distinguish which of the two disks holds the correct data (which is not to say that they are used in these cases; I don’t know how ZFS or btrfs handle things there).
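As a rough illustration of why the extra checksum matters, the following Python sketch models two mirrored copies of one block plus a checksum stored separately. This is not how ZFS or btrfs are implemented; SHA-256 and the names `checksum` and `read_mirrored` are assumptions chosen for the example.

```python
import hashlib


def checksum(block: bytes) -> str:
    """Checksum kept alongside the data (SHA-256 is an assumption here)."""
    return hashlib.sha256(block).hexdigest()


def read_mirrored(copy_a: bytes, copy_b: bytes, stored_checksum: str) -> bytes:
    """Return whichever mirrored copy still matches the stored checksum."""
    for copy in (copy_a, copy_b):
        if checksum(copy) == stored_checksum:
            return copy
    # Neither copy verifies: the mirror cannot help, only a backup can.
    raise IOError("both copies fail the checksum; restore from backup")


good = b"repository data"
stored = checksum(good)      # written when the block was first stored
bad = b"repository dat\x00"  # one disk silently decayed

# Without the checksum, all a plain mirror can see is that the copies differ.
print(good != bad)
# With the checksum, the intact copy can be identified.
print(read_mirrored(bad, good, stored) == good)
```

With two disks and no checksum, the comparison only reveals a disagreement. The stored checksum acts as the tie-breaker.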