How can I check the integrity of the downloaded files?

Integrity is defined only relative to an authoritative source which tells what the "correct" sequence of bytes is. Hash functions don't create integrity; they transport it. Basically, if you have:

  1. a file;
  2. a hash value, presumed correct;

then you can recompute the hash function over the file and see if you get the same hash value.
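As a rough illustration of that recomputation, here is a minimal Python sketch. It assumes the hash function is SHA-256 and that the expected value is given as a hex string; the function names are made up for the example.

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 digest of the file at `path` as lowercase hex."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so that a big download need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_expected(path, expected_hex):
    """True if the recomputed hash equals the hash value presumed correct."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

A `True` result only tells you that the file matches the hash value you supplied; whether that hash value is the right one is a separate question.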

You still have to start somewhere. Some software distributors provide, along with the software, a "checksum" (or "md5sum" or "sha1sum") file, which contains the hash values. Assuming you got the correct checksum file, this allows you to verify whether you downloaded the right file, down to the last bit; and this works regardless of how you downloaded the possibly big file (even if it came over some shady peer-to-peer network or whatever; you cannot cheat hash functions).

Now this does not solve the integrity problem; it just reduces it to the problem of making sure that you got the right hash value. Hash values are small (32 bytes for SHA-256), so this opens a lot of possibilities. In the context of downloading files from P2P systems, you could obtain the hash value from an HTTPS Web site (HTTPS uses SSL/TLS, which ensures server authentication -- you have the guarantee that you talk to the server you intend -- and transport integrity -- what you receive is guaranteed to be what the server sent). In the context of exchanging PGP public keys with people, hash values (often called "fingerprints" or "thumbprints") are short enough to be transferred manually (printed on a business card, spelled out over the phone...).
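To give an idea of how small that is, the following sketch prints a SHA-256 value in groups of four hex digits, the way one might read it out over the phone. The `fingerprint` helper is hypothetical, and this is not how OpenPGP computes key fingerprints (that format has its own rules); it only illustrates the size of the value to be transferred.

```python
import hashlib

def fingerprint(data: bytes, group: int = 4) -> str:
    """SHA-256 of `data` as uppercase hex, split into groups for reading aloud."""
    hex_digest = hashlib.sha256(data).hexdigest().upper()
    return " ".join(hex_digest[i:i + group]
                    for i in range(0, len(hex_digest), group))

# The output size is fixed at 32 bytes, whatever the input size.
print(hashlib.sha256().digest_size)
print(fingerprint(b"abc"))
```

The result is 64 hex digits, i.e. 16 groups of four, regardless of whether the input was three bytes or a multi-gigabyte ISO image.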

Digital signatures expand on the concept, but they too begin with hash functions. In practice, digital signature algorithms sign not the message itself, but the hash of the message (which is equally good as long as the hash function is secure, i.e. resistant to collisions and preimages).
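The "sign the hash, not the message" idea can be sketched with toy textbook RSA. Everything about the key below is an assumption chosen for illustration: the two primes are publicly known Mersenne primes and there is no padding, so this has no security at all; real signatures use a vetted library and a padding scheme. It needs Python 3.8+ for the modular inverse via `pow`.

```python
import hashlib

# Toy key, for illustration only: both primes are public Mersenne primes
# and no padding is applied, so this offers no security whatsoever.
P = 2**127 - 1
Q = 2**521 - 1
N = P * Q
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)

def message_hash(message: bytes) -> int:
    """SHA-256 of the message as an integer (always smaller than N here)."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big")

def sign(message: bytes) -> int:
    # The private-key operation is applied to the hash, not to the message.
    return pow(message_hash(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    # Recompute the hash and compare it with what the signature opens to.
    return pow(signature, E, N) == message_hash(message)
```

With this sketch, `verify(b"hello", sign(b"hello"))` holds, while the same signature fails for any message with a different hash. That is why the hash function must resist collisions: two messages with the same hash would share a signature.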


For many pieces of software, the providers of the files also provide the hash of each file, which allows you to verify its integrity.

For example, here are the checksums the Fedora Project provides for the Fedora 19 x86_64 ISO images.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

# The image checksum(s) are generated with sha256sum.
6e7e263e607cfcadc90ea2ef5668aa3945d9eca596485a7a1f8a9f2478cc7084 *Fedora-19-x86_64-DVD.iso
ef9eb28b6343e57de292f2b2147b8e74a2a04050655e0dc959febd69b0d5d030 *Fedora-19-x86_64-netinst.iso
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)

iQIcBAEBCAAGBQJRzjLNAAoJEAdHfmX7SxjmzQEP/jzXXe4rxRzA9NLrgWtRUp1b
nK+gpMgGXC5+zSWnKTQBUWMx0rx7uys/UQH934hz1rdMOqLkCe1XlVWp+0ya55nC
13OhOeeJhbdECzFvcSAkDh9Aj2Z9AnDeHbDvJXpEjvGiSLLsYWsjifIkMYDoNTRV
QlLWwOTlCCUZtEGEI1x0TWYlr0HUtkL5QAzQ4CSO7xGYE6YH/xwHje/8n7B25NHU
r2sSdlz3KORQyStqYK78cWlR70PT+3o00SO7ReHNVIZwCL8PjsOEm41Q4tjw3BF7
KLp+fcQTOgzLRY1VVk0n0POeJHbVB2TULjIW4F/vCiA3N6Uq595ebNxgSOBg8tRs
t7fkbktVB6+WeBCcGvJI7MWzYq0ukwRBAH+ZBLhpnEIsHOoFF6LRoiE0UncdhGb+
OmZqn8wZKzMf401E/vj7dEy+X3lAST+5mBm0EJQaFz2cbQCzuxfhnSc27w9Zq3ii
3Tgo1ubInXD/fu1WFH/Tu2aOmbNQwDr4YQDYOeuzokA3d/2bETIhEmYxmfGptfMw
fGG/u4QQMdXyPPKvdIkOTAp5d0tWnTucpkbHs1goygsCMz6XWvIZJt4bAbSRwXoa
qYXh8IpJAM0CrU0353RMDCNpDlpSXGeEy5riaFpFCe7SKZBzp2dJ1LsMJl1NJXxn
QavVCbllLFFjaTuYKrDZ
=oBpz
-----END PGP SIGNATURE-----

Instructions are also provided in case one does not know how to use the provided checksums.
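If you would rather see what such a check boils down to, here is a minimal Python sketch that picks the `sha256sum`-style entries out of a checksum file like the one above and compares one of them against a downloaded file. The function names are made up for the example, and it deliberately does not verify the PGP signature; that has to be done separately (e.g. with GnuPG), otherwise the hash values are only as trustworthy as the place you got the checksum file from.

```python
import hashlib
import os
import re

# One sha256sum-style entry: 64 hex digits, a space, then " " (text mode)
# or "*" (binary mode), then the file name.
ENTRY = re.compile(r"^([0-9a-fA-F]{64}) [ *](.+)$")

def parse_checksums(text):
    """Map file name -> expected SHA-256 hex digest; other lines are ignored."""
    expected = {}
    for line in text.splitlines():
        m = ENTRY.match(line)
        if m:
            expected[m.group(2)] = m.group(1).lower()
    return expected

def verify_file(path, checksum_text):
    """Check the file at `path` against the entry bearing its base name."""
    name = os.path.basename(path)
    expected = parse_checksums(checksum_text).get(name)
    if expected is None:
        raise KeyError("no checksum entry for " + name)
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in chunks; ISO images are far too big to read in one go.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected
```

Comment lines, the PGP armor headers and the base64 signature lines do not match the entry pattern, so they are simply skipped.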

Generally, if the files are provided through the same medium as the checksums, there is very little real benefit, as an attacker who manages to compromise the download will also be able to replace the provided checksum. However, checksums are very useful when the files are downloaded over an untrusted channel such as torrents or a CDN. In such a situation, the software provider can publish the small checksum on their own server while serving the files through a higher-bandwidth medium such as torrents or a CDN.

Tags:

Hash

Integrity