Is sha1sum still secure for downloadable software package signatures?

I suppose you "use sha1sum" in the following context: you distribute some software packages, and you want users to be able to check that what they downloaded is the correct package, down to the last bit. This assumes that you have a way to convey the hash value (computed with SHA-1) in an "unalterable" way (e.g. as part of a Web page which is served over HTTPS).
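
For concreteness, here is a minimal sketch of the client-side check in Python (the file name and expected digest below are made-up placeholders; in practice the digest would come from the publisher's HTTPS page):

```python
import hashlib
import hmac

def file_digest(path, algorithm="sha1", chunk_size=1 << 20):
    """Hash a file in chunks so large packages need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Placeholder values: the expected digest is whatever the publisher advertises.
expected = "2fd4e1c67a2d28fced849ee1bb76e7391b93eb12"
actual = file_digest("package-1.0.tar.gz", "sha1")

# hmac.compare_digest is a constant-time comparison; overkill here, but a
# harmless habit when comparing security-relevant strings.
print("OK" if hmac.compare_digest(expected, actual) else "MISMATCH")
```

Swapping "sha1" for "sha256" or "sha512" is all it takes to upgrade the scheme.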

I also suppose that we are talking about attacks here, i.e. some malicious individual who can somehow alter the package as it is downloaded, and will want to inject some modification that will go undetected.

The security property that the used hash function should offer here is resistance to second-preimages. Most importantly, this is not the same as resistance to collisions. A collision is when the attacker can craft two distinct messages m and m' that hash to the same value; a second-preimage is when the attacker is given a fixed m and challenged with finding a distinct m' that hashes to the same value.

Second-preimages are a lot harder to obtain than collisions. For a "perfect" hash function with an output size of n bits, the computational effort for finding a collision is about 2^(n/2) invocations of the hash function; for a second-preimage, it is 2^n. Moreover, structural weaknesses that allow for a faster collision attack do not necessarily apply to a second-preimage attack. This is true, in particular, for the known weaknesses of SHA-1: right now (September 2015), there are some known theoretical weaknesses of SHA-1 that should allow the computation of a collision in less than the ideal 2^80 effort (the attack still requires a huge effort, about 2^61, so it has not been actually demonstrated yet); but these weaknesses are differential paths that intrinsically require the attacker to craft both m and m', so they do not carry over to second-preimages.
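
A toy experiment (my own illustration, using SHA-256 truncated to n = 32 bits) makes that gap tangible: a birthday-style search finds a collision after roughly 2^16 evaluations, whereas a second-preimage search against a fixed target would need on the order of 2^32:

```python
import hashlib
import itertools

def h32(data):
    """Toy hash: SHA-256 truncated to 32 bits (4 bytes)."""
    return hashlib.sha256(data).digest()[:4]

def find_collision():
    """Birthday search: remember every hash seen until one repeats.
    Expected cost is about 2**16 evaluations for a 32-bit output."""
    seen = {}
    for i in itertools.count():
        m = str(i).encode()
        d = h32(m)
        if d in seen:
            return seen[d], m, i
        seen[d] = m

m1, m2, tries = find_collision()
print(f"collision after {tries} tries: {m1!r} / {m2!r} -> {h32(m1).hex()}")

# No such shortcut exists for a second-preimage: matching one *fixed* 32-bit
# target takes ~2**32 evaluations on average, tens of thousands of times more.
```

The same asymptotic gap, scaled up to 160 bits, is why SHA-1's collision trouble says little about its second-preimage resistance.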

For the time being, there is no known second-preimage attack on SHA-1, even a theoretical or academic one, that is faster than the generic attack, whose 2^160 cost is way beyond technological feasibility, by a long shot.

Bottom line: within the context of what you are trying to do, SHA-1 is safe, and likely to remain safe for some time (even MD5 would still be appropriate).

Another reason for using sha1sum is the availability of client-side tools: in particular, the command-line hashing tool that Microsoft provides for Windows (called FCIV) knows MD5 and SHA-1, but not SHA-256 (at least, so says its documentation) (*).

(*) Windows 7 and later also contain a command-line tool called "certutil" that can compute SHA-256 hashes with the "-hashfile" sub-command. This is not widely known, but it can be convenient at times.


That being said, a powerful reason against using SHA-1 is that of image: it is currently highly trendy to boo and mock any use of SHA-1; the crowds clamour for its removal, anathema, arrest and public execution. By using SHA-1 you are telling the world that you are, definitely, not a hipster. From a business point of view, it rarely does any good to fight the fashion du jour, so you should use one of the SHA-2 functions, e.g. SHA-256 or SHA-512.

There is no strong reason to prefer SHA-256 over SHA-512 or the other way round. Some small, 32-bit-only architectures are more comfortable with SHA-256, but this rarely matters in practice: even a 32-bit implementation of SHA-512 will still hash several dozen megabytes of data per second on an anemic laptop, and even in 32-bit mode, a not-too-old x86 CPU can do 64-bit computations with SSE2, which gives SHA-512 a good boost. Any marketing expert would tell you to use SHA-512 on the sole basis that 512 is greater than 256, so "it must be better" in some (magical) way.
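
If you want to see how the two compare on your own hardware, here is a rough benchmark sketch (absolute numbers will vary with CPU, OS and Python build):

```python
import hashlib
import time

def throughput(algorithm, size_mb=64):
    """Hash size_mb megabytes of zeros and return throughput in MB/s."""
    data = bytes(size_mb * 1024 * 1024)
    h = hashlib.new(algorithm)
    start = time.perf_counter()
    h.update(data)
    h.digest()
    return size_mb / (time.perf_counter() - start)

for alg in ("sha1", "sha256", "sha512"):
    print(f"{alg}: {throughput(alg):.0f} MB/s")
```

On typical 64-bit hardware, SHA-512 often comes out ahead of SHA-256 for bulk data, which underlines that throughput is rarely a deciding factor.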


You should use SHA-256 or SHA-512.

If you are only signing packages you have created yourself, then technically SHA-1 is still secure for that purpose. The property that is now weakened is "collision resistance", which you are not strictly relying on. However, the security of SHA-1 is only going to get worse with time, so it makes sense to move on now.


Tom Leek has a beautiful answer (which is why it is accepted). It is concerned with the mathematically provable facts behind the use of SHA-1. There is a second approach which is less fact-based but may provide valuable heuristic information. I learned this approach from reading Bruce Schneier's opinions, but I cannot find the links offhand, so Bruce will have to deal with my namedropping.

In theory an algorithm is not broken until it's broken. Until someone has found a way to do X, where X is something that should be computationally infeasible, it is not considered "broken".[1] However, that proves to be of limited value in the practical application of cryptography. Cryptographers would really rather get a little notice before their products fall apart, not after.

What has been found, historically, is that algorithms are rarely broken in one big step. Yes, it can happen, and a cryptographer has to plan for that, but empirically they are typically whittled away over time, paper after paper. Watching the difficulty of generating a collision turns out to be a reasonably good metric for guesstimating when the algorithm will actually be broken. So when Tom points out that a collision should take 2^80 operations and now takes 2^61, it is very valid to point out, as he did, that the 2^61 attack is theoretical, because it is still too expensive to be worth attempting. However, it is also valid to think of it as "the algorithm has lost 19 bits of strength," and use that as a poor man's rule of thumb to project forward and estimate when that will become an issue.
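
To make the rule of thumb concrete, here is the back-of-the-envelope arithmetic in Python; the erosion period and the "feasible" threshold are my own assumptions for illustration, not figures from the answer:

```python
# Poor man's extrapolation: collision cost fell from 2**80 (design strength)
# to 2**61 (best published attack as of 2015). If attacks keep improving at
# a similar pace, when does the cost reach "feasible" territory?
design_bits = 80        # ideal collision cost for a 160-bit hash
current_bits = 61       # best known attack (from the answer)
years_elapsed = 10      # ASSUMPTION: rough span over which those 19 bits eroded
feasible_bits = 55      # ASSUMPTION: cost within reach of well-funded attackers

bits_per_year = (design_bits - current_bits) / years_elapsed
years_left = (current_bits - feasible_bits) / bits_per_year
print(f"~{bits_per_year:.1f} bits/year of erosion; "
      f"~{years_left:.1f} years until ~2**{feasible_bits} territory")
```

Crude as it is, this kind of projection is exactly the "little notice" cryptographers want before an algorithm actually falls.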

This kind of thinking is why there is now a SHA-3, even though theoretically SHA-1 is still not fully broken. The cryptographers involved in SHA-3's development and testing know that it is going to take quite a while to develop confidence in SHA-3, and they want to make sure that confidence is there before SHA-1 breaks, not after.

[1] I am aware that the most technically strict definition of a "broken" hash is merely one where an attacker can do better than brute force, as opposed to one where an attack actually becomes computationally feasible. However, the latter definition is more typically used when discussing the practical side of hashing.