What does "signing" a file really mean?

Signing a file does not encrypt it. When Alice signs a file she usually signs the whole file: she calculates a hash of the whole file, signs only that hash with her private key, and attaches the resulting signature to the file.
Bob uses her public key to verify the signature and recover the hash she calculated. He then calculates the hash of the file himself (without the signature, of course) and compares the two hashes. If they match, it's exactly the version of the file Alice sent. If they don't match, Mallory could have changed it.
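The hash-comparison step Bob performs can be sketched in Python. This is only a minimal illustration of the comparison, not of the signature itself: SHA-256 is an assumed choice of hash, and the names `digest`, `matches`, `original` and `tampered` are made up for the example.

```python
import hashlib

def digest(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()

def matches(received: bytes, expected_digest: str) -> bool:
    """Bob's check: hash what he received and compare with Alice's hash."""
    return digest(received) == expected_digest

# Hypothetical file contents, before and after Mallory edits them.
original = b"Pay Bob 10 dollars"
tampered = b"Pay Bob 1000 dollars"

# The hash Alice computed; in a real scheme this is what her signature covers.
signed_digest = digest(original)

print(matches(original, signed_digest))  # the unmodified file passes
print(matches(tampered, signed_digest))  # the modified file fails
```

Without the signature protecting `signed_digest`, Mallory could simply replace the hash as well, which is why the hash is signed with Alice's private key.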

The file itself never gets encrypted, and of course you can just remove the signature, but then it's not signed anymore (and therefore worthless).

For more technical and detailed information please refer to forest's answer: https://security.stackexchange.com/a/198473/191453


Unfortunately, the answers here which claim that signing is equivalent to encryption of the message digest are not entirely correct. Signing does not involve encrypting a digest of the message. While it is correct that a cryptographic operation is applied on a digest of the message created by a cryptographic hash algorithm and not the message itself, the act of signing is distinct from encryption.

Taken from https://www.cs.cornell.edu/courses/cs5430/2015sp/notes/rsa_sign_vs_dec.php:

In the abstract world of textbooks, RSA signing and RSA decryption do turn out to be the same thing. In the real world of implementations, they are not. So don't ever use a real-world implementation of RSA decryption to compute RSA signatures. In the best case, your implementation will break in a way that you notice. In the worst case, you will introduce a vulnerability that an attacker could exploit.

Furthermore, don't make the mistake of generalizing from RSA to conclude that any encryption scheme can be adapted as a digital signature algorithm. That kind of adaptation works for RSA and El Gamal, but not in general.


Creating a digital signature for a message involves running the message through a hash function, creating a digest (a fixed-size representation) for the message. A mathematical operation is done on the digest using a secret value (a component of the private key) and a public value (a component of the public key). The result of this operation is the signature, and it is usually either attached to the message or otherwise delivered alongside it. Anyone can tell, just by having the signature and public key, if the message was signed by someone in possession of the private key. So, how does this work?

I'll use RSA as an example algorithm. First, a little background on how RSA works. RSA encryption involves taking the message, represented as an integer, and raising it to the power of a known value (most often 3 or 65537). The result is then divided by a public value that is unique to each public key, and the remainder is the encrypted message. This is called a modulo operation. Signing with RSA is a little different. The message is first hashed, and the hash digest is raised to the power of a secret number, and finally divided by the same unique, public value in the public key. The remainder is the signature. This differs from encryption because, rather than raising a number to the power of a known, public value, it's raised to the power of a secret value that only the signer knows.
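As a rough sketch of the difference, here is unpadded "textbook" RSA in Python with a deliberately tiny key. Everything here is an assumption made for illustration: the primes 61 and 53, the exponent 17, the helper names, and the trick of reducing the SHA-256 digest modulo N so it fits the toy modulus. It is not secure and must not be used for anything real.

```python
import hashlib

# Toy key: far too small for real use.
p, q = 61, 53
N = p * q          # public modulus (3233)
e = 17             # public exponent
d = 2753           # secret exponent; e * d leaves remainder 1 modulo (p-1)*(q-1)

def encrypt(m: int) -> int:
    """Encryption raises the message to the PUBLIC exponent."""
    return pow(m, e, N)

def decrypt(c: int) -> int:
    """Decryption raises the ciphertext to the SECRET exponent."""
    return pow(c, d, N)

def toy_digest(message: bytes) -> int:
    """Hash the message, then shrink the digest to fit the toy modulus.
    A real key has a modulus much larger than the digest."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Signing raises the digest to the SECRET exponent."""
    return pow(toy_digest(message), d, N)

def verify(message: bytes, signature: int) -> bool:
    """Verification needs only the public values e and N."""
    return pow(signature, e, N) == toy_digest(message)

sig = sign(b"hello")
print(verify(b"hello", sig))            # a genuine signature is accepted
print(verify(b"hello", (sig + 1) % N))  # a modified signature is rejected
```

Anyone can call `encrypt` and `verify`, since they use only e and N. Only the holder of d can call `decrypt` and `sign`.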

Although RSA signature generation is similar to RSA decryption on paper, there is a big difference to how it works in the real world. In the real world, a feature called padding is used, and this padding is absolutely vital to the algorithm's security. The way padding is used for encryption or decryption is different from the way it is used for a signature. The details which follow are more technical...
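One way to see why padding matters is that raw, unpadded RSA signatures can be multiplied together. The sketch below is an illustration under assumed toy parameters (the same tiny key as in many textbook examples, and made-up names such as `raw_sign`), not a description of any real library: given valid signatures on two values, anyone can produce a valid signature on their product without the private key.

```python
# Toy key: far too small for real use.
p, q = 61, 53
N = p * q
e, d = 17, 2753

def raw_sign(m: int) -> int:
    """Unpadded signing: m^d mod N. Needs the secret exponent d."""
    return pow(m, d, N)

def raw_verify(m: int, s: int) -> bool:
    """Unpadded verification: check that s^e mod N equals m."""
    return pow(s, e, N) == m % N

# Two values the signer legitimately signed.
m1, m2 = 42, 55
s1, s2 = raw_sign(m1), raw_sign(m2)

# The attacker never touches d: multiplying the two signatures is enough.
forged_message = (m1 * m2) % N
forged_signature = (s1 * s2) % N

print(raw_verify(forged_message, forged_signature))  # the forgery verifies
```

Signature padding schemes give the signed value a rigid structure, so a product of two validly padded values is overwhelmingly unlikely to be validly padded itself.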


To use textbook RSA as an example of asymmetric cryptography, encrypting a message m into ciphertext c is done by calculating $c \equiv m^e \pmod{N}$, where e is a public value (usually a Fermat prime), and N is the non-secret product of two secret prime numbers. Signing a hash m, on the other hand, involves calculating $s \equiv m^d \pmod{N}$, where d is the modular inverse of e, being a secret value derived from the secret prime numbers. This is much closer to decryption than it is to encryption, though calling signing decryption is still not quite right. Note that other asymmetric algorithms may use completely different techniques. RSA is merely a common enough algorithm to use as an example.

The security of signing comes from the fact that d is difficult to obtain without knowing the secret prime numbers. In fact, the only known way to obtain d from N is to factor N into its component primes, p and q, and calculate $d \equiv e^{-1} \pmod{(p - 1)(q - 1)}$. Factoring very large integers is believed to be an intractable problem for classical computers. This makes it possible to easily verify a signature, as that involves determining if $s^e \equiv m \pmod{N}$. Creating a signature, however, requires knowledge of the private key.
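The relationship between the primes and d can be checked numerically. This is a minimal sketch with assumed toy values (p = 61, q = 53, e = 17) and a hypothetical helper name. It relies on Python 3.8 or later, where the built-in `pow` accepts an exponent of -1 to compute a modular inverse.

```python
from math import gcd

def derive_private_exponent(p: int, q: int, e: int) -> int:
    """Compute d, the modular inverse of e modulo (p - 1)(q - 1).
    This is only possible for someone who knows the factors p and q."""
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to (p - 1)(q - 1)")
    return pow(e, -1, phi)  # modular inverse, Python 3.8+

# Toy values: real primes are hundreds of digits long.
p, q, e = 61, 53, 17
N = p * q
d = derive_private_exponent(p, q, e)

m = 1234              # stands in for a hash value smaller than N
s = pow(m, d, N)      # signing: needs d, hence the factors of N

# Verification uses only the public values e and N.
print(pow(s, e, N) == m)
```

With a modulus this small, factoring N is trivial, which is exactly why real keys use a modulus of 2048 bits or more.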


Of course one can choose to sign any (part of) information one wants, and leave other parts unsigned. But usually, when we say "sign a file", we refer to signing the whole file plus the file meta-data (e.g. file modification timestamp). This is how OpenPGP and GPG work.

If it is not a file, as with XML signing, you must specify which parts of the XML content are actually covered by the signature.

Also, try to differentiate signatures from encryption. These are two independent matters. One file can be unencrypted+signed, or encrypted+unsigned, or any other combination.