How can I tell whether a text file has been edited or tampered with?

The default solution would be to use cryptographic signatures. Have every technician generate a PGP keypair, publishing the public key and keeping the private key secure.

When a technician completes an analysis, they sign the result file with their private key. Anyone who wants to verify the file can then check the signature using the technician's public key. If anyone changes the file, the signature will no longer be valid.
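To make the sign-then-verify flow concrete, here is a minimal sketch. It is not PGP: it assumes the third-party Python package "cryptography" is installed and uses a bare Ed25519 keypair as a stand-in for a PGP keypair. The function names sign_result and is_untampered and the sample XML are made up for illustration.

```python
# Sketch only: Ed25519 (third-party "cryptography" package) stands in for PGP.
# sign_result and is_untampered are hypothetical helper names.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)


def sign_result(private_key: Ed25519PrivateKey, content: bytes) -> bytes:
    """Return a signature over the exact bytes of the result file."""
    return private_key.sign(content)


def is_untampered(
    public_key: Ed25519PublicKey, content: bytes, signature: bytes
) -> bool:
    """Return True only if the signature matches the content byte for byte."""
    try:
        public_key.verify(signature, content)
        return True
    except InvalidSignature:
        return False


# The technician keeps private_key secret and publishes public_key.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

original = b"<result><value>4.2</value></result>"
signature = sign_result(private_key, original)
tampered = original.replace(b"4.2", b"9.9")

print(is_untampered(public_key, original, signature))  # True
print(is_untampered(public_key, tampered, signature))  # False
```

In a real deployment the key would be generated once and stored securely rather than created on every run, and a PGP tool would also handle key distribution and identity, which this sketch ignores.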

Security consideration: If a technician's private key becomes known to someone else, that person can change the files and replace the signature with a new valid one. This problem can be mitigated by having multiple people sign each result file. An attacker would then need all of their keys to replace all signatures with valid ones.
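The multi-signer mitigation could look roughly like this. Again this is a sketch, assuming the third-party "cryptography" package with Ed25519 keys as a stand-in for PGP. The signer names and the helper all_signatures_valid are hypothetical.

```python
# Sketch only: a file is accepted only if EVERY required signer has a valid
# signature on it. Ed25519 stands in for PGP; all names are hypothetical.
from typing import Dict

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)


def all_signatures_valid(
    content: bytes,
    signatures: Dict[str, bytes],
    public_keys: Dict[str, Ed25519PublicKey],
) -> bool:
    """Return True only if each required signer's signature verifies."""
    for name, key in public_keys.items():
        signature = signatures.get(name)
        if signature is None:
            return False  # a required signature is missing
        try:
            key.verify(signature, content)
        except InvalidSignature:
            return False
    return True


technician = Ed25519PrivateKey.generate()
supervisor = Ed25519PrivateKey.generate()
public_keys = {
    "technician": technician.public_key(),
    "supervisor": supervisor.public_key(),
}

original = b"<result><value>4.2</value></result>"
signatures = {
    "technician": technician.sign(original),
    "supervisor": supervisor.sign(original),
}

# An attacker who stole only the technician's key can re-sign as the
# technician, but cannot produce a matching supervisor signature.
tampered = original.replace(b"4.2", b"9.9")
forged = dict(signatures, technician=technician.sign(tampered))

print(all_signatures_valid(original, signatures, public_keys))  # True
print(all_signatures_valid(tampered, forged, public_keys))      # False
```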

Alternative low-tech solution: Print out each result file, have the technician sign it the old-school way (with a pen) and deposit the file in a physically secure archive.

By the way: Do not assume that the vendor-specific binary format provides any more security against tampering than XML does. Just because you can't read and edit it when you open it with a text editor doesn't mean nobody else can reverse-engineer the format and build an editor for it.


Any form of digital signature will do. Here are a few pointers:

  • For XML data, there is a digital signature standard (XML Signature, also known as XMLDSig). Unfortunately, this standard is rather poor and has an important security loophole: documents need to be normalized through an XML transform before they can be signed. This is extremely hard to do securely, since the transform itself becomes an important part of the signature.

  • You can also use PGP or S/MIME to digitally sign documents. These will produce new, text-based and mostly readable but still tamper-evident documents.

  • Finally, you can use detached signatures. A detached signature is a separate file that contains the digital signature for another document and can be used to validate the original data (no matter what the original format is).
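As an illustration of the detached-signature idea, the sketch below stores the signature in a sidecar file next to the data, which itself is left untouched. It assumes the third-party "cryptography" package. The ".sig" naming, the raw signature format (not compatible with PGP tools), and the helper names are assumptions made for this example. With GnuPG the equivalent would be `gpg --detach-sign` followed by `gpg --verify`.

```python
# Sketch only: a raw Ed25519 signature kept in a separate ".sig" file.
# The file naming and helper names are hypothetical.
import tempfile
from pathlib import Path

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)


def write_detached_signature(key: Ed25519PrivateKey, data_path: Path) -> Path:
    """Sign the file's bytes and store the signature next to it."""
    sig_path = data_path.with_name(data_path.name + ".sig")
    sig_path.write_bytes(key.sign(data_path.read_bytes()))
    return sig_path


def verify_detached_signature(
    public_key: Ed25519PublicKey, data_path: Path, sig_path: Path
) -> bool:
    """Return True if the sidecar signature matches the current file bytes."""
    try:
        public_key.verify(sig_path.read_bytes(), data_path.read_bytes())
        return True
    except InvalidSignature:
        return False


key = Ed25519PrivateKey.generate()

with tempfile.TemporaryDirectory() as tmp:
    # Any format works, including binary data; the file itself is not altered.
    data_path = Path(tmp) / "result.bin"
    data_path.write_bytes(b"\x00\x01vendor-specific binary payload")
    sig_path = write_detached_signature(key, data_path)

    valid_before = verify_detached_signature(key.public_key(), data_path, sig_path)

    data_path.write_bytes(b"\x00\x01edited payload")  # simulate tampering
    valid_after = verify_detached_signature(key.public_key(), data_path, sig_path)

print(valid_before, valid_after)  # True False
```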

Let me add a few extra points here:

  • Picking the right properties for the signature (algorithm, key type and size, etc.) depends heavily on the conditions you set: how long do you intend to keep the data secure, what type of adversary do you intend to protect it against (what is the value of a forgery? what would be the value of an attack that breaks all documents signed with the same key?), and is there any regulatory requirement? This means that you should consult a specialist who can translate these business requirements into technical ones.
  • I strongly advise you to add a secure timestamp to your signature. This will not only allow you to prove that a document hasn't been tampered with but also allow you to prove when the signature occurred.

I will outline the three main options and pros/cons of each.

Store backups of the files in a secure location

Pretty self-explanatory. The "secure location" can be a read-only medium (like CDs), or a network drive that everyone can read but only the supervisor can write to, or an online storage service (e.g. Dropbox) that makes it reasonably hard to forge file modification dates.
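Checking a working copy against its backup can be sketched with the Python standard library. The helper name matches_backup and the file names are made up, and the backup is assumed to be reachable as an ordinary file path (for example on a mounted read-only share).

```python
# Sketch only: byte-for-byte comparison of a working copy against a backup.
# matches_backup and the file names are hypothetical.
import filecmp
import tempfile
from pathlib import Path


def matches_backup(working_copy: Path, backup_copy: Path) -> bool:
    """Return True if both files have identical contents."""
    # shallow=False compares the actual bytes instead of trusting size and
    # modification time, which a forger could manipulate.
    return filecmp.cmp(working_copy, backup_copy, shallow=False)


with tempfile.TemporaryDirectory() as tmp:
    backup = Path(tmp) / "backup_result.xml"
    intact = Path(tmp) / "intact_result.xml"
    edited = Path(tmp) / "edited_result.xml"

    backup.write_bytes(b"<result><value>4.2</value></result>")
    intact.write_bytes(b"<result><value>4.2</value></result>")
    edited.write_bytes(b"<result><value>9.9</value></result>")

    intact_ok = matches_backup(intact, backup)
    edited_ok = matches_backup(edited, backup)

print(intact_ok, edited_ok)  # True False
```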

Pros

  • You should have a backup system anyway

Cons

  • If files are large, downloading them for verification can be time-consuming
  • If the forger breaks into the secure location, he can cover his tracks

 

Store hashes in a secure location

A hash is a fingerprint of a file that looks something like 8f2e3f53aa90b27bda31dea3c6fc72f6; if two files differ even slightly, they will have different hashes. Take a hash of the original file and store it securely. To verify later that a file has not been modified, take its hash again and compare it to the stored hash.
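A minimal sketch of the hash-and-compare step, using only the Python standard library. The choice of SHA-256 is an assumption of this example: the 32-digit sample above has the length of an MD5 hash, which is no longer considered collision-resistant, whereas SHA-256 produces 64 hex digits. The helper names and file name are hypothetical.

```python
# Sketch only: hash a file with SHA-256 and compare it to a stored value.
# file_sha256, matches_stored_hash and the file name are hypothetical.
import hashlib
import hmac
import tempfile
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_stored_hash(path: Path, stored_hex: str) -> bool:
    """Return True if the file's current hash equals the stored one."""
    return hmac.compare_digest(file_sha256(path), stored_hex.lower())


with tempfile.TemporaryDirectory() as tmp:
    result_file = Path(tmp) / "result.xml"
    result_file.write_bytes(b"<result><value>4.2</value></result>")

    # This value is what would be kept in the secure location.
    stored_hash = file_sha256(result_file)
    unchanged = matches_stored_hash(result_file, stored_hash)

    result_file.write_bytes(b"<result><value>4.3</value></result>")
    still_matches = matches_stored_hash(result_file, stored_hash)

print(len(stored_hash), unchanged, still_matches)  # 64 True False
```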

Pros

  • You only need to securely store and check a short code (about 32 digits in the example above) instead of an entire file

Cons

  • You still need to access an external resource to check the file
  • If the forger breaks into the secure location, he can cover his tracks

 

Cryptographic signatures

In this case, one or more people can "sign" the file, and if any changes are made these signatures will be invalidated. Of course, if everyone who needs to sign the file is willing to sign (or is tricked into signing) a tampered file, then the tampered file will pass verification.

Pros

  • The security information can be kept within the file itself, or otherwise on the same drive, meaning easier verification.

Cons

  • Everyone who signs files needs to be very careful to prevent someone stealing their private key.
  • Everyone who signs files needs to be very careful they know what they are signing.