Why doesn't changing a file's name change its checksum?

The name of a file is a string in a directory entry, while the rest of its metadata (file type, permissions, ownership, timestamps, etc.) is stored in the inode. The filename is therefore not part of what constitutes the actual data of the file. In fact, a single file may have any number of names (hard links) in the filesystem, and may additionally be accessible through any number of arbitrarily named symbolic links.
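To make the point about names concrete, here is a minimal sketch, assuming bash and a filesystem that supports hard and symbolic links. The file names (data.txt, alias.txt, link.txt) are made up for the demonstration.

```shell
#!/bin/bash
# Work in a throwaway directory; all names below are hypothetical.
dir=$(mktemp -d)
cd "$dir" || exit 1

printf 'hello\n' > data.txt
ln data.txt alias.txt      # hard link: a second directory entry for the same inode
ln -s data.txt link.txt    # symbolic link: a separate small file that points at a name

# -ef is true when both paths resolve to the same device and inode.
if [ data.txt -ef alias.txt ]; then
    echo "data.txt and alias.txt are the same file"
fi

# The first column is the inode number; the two hard links share it.
ls -li data.txt alias.txt link.txt
```

Neither name is more "real" than the other: removing data.txt would leave the contents reachable through alias.txt, while link.txt would be left dangling.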

Since the filename is not part of the file's data, it will not be included automatically when you calculate e.g. the MD5 checksum with md5 or md5sum or some similar utility.

Changing the file's name (or its ownership, timestamps, permissions, etc.), or accessing it via one of its other names or symbolic links, if it has any, will therefore not have any effect on the file's MD5 checksum.
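This is easy to check for yourself. The following is only a sketch, assuming GNU coreutils' sha256sum is installed (on other systems md5 or shasum would do the same job); the file names are hypothetical.

```shell
#!/bin/bash
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'hello\n' > old-name.txt

# Hash via standard input so that no file name appears in the output line.
before=$(sha256sum < old-name.txt)

mv old-name.txt new-name.txt          # rename: only the directory entry changes
chmod 600 new-name.txt                # permissions: stored in the inode
touch -t 200001010000 new-name.txt    # timestamps: stored in the inode

after=$(sha256sum < new-name.txt)

if [ "$before" = "$after" ]; then
    echo "checksum unchanged"
fi
```

None of the three operations touches the bytes of the file, so the utility is fed exactly the same input both times.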

Yes, as you said, "the file name is not a part of the file data".

The file name cannot be stored in the file itself: if it were, renaming the file would change its contents. It would be perfectly valid to checksum the file name, other metadata, and file data together, but this is usually a bad idea.

The file name is part of the containing directory, not part of the file.

If you want to checksum/hash both, something like this will work (though it is probably not a good idea):

# Feed the name (plus a newline) and then the contents to the hash as one stream
{ printf '%s\n' "$filename"; cat -- "$filename"; } | shasum

when I change a file's name this does not affect its checksum (I've tried SHA-1, SHA-256 and MD5).

Well, this is something of a false connection. SHA-1, SHA-256 and MD5 don't calculate hashes of files or file names; they calculate hashes of bit streams. So the result you get depends entirely on what you choose to give as the input, and you didn't show that.

You probably used the sha1sum, sha256sum and md5sum utilities, and indeed they only include the contents of the given file in the data to be hashed. Not the file name, not the permission bits, owner information, timestamps or other metadata.
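To illustrate that the digest depends only on the bytes fed in, here is a small sketch, again assuming sha256sum from GNU coreutils. The variable names are made up for the example.

```shell
#!/bin/bash
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'hello\n' > a.txt

# Keep only the hex digest (first field) so the values can be compared directly.
from_file=$(sha256sum a.txt | cut -d' ' -f1)
from_pipe=$(printf 'hello\n' | sha256sum | cut -d' ' -f1)
of_name=$(printf '%s' 'a.txt' | sha256sum | cut -d' ' -f1)

echo "contents read from the file: $from_file"
echo "same bytes through a pipe:   $from_pipe"
echo "the string 'a.txt' itself:   $of_name"
```

The first two digests match because the input bytes are identical, with or without a file involved. The third differs because a different byte sequence (the name) was hashed.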

But it doesn't have to be done like that. Here are SHA-256 hashes computed over two files together with their names:

$ echo hello > a.txt; cp a.txt b.txt
$ ./checksum.sh a.txt  b.txt 
aed49f7730ca0736fe1a021375d1ca9b509a4e72910b422578df8b4b1930aeca  -
bad46702033923726add35ef8d97570f1aa40d93dad1d6ba63e7b051a34b9efc  -

The script simply prepends the file names to the hashed data. Another application could include metadata in the hash input along with the file contents, or include hashes that only cover part of the data.

Obviously, including the file name has the disadvantage that even the very same file can be referred to by different names and can hence have numerous distinct hashes:

/tmp/test$ ./checksum.sh a.txt ./a.txt /tmp/test/a.txt 
aed49f7730ca0736fe1a021375d1ca9b509a4e72910b422578df8b4b1930aeca  -
85ec58226886f4f853212b2d21bb2fb72447813ac13a59e9376b2e0c02074839  -
25c1c072481131e07c3fc20d16109472872233f658f4df3c4982fb195a048b96  -

Adding timestamps, owners and such to the equation would almost guarantee the hashes being different after the file was copied to another system, making the usefulness of the hash rather questionable. Even the file name might get lost or changed.

If you want to include the metadata in the hash, it's probably easiest to put the file(s) in a tar archive, or some other container that stores the metadata you find useful, and hash and copy that. After extracting the file (contents) from the archive, the metadata on the file system might be different, but you could still verify the archive the file came from.
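A rough sketch of the archive approach, assuming tar and GNU sha256sum are available; the archive and file names are hypothetical. It also shows that the archive's hash does react to metadata, here the modification time.

```shell
#!/bin/bash
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'hello\n' > a.txt

touch -t 202001010000 a.txt
tar -cf first.tar a.txt
hash_first=$(sha256sum < first.tar)

# Same contents, different modification time: the tar header records it.
touch -t 202101010000 a.txt
tar -cf second.tar a.txt
hash_second=$(sha256sum < second.tar)

if [ "$hash_first" != "$hash_second" ]; then
    echo "archive hashes differ although a.txt has the same contents"
fi

# Store the checksum next to the archive; after copying both, verify with -c.
sha256sum first.tar > first.tar.sha256
sha256sum -c first.tar.sha256
```

The checksum file covers the archive as a whole, so it stays valid on any system the archive is copied to, regardless of what metadata the extracted files end up with.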

The script above is:

$ cat checksum.sh
#!/bin/bash
for f in "$@"; do
        (printf "%s\0" "$f" ; cat "$f") | sha256sum - 
done
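One detail of the script that may be worth spelling out is the NUL byte after the name. The following sketch (with made-up names and contents, assuming sha256sum is available) shows what could go wrong if name and contents were simply concatenated.

```shell
#!/bin/bash
# Without a separator, name "ab" + contents "c" and name "a" + contents "bc"
# both produce the byte stream "abc".
plain_1=$( { printf '%s' 'ab'; printf '%s' 'c'; }  | sha256sum )
plain_2=$( { printf '%s' 'a';  printf '%s' 'bc'; } | sha256sum )

# With a NUL after the name the two streams differ. A file name can never
# contain NUL, so the boundary between name and contents is unambiguous.
sep_1=$( { printf '%s\0' 'ab'; printf '%s' 'c'; }  | sha256sum )
sep_2=$( { printf '%s\0' 'a';  printf '%s' 'bc'; } | sha256sum )

[ "$plain_1" = "$plain_2" ] && echo "no separator: the two inputs collide"
[ "$sep_1" != "$sep_2" ]    && echo "NUL separator: the two inputs are distinct"
```

A newline would work as a separator for most names too, but file names are allowed to contain newlines, which is presumably why NUL was chosen.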
