Compressing two identical folders gives different results

Ok, the explanation given by ddnomad is correct. It's about the timestamp.

Here is the solution:

Add --mtime='1970-01-01 00:00:00' to the tar command:

tar --mtime='1970-01-01 00:00:00' -Jcf archive.tar.xz *

This forces the timestamps of all archived files to a fixed value, so the archives come out identical, provided ownership, permissions and file order also match.
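A minimal sketch of the fix, assuming GNU tar and GNU coreutils (for touch -d). It builds two copies of the same files with different modification times, then archives each copy with and without --mtime. The directory and file names are made up for illustration, and compression is left out to keep it short.

```shell
#!/usr/bin/env bash
# Sketch: same file contents, different mtimes, archived with and without --mtime.
set -eo pipefail

work=$(mktemp -d)
mkdir "$work/a" "$work/b"
for d in a b; do
  printf 'hello\n' > "$work/$d/one.txt"
  printf 'world\n' > "$work/$d/two.txt"
done

# Simulate the two copies having been created at different times.
touch -d '2020-01-01 12:00:00' "$work/a"/*
touch -d '2021-06-15 08:30:00' "$work/b"/*

# archive SRC_DIR OUT_FILE [extra tar options...]
archive() {
  local src=$1 out=$2
  shift 2
  # The * glob is expanded inside SRC_DIR, so only the files themselves are stored.
  (cd "$src" && tar "$@" -cf "$out" *)
}

archive "$work/a" "$work/plain-a.tar"
archive "$work/b" "$work/plain-b.tar"
archive "$work/a" "$work/fixed-a.tar" --mtime='1970-01-01 00:00:00'
archive "$work/b" "$work/fixed-b.tar" --mtime='1970-01-01 00:00:00'

(cd "$work" && sha256sum plain-a.tar plain-b.tar fixed-a.tar fixed-b.tar)
```

The plain-* pair should hash differently and the fixed-* pair identically. This only holds because both copies have the same owner and permissions and the shell sorts the * glob the same way each time.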


There are a number of reasons why two tarballs of the same directory tree might differ. The main ones are:

  • Metadata such as ownership, timestamps, etc. may differ. To get a reproducible tar archive, you need to have the same ownership, permissions and timestamps. Make sure that you copied all the metadata (if you have identical file contents with differing metadata, cp -a --attributes-only may help). With GNU tar, there are a few options you can use to ignore certain attributes:

    • --numeric-owner only stores numerical user and group IDs, not names.
    • --owner and --group force files to be recorded under a certain user and group respectively (e.g. --owner=0 --group=0 to record all files as belonging to root).
    • --mtime allows you to store all files with a particular timestamp instead of the real one.
  • The order in which the files are stored may differ. Most filesystems don't give any particular guarantee as to the order in which files are listed in a directory, and tar lists them as they come. (You can see the order with ls -U.) GNU tar 1.28 has a new option --sort=name. With older versions or other implementations, you can get a reproducible file order by building a sorted list of file names and passing it to tar:

    find . -print0 | LC_ALL=C sort -z | tar --null --no-recursion -Jcf ../archive.tar.xz -T -
    

You may be interested in the Debian wiki page on reproducible builds.
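Combining the options from the list above, a reproducible archive could be sketched as follows. This assumes GNU tar 1.28 or later (for --sort=name). repro_tar is a hypothetical helper name, and --format=gnu is an extra assumption of this sketch: it pins the archive format so the output does not depend on tar's compiled-in default. The two test trees are deliberately created in a different order and at different times.

```shell
#!/usr/bin/env bash
# Sketch: reproducible tarball with GNU tar >= 1.28.
set -eo pipefail

# Hypothetical helper: archive the contents of directory $1 into file $2
# with member order, ownership and timestamps normalised.
repro_tar() {
  tar --sort=name \
      --owner=0 --group=0 --numeric-owner \
      --mtime='1970-01-01 00:00:00' \
      --format=gnu \
      -C "$1" -cf "$2" .
}

work=$(mktemp -d)
mkdir -p "$work/t1/sub" "$work/t2/sub"
# Same contents, but files created in opposite order...
for f in a b c; do printf '%s\n' "$f" > "$work/t1/sub/$f"; done
for f in c b a; do printf '%s\n' "$f" > "$work/t2/sub/$f"; done
# ...and with different timestamps on files and directories.
touch -d '2020-01-01' "$work/t1/sub"/* "$work/t1/sub" "$work/t1"
touch -d '2023-05-05' "$work/t2/sub"/* "$work/t2/sub" "$work/t2"

repro_tar "$work/t1" "$work/t1.tar"
repro_tar "$work/t2" "$work/t2.tar"
(cd "$work" && sha256sum t1.tar t2.tar)
```

Both hashes should match. Compression is omitted here for brevity; -J can be added as in the commands above.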


Every file (a folder is also a file) has a timestamp stored in its metadata.

I presume you didn't create these two folder structures at the same time, so the timestamps of their files are different.

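As a result, archiving them gives different outcomes, because tar records each file's timestamp in the archive, and hashing those archives therefore gives different values too. (Hashing only the files' contents, e.g. with sha256sum, is not affected by timestamps.)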

So that's the difference between seemingly identical file structures.

UPDATE: If you want to check whether the two trees have the same contents, you have to compare the contents of the files themselves rather than the archives.
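One way to compare contents while ignoring timestamps and ownership is sketched below, assuming GNU find, sort, xargs and coreutils. content_hash is a hypothetical helper name: it hashes every regular file, then hashes the sorted list of "hash path" lines. diff -r is shown as a direct alternative.

```shell
#!/usr/bin/env bash
# Sketch: compare two trees by file contents only.
set -eo pipefail

# Hypothetical helper: one hash summarising relative paths and contents of
# all regular files under $1. mtime, owner and permissions are not included.
content_hash() {
  (cd "$1" && find . -type f -print0 | LC_ALL=C sort -z \
     | xargs -0 -r sha256sum | sha256sum | cut -d' ' -f1)
}

work=$(mktemp -d)
mkdir -p "$work/x/sub" "$work/y/sub"
printf 'same\n' > "$work/x/sub/file"
printf 'same\n' > "$work/y/sub/file"
touch -d '2001-01-01' "$work/x/sub/file"
touch -d '2022-02-02' "$work/y/sub/file"

content_hash "$work/x"
content_hash "$work/y"
# diff -r exits with status 0 when both trees hold the same files with the same contents.
diff -r "$work/x" "$work/y" && echo "trees have identical contents"
```

Because content_hash only looks at regular files, empty directories, permissions and symbolic links are not part of the comparison.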