More efficient file compression program for many identical files?

7-zip supports solid compression if I remember correctly, so it should compress a lot of nearly identical files very well.


I did some testing on the "identical files" aspect mentioned in the question, using 7-zip (version 9.20), as no one had given an elaborate answer on that yet. This gave some interesting results. I tested with 10 copies of the file that this site uses for its page-not-found message. Being a jpg-file, it hardly compresses on its own, so any gain has to come from the redundancy between the copies. Its file size is 37 KB.

  1. When I compress all ten copies to the zip-format, the archive is 367 KB, about 99% of the total size of the 10 original files.
  2. When I compress all ten copies to the 7z-format, the archive is 37 KB, about 101% of the size of just one of the original files.
  3. If I first put 5 copies in a 7z-archive, then add 3 and finally 2 copies in separate steps, the archive grows to 111 KB, about three times the size of a single original file.

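If I open the 3rd archive in 7-zip, one of the listed properties is Block. It shows 0, 1 and 2 for 5, 3 and 2 of the files, respectively. So each step created its own block, and each block holds roughly one full copy of the file, which accounts for the 3 × 37 KB = 111 KB.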
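The effect of separate blocks can be reproduced without 7-zip. The following is only a rough sketch, not what 7-zip does internally: it assumes Python's standard `lzma` module as a stand-in for the 7z compressor and random bytes as a stand-in for the jpg-file. The name `solid_block` is made up for this illustration.

```python
import lzma
import os

# Assumption: random bytes stand in for the 37 KB jpg-file,
# since both are essentially incompressible on their own.
payload = os.urandom(37 * 1024)


def solid_block(copies: int) -> bytes:
    """Compress `copies` identical files as one continuous LZMA stream."""
    return lzma.compress(payload * copies)


# One step: all ten copies share a single block.
one_step = len(solid_block(10))

# Three steps (5, then 3, then 2): each step gets its own block,
# and a block cannot refer back to data stored in another block.
three_steps = sum(len(solid_block(n)) for n in (5, 3, 2))

print(f"one block:    {one_step / 1024:.0f} KB")
print(f"three blocks: {three_steps / 1024:.0f} KB")
```

The three-block total should come out at roughly three times the single-block size, in line with the 111 KB versus 37 KB measured above.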

Observations:

  1. The zip-format compresses each file individually, so it cannot take advantage of the redundancy between identical files.
  2. The 7z-format compresses multiple identical files efficiently, as long as they are added to the archive in one step.
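The difference between per-file and solid compression can also be shown with Python's standard library alone. This is a sketch under assumptions: `tar.xz` is used here as an analogue of a solid 7z-archive (the files are concatenated first and then compressed as one stream), random bytes replace the jpg-file, and the member names are invented.

```python
import io
import os
import tarfile
import zipfile

payload = os.urandom(37 * 1024)              # stand-in for the 37 KB jpg-file
names = [f"copy{i}.jpg" for i in range(10)]  # hypothetical file names

# zip: every member is deflated as a separate stream.
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for name in names:
        zf.writestr(name, payload)

# tar.xz: one compressed stream over all members, so later copies
# can be encoded as references to the first one.
tar_buf = io.BytesIO()
with tarfile.open(fileobj=tar_buf, mode="w:xz") as tf:
    for name in names:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))

zip_size = len(zip_buf.getvalue())
tar_xz_size = len(tar_buf.getvalue())
print(f"zip:    {zip_size / 1024:.0f} KB")
print(f"tar.xz: {tar_xz_size / 1024:.0f} KB")
```

The zip result should stay close to the total size of all ten copies, while the solid stream should stay close to the size of one copy.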

Conclusions:

  1. For optimal compression of such files, use 7z rather than zip.
  2. Compression may improve dramatically if you do not add files to an existing 7z-archive, but instead decompress it first and then compress everything again, including the new files, in one step.
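The second conclusion could be scripted roughly as follows. This is a sketch, not a tested tool: it assumes the command-line executable is called `7z` and is on the PATH (on some systems it is `7za` or `7z.exe`), and the helper names `rebuild_commands` and `rebuild` are made up. It uses the `x` (extract with full paths) and `a` (add) commands, with `-o` for the output folder, `-y` to answer prompts with yes, and `-ms=on` to request solid mode explicitly.

```python
import subprocess
import tempfile
from pathlib import Path

SEVEN_ZIP = "7z"  # assumption: name of the 7-zip command-line executable


def rebuild_commands(old_archive, new_archive, extract_dir, extra_files):
    """Return the extract and re-compress command lines (hypothetical helper)."""
    extract = [SEVEN_ZIP, "x", str(old_archive), f"-o{extract_dir}", "-y"]
    compress = [
        SEVEN_ZIP, "a", "-t7z", "-ms=on", str(new_archive),
        str(Path(extract_dir) / "*"),   # wildcard is expanded by 7z itself
        *map(str, extra_files),
    ]
    return extract, compress


def rebuild(old_archive, new_archive, extra_files):
    """Unpack the old archive and pack everything again in one step.

    new_archive should not exist yet; otherwise "a" would append to it
    and create an additional block again.
    """
    with tempfile.TemporaryDirectory() as tmp:
        for cmd in rebuild_commands(old_archive, new_archive, tmp, extra_files):
            subprocess.run(cmd, check=True)
```

For example, `rebuild("old.7z", "new.7z", ["copy11.jpg"])` would write `new.7z` with the old and new files compressed together; the file names here are placeholders.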