Indexed archive format?

The Zip format compresses each file separately, and then combines them (with a directory of archive contents) into a single archive file.
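As a rough illustration of that per-file layout, the sketch below builds a throwaway archive, lists it, and pulls out one member. It assumes the Info-ZIP `zip` and `unzip` tools are installed; the `foo` directory and file names are invented for the example.

```shell
# Build a small sample tree (all names here are illustrative).
workdir=$(mktemp -d)
mkdir -p "$workdir/foo"
echo alpha > "$workdir/foo/a.txt"
echo beta > "$workdir/foo/b.txt"

# Create the archive; each member is compressed on its own.
(cd "$workdir" && zip -qr foo.zip foo)

# List the contents; this reads the archive's directory, not the file data.
unzip -l "$workdir/foo.zip"

# Extract a single member into a separate directory.
unzip -q "$workdir/foo.zip" foo/b.txt -d "$workdir/out"
```

Only `foo/b.txt` should end up under `out/`; the other member is never decompressed.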


In addition to the already mentioned zip format, the dar and dump utilities are also good at handling this and, unlike zip, retain the Unix permissions. With dar, avoid the solid archive option: it goes back to the tar/gzip method of compressing everything as a single stream, which gives better compression but makes extracting individual files slower, because the archive must be decompressed from the start until the desired file is reached. dump handles large sets of smallish files (tens of thousands) rather well and can do multithreaded compression, but it only reads ext2, ext3, and ext4 filesystems.


pixz is a parallel, indexing version of xz.

# Compress:
tar -I pixz -cf foo.tar.xz ./foo

# Decompress:
tar -I pixz -xf foo.tar.xz

# Very quickly list the contents of the compressed tarball:
pixz -l foo.tar.xz

# Very quickly extract a single file:
pixz -x dir/file < foo.tar.xz | tar x

Tags: Linux, Archive, Tar