Is there a parallel file archiver (like tar)?

I think you are looking for pbzip2:

PBZIP2 is a parallel implementation of the bzip2 block-sorting file compressor that uses pthreads and achieves near-linear speedup on SMP machines.

Have a look at the project homepage or check your favorite package repository.
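Note that pbzip2 is a compressor rather than an archiver, so the usual pattern is to pipe a tar stream through it. A minimal sketch (the path, output name, and thread count are illustrative):

tar -cf - /opt/myhugedir | pbzip2 -p4 > myhugedir.tar.bz2

Here tar does the (serial) archiving and pbzip2 spreads the compression across 4 processors; with no file arguments it compresses standard input to standard output.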


7zip can run on multiple threads when given the -mmt flag, but only when compressing into 7z archives, which offer great compression but are generally slower to create than zip archives. Do something like this:

7z a -mmt foo.7z /opt/myhugefile.dat
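If you want to control how many threads 7z uses rather than letting it decide, the flag also accepts a thread count in recent versions (same illustrative path as above):

7z a -mmt=4 foo.7z /opt/myhugefile.dat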

The OP asked about parallel archiving, not parallel compression.

If the source material comes from a filesystem where different directories/files might be on different disks, or even from a single fast disk that can supply data faster than the compression tool consumes it, then it could indeed be beneficial to have multiple streams of input going into the compression layer.

The meaningful question becomes: what does the output of a parallel archiver look like? It is no longer a single file descriptor / stdout, but one output stream per thread.

An example of this is the parallel dump mode of PostgreSQL's pg_dump, which dumps to a directory, with threads working over the set of tables to back up (a work queue with multiple threads consuming it).
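Concretely, the directory output format combined with the jobs flag runs the dump across several worker processes (database name and path are illustrative):

pg_dump -F d -j 4 -f /backups/mydb.dump mydb

The directory format sidesteps the single-output-stream problem by writing one file per table into the target directory.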

I'm not aware of any mainstream parallel archivers. There was a hack for Solaris tar for use on ZFS: http://www.maier-komor.de/mtwrite.html

There are some dedicated backup tools that successfully run multiple threads, but many more simply split the workload by directory at a high level.
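That high-level splitting is easy to approximate yourself. A minimal sketch, assuming the top-level directories under a hypothetical /data are of roughly comparable size:

# one archiver process per top-level directory, run concurrently
for d in /data/*/; do
    tar -cf - "$d" | gzip > "$(basename "$d").tar.gz" &
done
wait    # block until all background jobs finish

The trade-off is that you end up with one archive per directory rather than a single archive, which is exactly the output-layout question described above.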