gzip without tar? Why are they used together?

TAR creates a single archived file out of many files, but does not compress them.

Format Details

A tar file is the concatenation of one or more files. Each file is preceded by a 512-byte header record. The file data is written unaltered except that its length is rounded up to a multiple of 512 bytes and the extra space is zero filled. The end of an archive is marked by at least two consecutive zero-filled records.
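To see that layout in practice, here is a minimal sketch using Python's standard tarfile module. The build_tar helper and the hello.txt member are made up for illustration. It packs one 6-byte file and then inspects the raw bytes: a 512-byte header, the data zero-padded to 512 bytes, then zero-filled end-of-archive records.

```python
import io
import tarfile

def build_tar(files):
    """Hypothetical helper: pack a {name: bytes} mapping into an uncompressed tar."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

archive = build_tar({"hello.txt": b"hello\n"})

header, data_block = archive[:512], archive[512:1024]
print(header[:9])                           # b'hello.txt' (name field of the header)
print(header[257:262])                      # b'ustar' (format magic inside the header)
print(data_block[:6])                       # b'hello\n' (file data, unaltered)
print(data_block[6:] == bytes(506))         # True: zero-filled up to 512 bytes
print(archive[1024:2048] == bytes(1024))    # True: two zero-filled end records
print(len(archive) % 512 == 0)              # True
```

Python's tarfile also pads the finished archive out to a larger record size, so the exact total length is an implementation detail; the 512-byte granularity is what the format guarantees.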

GZIP compresses a single file into another single file, but does not create archives.

File Format

...Although its file format also allows for multiple such streams to be concatenated (gzipped files are simply decompressed and concatenated as if they were originally one file), gzip is normally used to compress just single files.[4] Compressed archives are typically created by assembling collections of files into a single tar archive, and then compressing that archive with gzip.
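A quick way to check the concatenation behaviour from the quote, sketched with Python's gzip module (the two payloads are arbitrary examples):

```python
import gzip

# Two independently compressed gzip streams ("members")...
part1 = gzip.compress(b"first file\n")
part2 = gzip.compress(b"second file\n")

# ...glued together byte for byte into one blob.
combined = part1 + part2

# Decompressing yields the joined contents, as if one file had been compressed.
# No file names or boundaries survive, which is why gzip alone is no archiver.
restored = gzip.decompress(combined)
print(restored)  # b'first file\nsecond file\n'
```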


Gzip and bzip2 are stream compressors: they compress a stream of data into something smaller. They can be used on individual files, but on their own they cannot handle groups of files.

Tar, on the other hand, can turn a list of files, along with their paths, permissions and ownership information, into a single continuous stream, and vice versa.
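A rough illustration of that round trip (files plus metadata into one stream, and back), again with Python's tarfile. The pack and unpack functions and the alice/bob owner names are hypothetical, and only the path, mode and owner-name fields are shown; a real tar header carries more (uid, gid, mtime, and so on).

```python
import io
import tarfile

# Hypothetical entry format: (path, contents, permission bits, owner name)
def pack(entries):
    """Files plus metadata in, one tar byte stream out."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path, data, mode, uname in entries:
            info = tarfile.TarInfo(name=path)  # full path, directories included
            info.size = len(data)
            info.mode = mode                   # permission bits
            info.uname = uname                 # owner name
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def unpack(blob):
    """The reverse direction: one tar byte stream back into files plus metadata."""
    entries = []
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
        for member in tar.getmembers():
            data = tar.extractfile(member).read()
            entries.append((member.name, data, member.mode, member.uname))
    return entries

entries = [
    ("docs/readme.txt", b"read me\n", 0o644, "alice"),
    ("bin/run.sh", b"#!/bin/sh\necho hi\n", 0o755, "bob"),
]
blob = pack(entries)
print(unpack(blob) == entries)  # True
```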

That's why, to archive files (and compress them, if needed), one usually combines tar with some compression method.


Tar is in charge of doing one and only one thing well: (un)archiving into (out of) a single archive file. Of what? Of one and only one thing: a set of files.

Gzip is in charge of doing one and only one thing well: (un)compressing. Of what? Of one thing and one thing only: a single file of any type... and that includes a file created with tar.
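That division of labour can be sketched in Python as two separate steps; the file names and contents are invented. gzip.compress is handed the finished tar bytes and treats them as just another blob.

```python
import gzip
import io
import tarfile

# Step 1, tar's job: many files in, one uncompressed archive out.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name in ("a.txt", "b.txt"):
        data = (name * 200).encode()  # made-up, highly repetitive contents
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
tar_bytes = buf.getvalue()

# Step 2, gzip's job: one blob in, one smaller blob out.
# gzip.compress neither knows nor cares that its input is a tar archive.
tgz_bytes = gzip.compress(tar_bytes)
print(len(tar_bytes), len(tgz_bytes))

# Undoing both layers: tarfile's "r:gz" mode gunzips, then un-tars.
with tarfile.open(fileobj=io.BytesIO(tgz_bytes), mode="r:gz") as tar:
    names = tar.getnames()
print(names)  # ['a.txt', 'b.txt']
```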

It goes back to the UNIX philosophy of pipelining: the underlying "pipes and filters" architecture, the treatment of everything as a file, and the sound architectural goal of "one-thing-does-one-thing-only-and-does-it-well" (which results in a very elegant and simple plug-and-play of sorts).
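The pipes-and-filters idea can also be sketched without building the whole archive in memory first. Below, tarfile in its streaming write mode ("w|") writes into a gzip.GzipFile, which writes into whatever binary stream it is given, roughly analogous to tar cf - dir | gzip > dir.tar.gz. The tar_gz_stream name and the sample files are assumptions for illustration.

```python
import gzip
import io
import tarfile

def tar_gz_stream(files, sink):
    """Hypothetical helper: write {name: bytes} to `sink` as .tar.gz in one pass.

    tarfile (first filter) emits a sequential archive stream into GzipFile
    (second filter), which emits compressed bytes into `sink`. Each stage
    only sees a stream of bytes from the previous one.
    """
    with gzip.GzipFile(fileobj=sink, mode="wb") as gz:
        # "w|" is tarfile's streaming write mode: it never seeks on `gz`.
        with tarfile.open(fileobj=gz, mode="w|") as tar:
            for name, data in files.items():
                info = tarfile.TarInfo(name=name)
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))
    # Leaving the outer block writes the gzip trailer; `sink` itself stays open.

sink = io.BytesIO()  # could equally be a file opened with open(path, "wb")
tar_gz_stream({"x.txt": b"x" * 100, "y.txt": b"y" * 100}, sink)

with tarfile.open(fileobj=io.BytesIO(sink.getvalue()), mode="r:gz") as tar:
    names = tar.getnames()
print(names)  # ['x.txt', 'y.txt']
```

Because each stage only talks to a byte stream, swapping gzip.GzipFile for bz2.BZ2File(sink, "wb") would change the compressor without touching the tar side.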

In its simplicity, it is almost algebraic in nature (a hefty goal in systems design). And that is no easy feat.

In many ways (and not without its flaws), this is almost a pinnacle of composability, modularity, loose coupling and high cohesion. If you understand these four (and I mean really understand them), it will be obvious why tar and gzip work in pairs like that.
