How do I securely extract an untrusted tar file?

You don't need the paranoia at all. GNU tar, and in fact any well-written tar program produced in the past 30 years or so, will by default refuse to extract archive members whose names begin with a slash or contain .. elements, or will strip those parts of the name so the files land inside the extraction directory.

You have to go out of your way to force modern tar programs to extract such potentially malicious tarballs: both GNU and BSD tar need the -P option to disable this protection. See the section Absolute File Names in the GNU tar manual.

The -P flag isn't specified by POSIX,¹ though, so other tar programs may have different ways of coping with this. For example, the Schily Tools' star program uses -/ and -.. to disable these protections.

The only thing you might consider adding to a naïve tar command is a -C flag to make it extract into a safe temporary directory, so you don't have to cd there first.
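If you're driving tar from a script, the -C advice might look like the following Python sketch. It assumes a GNU- or BSD-compatible `tar` is on the PATH, and the function name `extract_to_fresh_dir` is made up for illustration. `tempfile.mkdtemp` guarantees the destination starts out empty and is accessible only to the current user.

```python
import os
import subprocess
import tempfile


def extract_to_fresh_dir(archive: str) -> str:
    """Extract `archive` with the system tar into a brand-new temporary
    directory and return that directory's path (hypothetical helper)."""
    # mkdtemp creates a new, empty directory accessible only to this user.
    dest = tempfile.mkdtemp(prefix="untrusted-tar-")
    # -C tells tar to extract into dest, so the script never has to cd.
    # An argument list (no shell) avoids quoting problems with odd names.
    subprocess.run(
        ["tar", "-x", "-f", os.path.abspath(archive), "-C", dest],
        check=True,
    )
    return dest
```

With `check=True`, a failing tar raises `subprocess.CalledProcessError` instead of being silently ignored.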


Asides:

  1. Technically, tar isn't specified by POSIX any more at all. They tried to tell the Unix computing world that we should be using pax now instead of tar and cpio, but the computing world largely ignored them.

    It's relevant here to note that the POSIX specification for pax doesn't say how it should handle leading slashes or embedded .. elements. There's a nonstandard --insecure flag for BSD pax to suppress protections against embedded .. path elements, but there is apparently no default protection against leading slashes; the BSD pax man page indirectly recommends writing -s substitution rules to deal with the absolute path risk.

    That's the sort of thing that happens when a de facto standard remains in active use while the de jure standard is largely ignored.


With GNU tar, it's simply

tar -xvf untrusted_file.tar

in an empty directory. GNU tar automatically strips a leading / from member names when extracting, unless explicitly told otherwise with the --absolute-names (-P) option. GNU tar also detects when the use of ../ would cause a file to be extracted outside of the toplevel directory and puts those files in the toplevel directory instead, e.g. a member named foo/../../bar/qux will be extracted as bar/qux in the toplevel directory rather than bar/qux in the parent of the toplevel directory. GNU tar also takes care of symbolic links pointing outside the toplevel directory, e.g. a symlink foo -> ../.. followed by a member foo/bar will not cause bar to be extracted outside the toplevel directory.
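To make the .. rule concrete, here is a rough Python approximation of the renaming described above. It is for illustration only: the name `gnu_style_safe_name` is hypothetical and this is not GNU tar's source. It drops everything up to and including the last .. component, then drops any leading slashes.

```python
def gnu_style_safe_name(name: str) -> str:
    """Approximate the member-name rewriting described above (without
    --absolute-names). Illustrative only; real tar handles more cases."""
    parts = name.split("/")
    # Index just past the last '..' component, or 0 if there is none.
    cut = max((i + 1 for i, p in enumerate(parts) if p == ".."), default=0)
    # Re-join what is left and strip any leading slashes.
    rest = "/".join(parts[cut:]).lstrip("/")
    # An empty result would mean the top-level directory itself.
    return rest or "."
```

`gnu_style_safe_name("foo/../../bar/qux")` returns `'bar/qux'`, and `gnu_style_safe_name("/etc/passwd")` returns `'etc/passwd'`. Symbolic-link handling is a separate mechanism and isn't modelled here.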

Note that this only applies to sufficiently recent versions of GNU tar, and to some other implementations such as *BSD tar and BusyBox tar. Some other implementations have no such protection.

Because of symbolic links, the protections you use wouldn't be enough: the archive could contain a symbolic link pointing to a directory outside the tree, followed by files that get extracted into that directory through the link. There's no way to solve that problem based purely on the member names; you need to examine the targets of symbolic links.
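As a sketch of what examining the link targets might involve, the following uses Python's standard `tarfile` module to inspect an archive without extracting it. The function names are hypothetical, and the rules are deliberately conservative choices made for this sketch, not those of any tar implementation. It works purely on the names stored in the archive (so it assumes the destination directory starts out empty) and reports any member whose path is absolute, contains .., or passes through a symlink stored in the archive; any hard link whose target fails the same test; and any symlink whose target is absolute, climbs above the top-level directory, or passes through another symlink.

```python
import tarfile
from collections.abc import Iterable
from typing import Optional


def _parts(name: str) -> list[str]:
    """Split an archive path on '/', dropping empty and '.' components."""
    return [p for p in name.split("/") if p not in ("", ".")]


def find_escapes(members: Iterable[tarfile.TarInfo]) -> list[str]:
    """Return one message per member that might write outside the
    extraction directory. Purely lexical: assumes an empty destination."""
    members = list(members)
    # First pass: every path that the archive itself makes a symlink.
    symlinks = {"/".join(_parts(m.name)) for m in members if m.issym()}

    def bad_path(name: str) -> Optional[str]:
        if name.startswith("/"):
            return "absolute path"
        parts = _parts(name)
        if ".." in parts:
            return "'..' component"
        # A parent that is a symlink could redirect the write elsewhere.
        for i in range(1, len(parts)):
            if "/".join(parts[:i]) in symlinks:
                return "parent directory is a symlink"
        return None

    def bad_symlink_target(name: str, target: str) -> Optional[str]:
        if target.startswith("/"):
            return "absolute symlink target"
        # Walk the target from the directory that contains the link.
        current = _parts(name)[:-1]
        steps = _parts(target)
        for i, step in enumerate(steps):
            if step == "..":
                if not current:
                    return "symlink target escapes the top-level directory"
                current.pop()
            else:
                current.append(step)
                # Following another link mid-path defeats a lexical check.
                if i < len(steps) - 1 and "/".join(current) in symlinks:
                    return "symlink target passes through another symlink"
        return None

    problems = []
    for m in members:
        reason = bad_path(m.name)
        if reason is None and m.islnk():
            # Hard-link targets are names of other members in the archive.
            target_reason = bad_path(m.linkname)
            if target_reason is not None:
                reason = f"hard link target: {target_reason}"
        elif reason is None and m.issym():
            reason = bad_symlink_target(m.name, m.linkname)
        if reason is not None:
            problems.append(f"{m.name}: {reason}")
    return problems


def find_escapes_in_file(path: str) -> list[str]:
    """Open the archive read-only and run find_escapes on its members."""
    with tarfile.open(path) as tar:
        return find_escapes(tar.getmembers())
```

If `find_escapes_in_file` returns a non-empty list, don't extract the archive. On Python 3.12 and later (and some earlier security releases), `TarFile.extractall(..., filter="data")` applies similar restrictions at extraction time.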

Note that if you're extracting into a directory that already contains symbolic links, the guarantee may no longer hold.
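If you can't extract into a fresh directory, one way to check the destination for pre-existing symbolic links is a walk like the one below. It is a Python sketch with a hypothetical function name; `os.walk` does not descend into symlinked directories by default, so the walk itself stays inside the tree.

```python
import os


def symlinks_under(directory: str) -> list[str]:
    """Return the paths, relative to `directory`, of every symbolic link
    inside it (hypothetical helper). "." means `directory` itself is one."""
    found = []
    if os.path.islink(directory):
        found.append(".")
    # followlinks=False is the default: symlinked directories are listed
    # in dirnames but not descended into.
    for root, dirnames, filenames in os.walk(directory):
        for name in dirnames + filenames:
            full = os.path.join(root, name)
            if os.path.islink(full):
                found.append(os.path.relpath(full, directory))
    return sorted(found)
```

An empty result means no existing link in the destination can redirect the extraction; anything else is a reason to extract somewhere else.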


To cover a few points the other answers haven't:

  1. First, look what's in the file before you extract it:

    tar -tvf untrusted_tar_file.tar
    

    If there's anything in there you don't trust or want to extract, don't extract the tarball.

  2. Second, extract the tarball as a non-root user that only has write access to the one directory you're extracting the tarball into. For example, extract the tarball from within the non-root user's home directory.
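For step 1, if you want to review the listing from a script rather than by eye, a Python sketch using the standard `tarfile` module could produce one line per member without extracting anything. The one-letter type codes and the name `list_members` are invented for this example; the point is that link targets are shown, since those are what need the closest look.

```python
import tarfile


def list_members(path: str) -> list[str]:
    """Return one line per member, loosely like a terse `tar -tv`,
    without extracting anything. Link targets are shown after '->'."""
    lines = []
    with tarfile.open(path) as tar:
        for m in tar.getmembers():
            # One-letter codes invented for this sketch.
            if m.isdir():
                kind = "d"
            elif m.issym():
                kind = "l"
            elif m.islnk():
                kind = "h"  # hard link
            elif m.isreg():
                kind = "-"
            else:
                kind = "?"  # devices, FIFOs, ...
            line = f"{kind} {m.name}"
            if m.issym() or m.islnk():
                line += f" -> {m.linkname}"
            lines.append(line)
    return lines
```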
