At which point can a system be compromised when downloading archived data from an untrusted source?

Step 1 should not present any danger as long as the file is just saved somewhere and no attempt is made to open it with anything. Viewing it, even with a text editor, already carries a small risk of exploits.

In the case of step 2, there are known vulnerabilities and exploits, so the danger is real. Some examples of possible scenarios:

  • Arbitrary file writes caused by .tar.gz archive symbolic link (symlink) vulnerabilities, exploitable because of how Bower (a popular web package manager) extracts such archives.

  • CVE-2018-20250 is an absolute path traversal vulnerability in unacev2.dll, the DLL that WinRAR uses to parse ACE archives and that has not been updated since 2005. A specially crafted ACE archive can exploit this vulnerability to extract a file to an arbitrary path, bypassing the chosen destination folder. In its demonstration, Check Point Research (CPR) extracted a malicious file into the Windows Startup folder.

  • CVE-2018-20252 and CVE-2018-20253 are out-of-bounds write vulnerabilities during the parsing of crafted archive formats. Successful exploitation of these CVEs could lead to arbitrary code execution.

  • Zip Slip, a path traversal flaw in archive extraction, which attackers might use to target files they can execute remotely, such as parts of a website, or files that a computer or user is likely to run anyway, like popular applications or system files.

  • Helm Chart Archive File Unpacking Path Traversal Vulnerability.

  • CVE-2015-5663 - the file-execution functionality in WinRAR before 5.30 beta 5 allows local users to gain privileges via a Trojan horse file with a name similar to an extension-less filename that was selected by the user.

  • CVE-2005-3262 allows remote attackers to execute arbitrary code via format string specifiers in a UUE/XXE file, which are not properly handled when WinRAR displays diagnostic errors related to an invalid filename.

There are plenty more examples, and whole databases of such vulnerabilities. Even though most of them were fixed in later versions of the software, a risk still exists.

Therefore, step 2 is risky and should be handled with care.
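Several of the examples above (the symlink issue, Zip Slip, the path traversal CVEs) come down to an archive member being written outside the intended destination. As a rough illustration only, here is a minimal Python sketch that lists the members of a tar archive and flags suspicious ones before anything is extracted. The function name `unsafe_members` is hypothetical, and the check is deliberately conservative: it flags every link and device entry, not just the provably malicious ones. It does nothing against bugs in the archive parser itself, so it is not a substitute for sandboxing.

```python
import os
import tarfile


def unsafe_members(archive_path, dest_dir):
    """Return names of tar members that look unsafe to extract into dest_dir.

    Hypothetical helper for illustration: flags members whose resolved
    path falls outside dest_dir, plus all symlinks, hard links and
    device nodes.
    """
    dest_root = os.path.realpath(dest_dir)
    flagged = []
    with tarfile.open(archive_path) as tar:
        for member in tar.getmembers():
            # Where would this member end up if extracted naively?
            target = os.path.realpath(os.path.join(dest_root, member.name))
            escapes = os.path.commonpath([dest_root, target]) != dest_root
            # Links and devices are flagged wholesale (conservative choice).
            if escapes or member.issym() or member.islnk() or member.isdev():
                flagged.append(member.name)
    return flagged
```

A caller would refuse to extract if the returned list is non-empty. This only reads the archive's member list, so it still relies on the tar parser being free of memory-safety bugs like the out-of-bounds writes mentioned above.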


In theory, all of these stages could be exploited. I am not going to go into specific exploits, as these change constantly with archive formats and evolving technology:

Initially downloading and saving the archived data (still packed)

It is unlikely, but possible, that your download manager or web browser has an exploitable flaw. You say the source is untrusted, so the server could try to attack your download program through bugs in its implementation or weaknesses in the file transfer protocol you are using. Such exploits are rare but not unheard of. Fundamentally, unless you are certain your software is entirely unexploitable, any network connection to a malicious server could result in an attack.

You can somewhat mitigate this by sandboxing the download software, granting it only the minimal permissions it needs: access to the download location and to the network stack. This mostly closes the hole, assuming your OS permission model or sandboxing software does not have exploits of its own.

Unpacking the archived data

There have been numerous attacks over the years that use poisoned archive files to run arbitrary code on a system by exploiting weaknesses in the archive format or the decompression software. These are probably more common than the download-stage weakness above.

The main protections are, again, giving the extraction program minimal permissions and potentially sandboxing it, so that it can do only minimal damage if it is attacked successfully. The caveats above apply.

Executing any file from the unpacked archive

This is obviously enormously risky, and the same issues as with running any malicious software apply. Software that is run explicitly can break out of many sandboxes and permission systems relatively easily, so all bets are off. Running it in a hardened VM gives you some safety, but even that doesn't fully protect you; the only thorough option is to run the programs on an air-gapped machine that is then destroyed.

TLDR

All of these steps are fairly risky, but each successive step is probably more dangerous than the last.