What is the safest way to deal with loads of incoming PDF files, some of which could potentially be malicious?

I think the safest option for you would be to use Qubes OS with its built-in DisposableVMs functionality and its “Convert to Trusted PDF” tool.

What is Qubes OS?

Qubes is an operating system built entirely around virtual machines. You can think of it as having several isolated ‘computers’ inside your own. That lets you compartmentalize your digital life into different domains: one ‘computer’ where you only do work-related stuff, another that is offline and stores your password database and your PGP keys, and another that is specifically dedicated to untrusted browsing... The possibilities are countless, and the only real limit is your RAM, i.e. how many ‘computers’ can be loaded at once.

To ensure that all these ‘computers’ are properly isolated from each other, and that they can't break out to your host (called ‘dom0’, for domain 0) and thereby take control of your whole machine, Qubes uses the Xen hypervisor,[1] the same piece of software that many major hosting providers, such as Amazon EC2, IBM and Linode, rely on to isolate websites and services from each other.

Another cool thing is that each of your ‘computers’ has its own color, which is reflected in its window borders. So you can choose red for the untrusted ‘computer’ and blue for your work ‘computer’ (see for example the picture below). In practice, that makes it really easy to see which domain you're working in.

So let's say some nasty malware gets into your untrusted virtual machine. It can't break out and infect other virtual machines that may contain sensitive information unless it also has an exploit for a vulnerability in Xen that lets it break into dom0 (which is very rare). That significantly raises the bar: without this isolation, an attacker only needs to get malware onto your machine to control everything. It will protect you from most attackers except the most resourced and sophisticated ones.

What are DisposableVMs?

The other answer mentioned that you can use a burner laptop. A Disposable Virtual Machine is kind of the same, except that you're not bound by physical constraints: you can have as many disposable VMs as you wish. All it takes to create one is a click, and once you're done the virtual machine is destroyed. Pretty cool, huh? Qubes comes with a Thunderbird extension that lets you open file attachments in DisposableVMs, which can be pretty useful for your needs.[2]

[Image]

(Credits: Micah Lee)

What's that “Convert to Trusted PDF” you were talking about?

Let's say you found an interesting document, and let's say that you have an offline virtual machine specifically dedicated to storing and opening documents. Of course, you can send that document directly to that VM, but there is still a chance that the document is malicious and may try, for instance, to delete all of your files (a behavior that you wouldn't notice in the short-lived DisposableVM).

But you can also convert it into what's called a ‘Trusted PDF’. You send the file to a different VM, open the file manager there, navigate to the file's directory, right-click and choose “Convert to Trusted PDF”, and then send the resulting file to the VM where you collect your documents.

But what exactly does it do? The “Convert to Trusted PDF” tool creates a new DisposableVM, puts the file there, and then transforms it via a parser (running inside the DisposableVM) that basically keeps the RGB value of each pixel and discards everything else. It's a bit like opening the PDF in an isolated environment and then ‘screenshotting’ it, if you will. The file obviously gets much bigger: if I recall correctly, when I tested it a 10 MB PDF turned into a 400 MB one. You can get many more details in this blog post by security researcher and Qubes OS creator Joanna Rutkowska.
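To make the "keep only the RGB values" idea a bit more concrete, here is a minimal Python sketch of what the trusted side of such a conversion could look like. It is not the actual Qubes implementation: the function names, the `MAX_DIM` cap and the choice of PPM as output format are all assumptions made for illustration. The point is that the trusted side never parses the PDF; it only accepts a width, a height and exactly width × height × 3 bytes of pixel data, and rejects anything else.

```python
MAX_DIM = 10_000  # hypothetical sanity cap on page width/height, in pixels


def sanitize_page(width: int, height: int, rgb: bytes) -> bytes:
    """Accept a plain RGB bitmap from the untrusted side and return a PPM image.

    Nothing from the original document survives except the pixel values.
    """
    if not (isinstance(width, int) and isinstance(height, int)):
        raise ValueError("dimensions must be integers")
    if not (0 < width <= MAX_DIM and 0 < height <= MAX_DIM):
        raise ValueError("implausible page dimensions")
    expected = width * height * 3  # one byte each for R, G and B
    if len(rgb) != expected:
        raise ValueError(f"expected {expected} bytes of pixel data, got {len(rgb)}")
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    return header + bytes(rgb)


def raster_size_bytes(pages: int, width_in: float, height_in: float, dpi: int) -> int:
    """Rough uncompressed size of a document rendered as raw RGB bitmaps."""
    return pages * round(width_in * dpi) * round(height_in * dpi) * 3
```

The second function also shows why the output grows so much. Assuming a US Letter page rendered at 300 DPI (the resolution the real tool uses is not something I'm stating here), `raster_size_bytes(1, 8.5, 11, 300)` comes to roughly 25 MB of raw pixels for a single page, before any compression.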


[1] : The Qubes OS team is working on supporting other hypervisors (such as KVM), so that you can choose not only which systems run in your VMs, but also the very hypervisor that runs those virtual machines.
[2] : You also additionally need to configure an option so that the DisposableVM (the one generated when you click on “Open in DispVM”) is offline and can't leak your IP address. To do that: "By default, if a DisposableVM is created (by Open in DispVM or Run in DispVM) from within a VM that is not connected to the Tor gateway, the new DisposableVM may route its traffic over clearnet. This is because DisposableVMs inherit their NetVMs from the calling VM (or the calling VM's dispvm_netvm setting if different). The dispvm_netvm setting can be configured per VM by: dom0 → Qubes VM Manager → VM Settings → Advanced → NetVM for DispVM." You'll need to set it to none so that the DisposableVM isn't connected to any network VM and won't have any Internet access.
[3] : Edit: This answer mentions Subgraph OS. Hopefully, once a Subgraph template VM is created for Qubes, you could use it inside Qubes, which would make exploits much harder: thanks to the integrated sandbox, an attacker would need another sandbox escape exploit as well as a Xen exploit to compromise your entire machine.


Safest would probably be a burner device. Grab a cheap laptop, and a mobile internet dongle, use it to download the documents, and manually copy across any contents to your main computer (literally retyping would be safest, if you're particularly worried). Since it's not on your network, it shouldn't be able to cause problems even if it got infected, and you'd be able to wipe it or just bin it if you have any particularly evil malware sent to you.

If you need actual contents from the files (e.g. embedded images), one option would be to install a PDF print driver on your burner device, and to print the incoming PDF files using it - this will generate PDF output, but, in theory, just the visual components. Printers don't tend to need script elements, hence they can be safely dropped. Bear in mind that some PDF printer drivers spot when you provide a PDF, and just pass it through unmodified - test before relying on it! Once you've got a clean PDF, email it back to yourself, and check with a virus scanner on your main machine before opening. Note that this doesn't completely eliminate the possibility of malware getting through, but should minimise the chances.
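For the "test before relying on it" part, a rough Python sketch of two checks you could run on the burner device: whether the print driver simply passed the file through unchanged, and whether the output still contains the raw keywords usually associated with active content. The marker list and function names are my own choices for illustration, and this is only a heuristic: PDF names can be obfuscated or hidden inside compressed streams, so an empty result does not prove that a file is clean.

```python
import hashlib
from typing import List

# Raw PDF name tokens commonly associated with scripts or automatic actions.
# This list is an assumption for illustration, not an exhaustive set.
SUSPICIOUS_MARKERS = (b"/JavaScript", b"/JS", b"/OpenAction", b"/Launch", b"/EmbeddedFile")


def was_passed_through(original: bytes, printed: bytes) -> bool:
    """True if the 'printed' PDF is byte-for-byte identical to the input."""
    return hashlib.sha256(original).digest() == hashlib.sha256(printed).digest()


def find_active_markers(pdf_bytes: bytes) -> List[str]:
    """Return the suspicious markers that appear literally in the file."""
    return [m.decode("ascii") for m in SUSPICIOUS_MARKERS if m in pdf_bytes]
```

In practice you would read both files with `open(path, "rb").read()`, treat `was_passed_through(...)` returning `True` as a sign the driver did nothing, and treat any marker found in the printed output as a reason not to trust that driver.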


So, I try to keep these concerns in the "land of reasonable". With every security issue there is a balance of secure vs. safe. For example, you could buy a laptop, read one PDF loaded from the webmail side of your email provider, retype any content you need on a "main computer", then destroy the laptop and start all over again with a new laptop. That would be pretty secure. Also costly, and a giant pain.

So back to a "reasonable" approach.

First, use Linux and an up-to-date PDF reader. By doing so you have really reduced your exposure. There are not as many viruses written for Linux as there are for Windows, and that alone will protect you quite a bit. The viruses that do work on Linux are more complicated to implement, which again reduces your exposure.

Next, use a virtual machine that supports snapshotting. The idea is that you install your Linux OS inside a virtual machine host (like VirtualBox), get it all set up, then "snapshot" the state.

You can then do all your "risky" work inside the virtual machine. With the isolation options enabled, I don't know of any virus that can "escape" the virtual machine and get to the host machine (that doesn't mean they're not out there, just that it's rarer and more complicated for the attacker).

At the end of the day, or any time during the day when you think you have gotten a virus, then you "revert" the machine to the previous snapshot. All the changes and data that "happened" after your snapshot are undone, including any work, viruses, etc.
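If you go with VirtualBox, the snapshot-and-revert cycle can be scripted from the host. Below is a minimal Python sketch that wraps the `VBoxManage snapshot ... take`, `VBoxManage snapshot ... restore` and `VBoxManage controlvm ... poweroff` commands. The VM name and snapshot name in the usage note are made up, and the `run` parameter exists only so the logic can be exercised without VirtualBox installed.

```python
import subprocess
from typing import Callable, List


def snapshot_take_cmd(vm: str, name: str) -> List[str]:
    """Command that records the current state of the VM as a snapshot."""
    return ["VBoxManage", "snapshot", vm, "take", name]


def snapshot_restore_cmd(vm: str, name: str) -> List[str]:
    """Command that rolls the VM back to a previously taken snapshot."""
    return ["VBoxManage", "snapshot", vm, "restore", name]


def poweroff_cmd(vm: str) -> List[str]:
    """Command that hard-stops the VM (a running VM cannot be restored)."""
    return ["VBoxManage", "controlvm", vm, "poweroff"]


def revert(vm: str, name: str, run: Callable = subprocess.run) -> None:
    """Power the VM off, then discard everything that happened since the snapshot."""
    # Powering off fails harmlessly if the VM is already stopped, so don't check it.
    run(poweroff_cmd(vm), check=False)
    run(snapshot_restore_cmd(vm, name), check=True)
```

With a hypothetical VM called `pdf-sandbox`, you would run `subprocess.run(snapshot_take_cmd("pdf-sandbox", "clean"), check=True)` once after setup, and `revert("pdf-sandbox", "clean")` at the end of each day or whenever something looks off.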

During the day, you can open a PDF, scan it with ClamAV (or the like), copy and paste what you need, or do whatever else you need to do with the PDF files, so long as your virtual machine exists in isolation. That means that you don't give the virtual machine access to the host machine. You use something like email to transfer the files. Maybe FTP between the host and the virtual machine. Something, but not direct integration. Not Dropbox either. Something where, if you're going to transfer a file, you only transfer that one file, and only after you're pretty sure it's safe. If you're using a Linux host and a Linux guest, then scp is a great choice.
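The "scan first, then transfer exactly one file" rule is easy to turn into a small gate that runs inside the guest. Here is a hedged Python sketch using `clamscan` and `scp`. It relies on clamscan's documented exit codes (0 for no virus found, 1 for a virus found, anything else treated as an error here); the function names and the destination in the usage note are hypothetical. A "clean" verdict only means the scanner found nothing, not that the file is guaranteed safe.

```python
import subprocess
from typing import Callable


def scan_verdict(returncode: int) -> str:
    """Map a clamscan exit code to a verdict string."""
    return {0: "clean", 1: "infected"}.get(returncode, "error")


def transfer_if_clean(path: str, destination: str, run: Callable = subprocess.run) -> str:
    """Scan one file and copy it out only if the scanner reports nothing.

    scp is called without -r, so a directory is never copied by accident.
    """
    result = run(["clamscan", "--no-summary", path])
    verdict = scan_verdict(result.returncode)
    if verdict != "clean":
        return verdict  # leave the file inside the guest
    run(["scp", path, destination], check=True)
    return "transferred"
```

For example, `transfer_if_clean("invoice.pdf", "user@host:clean-pdfs/")` would return `"transferred"`, `"infected"` or `"error"`, and only in the first case does the file leave the virtual machine.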

This gives you a "pretty secure", disposable environment in which to check out your questionable PDFs, with the ability to "undo" any damage that may happen, without having to change much in your workflow.

Virtual machine hosts and guests can be almost any OS, including Windows. Keep in mind that with a Linux guest and a Windows host, a virus in the PDF may do nothing to the Linux virtual machine yet still affect a Windows machine. Scanning with an anti-virus scanner is important, no matter the OS combo in use.