What leaked mass surveillance capabilities have not been explained by vulnerabilities we have learned about since 2014?

I did some more research to come up with my own (quite possibly incomplete and/or wrong) answer. This is a community wiki answer. Edits or separate answers are more than welcome.

TL;DR: Almost all mass-decryption capabilities can easily be explained using publicly known vulnerabilities, attacks against SSH being a possible exception.

The NSA BULLRUN program is the central part behind the Five Eyes decryption capabilities. According to this presentation by the OTP VPN Exploitation Team, cryptanalytic attacks were available against at least IPSec, SSL, PPTP, and SSH. The leaked documents don't always contain a date, but it seems safe to assume that none are newer than 2012. Newly introduced vulnerabilities like Heartbleed, which only became widely deployed starting in autumn 2012, are therefore not a sufficient explanation.

  • IPSec: The Logjam authors speculate that the Diffie-Hellman key exchange may have been broken by nation-state attackers using a precomputation attack. This would have enabled agencies to attack the majority of potential targets, as many sites used the same Oakley group 1 parameters. The SLOTH transcript collision attack is another likely explanation, as MD5 was still widely used back then.

  • SSL: The transcript collision and DH-Kx attacks mentioned above are also possible (and possibly even simpler) on SSL connections. Furthermore, numerous SSL/TLS vulnerabilities have become public knowledge since the early 2010s. These include FREAK (which resembles Logjam), BEAST, BREACH, and POODLE, as well as attacks against the RC4 cipher. According to SSL Pulse by the Trustworthy Internet Movement, close to 100% of all sites supported SSLv3 in mid-2012. Routine decryption of SSL traffic is probably the least surprising capability in hindsight.

  • PPTP: Quoting Wikipedia:

    Serious security vulnerabilities have been found in the protocol. The known vulnerabilities relate to the underlying PPP authentication protocols used, the design of the MPPE protocol as well as the integration between MPPE and PPP authentication for session key establishment.

  • SSH: The SLOTH attack also works against SSH, though the authors consider the attack impractical due to its still very high computational cost. DH-Kx attacks might be possible as well, or could be combined with transcript collision attacks, though I'm not aware of any work that has demonstrated the feasibility of any of this.
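To make the "precompute once, break many" idea behind the Diffie-Hellman speculation more concrete, here is a minimal toy sketch. It is not the attack described by the Logjam authors: they discuss a number field sieve precomputation against 512- to 1024-bit groups, whereas this sketch swaps in baby-step giant-step on a 31-bit prime so that it runs in well under a second. The prime `P`, the base `G`, and all function names are assumptions chosen purely for illustration.

```python
from math import isqrt
import secrets

# Toy parameters (assumptions for illustration only). Real-world groups are
# hundreds of bits long; 2**31 - 1 is a Mersenne prime small enough to attack.
P = 2**31 - 1
G = 7


def precompute_baby_steps(p, g):
    """One-time work that depends only on the shared group (p, g)."""
    m = isqrt(p - 1) + 1          # m*m > p - 1, so every exponent is covered
    table = {}
    value = 1
    for j in range(m):
        table.setdefault(value, j)  # remember the smallest j with g^j == value
        value = value * g % p
    giant = pow(g, p - 1 - m, p)    # g^(-m) mod p via Fermat's little theorem
    return m, table, giant


def discrete_log(target, p, m, table, giant):
    """Per-target work: find x with g^x == target (mod p), reusing the table."""
    gamma = target % p
    for i in range(m):
        j = table.get(gamma)
        if j is not None:
            return i * m + j
        gamma = gamma * giant % p
    return None  # target is not a power of g


def recover_shared_secret(public_a, public_b, p, precomputed):
    """Passive attacker: sees both public values, derives the session secret."""
    exponent = discrete_log(public_a, p, *precomputed)
    return pow(public_b, exponent, p)


if __name__ == "__main__":
    precomputed = precompute_baby_steps(P, G)  # done once for the group
    for _ in range(3):                         # reused for every session
        a = secrets.randbelow(P - 2) + 1
        b = secrets.randbelow(P - 2) + 1
        pub_a, pub_b = pow(G, a, P), pow(G, b, P)
        real_secret = pow(pub_b, a, P)
        assert recover_shared_secret(pub_a, pub_b, P, precomputed) == real_secret
    print("recovered the shared secret of every simulated session")
```

The point of the sketch is only the cost structure: the expensive step depends on the group alone, so every additional key exchange that reuses the same parameters is comparatively cheap to break.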

The cryptanalytic capabilities may have been known to the mentioned agencies well before being published in the unclassified literature.

Non-cryptanalytic attacks may have contributed as well: backdoors or remote code execution on networking gear, for example, would serve as a perfectly fine explanation. Either would have made it simple to extract encryption keys for any protocol from the affected devices, rendering attacks on those protocols trivial.


I'm not sure this is a question which can really be answered. Once data is leaked, it is very difficult to determine where it originated. If it contains something unique, we may be able to identify the source of the data and from there look for signs of how it was exfiltrated, but more often than not, it works in reverse: a company finds signs that it has been compromised and then tries to work out what data has been taken. There are frequently large dumps of data where it is not known where the data originated. However, this does not mean the data was extracted using some new compromise or vulnerability. It could just mean that the data was extracted from a company that is unaware it has been compromised.

Another problem with this question is that even if we can explain how data might have been obtained, that doesn't mean the explanation is correct. Once a technique sufficiently matches the facts to work as an explanation, we stop looking.

A further problem is that if someone has an unknown way of stealing or intercepting data, they will keep it very secret and will be careful not to release data which would make it obvious either how they are getting the data or that they have access which was not previously possible. The biggest advantage they have is in keeping this knowledge locked down. Essentially, you would need another Snowden to find out about this.

Finally, there is a big difference between not knowing where data was extracted from and being able to explain how it was extracted. There are so many data leaks and so many vulnerabilities that can be exploited that little of the known released data cannot be explained. That explanation may not be correct, but little would be considered unexplainable. Still, for all we know, someone has cracked quantum computing, can break all common ciphers, and has managed to embed backdoors in the majority of hardware supply chains, and we are just unaware of it. The unknown is by definition unknown.

Tags:

Surveillance