Incident response and recovery from a security breach with unknown attack vector

You are viewing this from a security perspective and (I assume) from the perspective of a security practitioner. Assuming the company is in the business of making money, it will shut down for the bare minimum time required, because downtime affects profits. IT will seldom have much input into whether systems can remain offline while root cause analysis is done. Hell, in most of the breach investigations I've been involved in, the systems are simply restored from backups, no investigation is done into what happened until they've been hit for the umpteenth time in a few months, and we are given almost nothing to work with in finding the root cause. Management cares about dollars, and though processes may change with things like GDPR, I doubt this will change much, except that more IT staff will be fired after a breach. As for your question about 'excellent intrusion detection' capabilities, these are generally lacking too. If you read through the Verizon Data Breach Investigations Report: http://www.verizonenterprise.com/industry/public_sector/docs/2018_dbir_public_sector.pdf, page 10 gives a breakdown of 'discovery', and it generally takes months to detect a compromise.


There is a misconception in your question regarding the attack techniques and exploits that are used in these spectacular and widely-known security incidents.

Exploits are usually categorized with metrics (among others) like severity and complexity. Usually(!) the more complex an attack, the harder the forensics needed to figure out what exactly happened. The attacks you mention are indeed hard to execute and therefore hard to investigate. But the more important point here is: these are - according to the affected companies - not the ones that were used by attackers in these popular incidents.

Take the Equifax breach for example. Quote from Wikipedia:

Equifax said the breach was facilitated using a flaw in Apache Struts (CVE-2017-5638). A patch for the vulnerability was released March 7, yet the company failed to apply the security updates before the attack occurred 2 months later. However, this was not the only point of failure: contributing factors included the insecure network design which lacked sufficient segmentation, potentially inadequate encryption of personally identifiable information (PII), and ineffective breach detection mechanisms.

This describes no fancy attack vector at all. The Apache Struts flaw was pretty well known by then, and Struts itself was (to my knowledge) a widely-used framework. Whenever a CVE like this gets published, it gets tested by attackers around the globe immediately. Equifax IT should've patched their systems as soon as possible, but they did not. Granted, patching your servers takes some time, but it doesn't take forever. So you can limit the availability of your service for a while and then slowly ramp it up again once the servers are updated.
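To illustrate how simple the "do we still run a vulnerable version?" check can be, here is a minimal sketch. It assumes the affected range given in the CVE-2017-5638 description (Struts 2.3.x before 2.3.32 and 2.5.x before 2.5.10.1). The function names and the `inventory` dictionary are made up for illustration; in practice the version list would come from a dependency scanner or asset inventory.

```python
def parse_version(text):
    """Turn '2.3.31' into (2, 3, 31) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))

# First fixed release per branch (assumption: taken from the CVE description).
FIXED_IN = {
    (2, 3): (2, 3, 32),
    (2, 5): (2, 5, 10, 1),
}

def is_vulnerable(version_text):
    """Return True if the version falls in a branch below its first fixed release."""
    version = parse_version(version_text)
    fixed = FIXED_IN.get(version[:2])
    # Branches not listed above are reported as not affected by this check.
    return fixed is not None and version < fixed

# Hypothetical inventory: application name -> deployed Struts version.
inventory = {"app-a": "2.3.31", "app-b": "2.5.10", "app-c": "2.5.10.1"}
needs_patch = sorted(name for name, v in inventory.items() if is_vulnerable(v))
print(needs_patch)
```

With the sample inventory above, `app-a` and `app-b` are flagged and `app-c` is not. The hard part in real life is not this comparison but knowing what is deployed where.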

In addition: if Equifax had had proper segmentation, encryption or detection - all three extremely well-known, must-have security techniques - the breach would've been half as bad. But they did not.

My point is: attackers don't need complex exploits chained together to create a new Stuxnet to hack large corporations like Equifax or Quora. A two-month-old RCE exploit is good enough.

To add a bit of speculation on my part and answer your question "Why can they react so fast" from a different angle: I guess most of these companies already knew about these security holes. And knowing about the vulnerabilities in your network makes forensics a lot easier.


Yes, Intrusion Detection and Digital Forensics have components that can be automated for quick triage on large-scale installations in very diverse technology infrastructures and complex global organizations.

Incident Response and Crisis Management are more difficult, and often carry the onus to Pull The Plug, or go offline -- especially during a root-level intrusion (aka Administrator-level, often described as Domain takeover or Domain Administrator compromise). Going offline used to be more common, but the Digital Forensics and Incident Response (DFIR) platforms began changing around 2012.

What these new DFIR platforms added, in terms of disruptive technology, was the ability to do Live Response. Before commercial platform support for Live Response (e.g., FireEye HX -- originally named MIR or Mandiant Incident Response, CbER, Crowdstrike Falcon, etc), most DFIR platforms were focused either on Digital Forensics (e.g., EnCase used by corporations, or FTK used by governments, especially the FBI) or Incident Response (e.g., Belkasoft RAM Capturer or Sysinternals Autoruns) -- not both. One exception was the F-Response platform, which began shipping circa 2009 (an early adopter of these techniques). The term DFIR wasn't used or popularized until at least 2013 -- so this is all still a very new concept for most cybersecurity / Infosec / IT shops.

More recently, new commercial solutions (e.g., Velocidex) have been popping up around DFIR Live Response platforms (often based on free, open-source solutions such as what was formerly known as Google Rekall, a fork of the also-free/open-source Volatility Framework). However, there are also many solutions that try to suggest they share similarities with these platforms, even though they are closer to classic Anti-Virus (AV) platforms. The official terminology is Endpoint Protection Platform (EPP), with solutions from Sophos, Symantec, and McAfee. However, some platforms that are clearly EPP (e.g., Cylance, SentinelOne) try to use the terminology NG-AV (for next-generation anti-virus) or, worse, Endpoint Detection and Response (EDR), which spoils what the DFIR Live Response platforms originally attempted to disrupt.

Commercial EDR platforms are often focused more on Detection than Response, meaning that they are closer to classic AV. A true Live Response platform will enable at least 2 primary capabilities:

  1. Perform Host Isolation, meaning the ability for a system to go offline while allowing the responders the ability to access the host remotely.
  2. Provide a full-system Memory Dump from an isolated host across a variety of Operating Systems without degradation of performance and while retaining superior stability. If a kernel panics (i.e., the whole operating system crashes), it often completely ruins the ability to retain a memory capture. The memory dump must include higher-order bits that contain the system's MBR or GPT in order to detect and respond to potential rootkits in firmware such as BIOS or UEFI. Often, this means that the platform installs a driver, and drivers must be carefully coded in order to prevent system crashes.
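The first capability, Host Isolation, boils down to a simple policy decision: once a host is flagged as isolated, drop everything except traffic to and from the responders. A minimal sketch of that decision follows. The network range and function name are hypothetical and not taken from any specific product; real platforms enforce this in an agent or driver rather than in a script.

```python
import ipaddress

# Hypothetical responder console network; a real platform would configure this itself.
RESPONDER_NETWORKS = [ipaddress.ip_network("10.99.0.0/24")]

def allow_connection(remote_ip, isolated):
    """Return True if a connection to or from remote_ip should be permitted."""
    if not isolated:
        # Normal operation: this sketch places no restriction on traffic.
        return True
    address = ipaddress.ip_address(remote_ip)
    # Isolated: only the responders may still reach the host.
    return any(address in network for network in RESPONDER_NETWORKS)
```

The host is effectively offline for the attacker (and for the business) while the responders can still pull memory and disk artifacts from it remotely.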

Very few DFIR service providers retain the talent and automation pipeline necessary to perform quick triage in large-scale installations, even when they have successfully rolled out, enhanced, and integrated an EDR or DFIR Live Response platform for their clients -- whether during the incident or crisis (post-breach) or before it (pre-breach). Some that do include The Cowen Group, FireEye's Mandiant, Crowdstrike, Verizon Business, and Stroz Friedberg. There are some new players, such as Endgame, and some tied to specific industries, such as Trustwave in the payment card industry. You'll see their names come up in breaking news stories around major data breaches.

In many cases, the org that suffered the headline-making data breach already had one of these DFIR service providers (or a competitor) on retainer -- meaning that they've been paying them monthly or yearly just to keep the door open in case a crisis occurs. Sometimes you'll hear this specific offering referred to as a Compromise Assessment. These are definitely the cream of the crop in terms of speedy and high-quality intrusion detection and digital forensics analysis!

You'll see these DFIR service providers' tools (or portions thereof) and techniques in books, resources, and even open-source solutions. For example, The Cowen Group is related to TriForce (a patented digital forensics technique), FireEye has the FLARE VM, Crowdstrike has Falcon Orchestrator and VxStream Sandbox, Verizon Business released VERIS, and Stroz Friedberg has their own GitHub with lightgrep and acquired fsrip. Some of these came through acquisitions and others through spinoffs.

It's not just private industry that has innovated -- clearly much of the work of MITRE, CIRCL, CERT-BDF, CERT-Tools, ANSSI-FR, and CSE-CST anticipated all of the above.

Others are just cool in their own right, such as Gransk, PUNCH-Cyber, and SkadiVM.