Stack Overflows - Defeating Canaries, ASLR, DEP, NX

Canary
Stack canaries work by modifying a function's prologue and epilogue. The prologue places a secret value (the canary, or cookie) between the local buffers and the saved return address, and the epilogue checks that it is unchanged. If a stack buffer is overwritten during a memory copy operation, the canary is corrupted along with the return address. The mismatch is detected, and the process is terminated, before the function returns through the corrupted address. The check only runs in the epilogue, though. Suppose the overflow also overwrites an exception handler record on the stack, and an exception is triggered before the function returns (for example, by writing past the end of the stack). The exception dispatcher then calls the handler from that record, which now points to your own code. This is a Structured Exception Handling (SEH) exploit, and it allows you to completely skip the canary check.
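A toy Python model of both paths follows. No real memory is involved, and the frame layout is a hypothetical simplification, not a real ABI: a frame holds a buffer, a random canary, a saved return address and an SEH record. An unchecked copy that stops at the saved return address is caught by the epilogue. A copy that runs off the end of the frame raises first, so the handler read from the overwritten record is used and the canary is never compared. Frame, run and AccessViolation are illustrative names.

```python
import os
import struct

BUF_SIZE = 16  # hypothetical size of the vulnerable local buffer


class AccessViolation(Exception):
    """Stands in for the fault raised when a write runs off the end of the stack."""


class Frame:
    """Toy frame layout (illustrative only, not a real ABI):
    [ buffer | canary | saved return address | SEH next | SEH handler ]"""

    def __init__(self, ret_addr: int, handler_addr: int) -> None:
        self.canary = os.urandom(4)  # prologue: store a random secret value
        self.mem = bytearray(BUF_SIZE) + self.canary
        self.mem += struct.pack("<I", ret_addr)
        self.mem += struct.pack("<II", 0xFFFFFFFF, handler_addr)


def run(frame: Frame, data: bytes) -> tuple:
    """Copy data into the buffer with no bounds check, then run the epilogue."""
    try:
        for i, byte in enumerate(data):
            if i >= len(frame.mem):
                raise AccessViolation("write past the end of the stack")
            frame.mem[i] = byte
    except AccessViolation:
        # Dispatched before the epilogue: the handler address is read from the
        # (now overwritten) SEH record and the canary is never compared.
        handler = struct.unpack_from("<I", frame.mem, BUF_SIZE + 12)[0]
        return ("exception handler", handler)

    # Epilogue: verify the canary before trusting the saved return address.
    if bytes(frame.mem[BUF_SIZE:BUF_SIZE + 4]) != frame.canary:
        return ("canary violation", None)
    return ("return", struct.unpack_from("<I", frame.mem, BUF_SIZE + 4)[0])


if __name__ == "__main__":
    smash_ret = b"A" * (BUF_SIZE + 4) + struct.pack("<I", 0xDEADBEEF)
    print(run(Frame(0x401000, 0x402000), smash_ret))  # caught by the canary
    smash_seh = b"A" * (BUF_SIZE + 12) + struct.pack("<I", 0x42424242) + b"C" * 8
    print(run(Frame(0x401000, 0x402000), smash_seh))  # forged handler is used
```

In a real Windows SEH exploit, the forged handler usually points at a pop/pop/ret sequence. That sequence returns into the overwritten next-SEH field, where a short jump leads to the shellcode.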

DEP / NX
DEP (Windows' Data Execution Prevention) and the NX bit (the CPU feature it relies on) mark data regions such as the stack and heap as non-executable, so the processor raises a hardware fault if execution is transferred into them. This defeats the classic stack buffer overflow, where you overwrite the saved return address so that EIP lands in your shellcode on the stack (for example via a jmp esp), because the stack is no longer executable. Bypassing DEP and NX requires a cool trick called Return-Oriented Programming (ROP).
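For contrast, the classic pre-DEP payload described above can be laid out as follows. The offset, the jmp esp address and the shellcode bytes are all hypothetical placeholders. With DEP/NX enabled, the jmp esp itself still executes, because it lives in a module's code section. The CPU faults as soon as execution reaches the shellcode on the non-executable stack.

```python
import struct

OFFSET_TO_RET = 260                  # hypothetical distance from buffer start to saved return address
JMP_ESP = 0x625011AF                 # hypothetical address of a jmp esp in a loaded module
SHELLCODE = b"\x90" * 16 + b"\xcc"   # placeholder: NOP sled followed by an int3 breakpoint


def classic_payload() -> bytes:
    """Padding, then the saved-return-address overwrite, then the shellcode.

    Assuming the vulnerable function ends in a plain ret (not ret N), ESP points
    just past the overwritten return address when EIP reaches JMP_ESP, i.e. at
    SHELLCODE, so the jmp esp transfers control into it.
    """
    return b"A" * OFFSET_TO_RET + struct.pack("<I", JMP_ESP) + SHELLCODE
```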

ROP involves finding existing snippets of code in the program (called gadgets) and returning into them in sequence, so that together they produce a desired outcome. Since the code is part of legitimate executable memory, DEP and NX don't matter. The gadgets are chained together via the stack, which holds your exploit payload. Each gadget address on the stack is the target of the previous gadget's ret. Each gadget has the form instr1; instr2; instr3; ... instrN; ret. After its instructions run, the ret pops the next address off the stack and jumps to it, which chains the gadgets together. Often additional values have to be placed on the stack as well. These include operands for pop instructions, and filler for gadgets with side effects (such as an extra pop) that would otherwise consume the next gadget's address.
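The chaining mechanism can be simulated with a toy interpreter, sketched here in Python. Each "gadget" is a hypothetical address mapped to a function that performs the gadget's instructions. The interpreter then does what ret does: it pops the next address off the stack. The add gadget has an unwanted pop ebp side effect, so the chain needs a junk filler value after it. Without that filler, the chain returns into a data value and "crashes". Nothing here touches real memory.

```python
from collections import deque

# Hypothetical gadget addresses in a non-randomized, executable module.
POP_EAX = 0x10001000      # pop eax ; ret
POP_ECX = 0x10001010      # pop ecx ; ret
ADD_EAX_ECX = 0x10001020  # add eax, ecx ; pop ebp ; ret  (pop ebp is a side effect)
HALT = 0xFFFFFFFF         # sentinel that ends the simulation


def pop_eax(regs, stack):
    regs["eax"] = stack.popleft()


def pop_ecx(regs, stack):
    regs["ecx"] = stack.popleft()


def add_eax_ecx_pop_ebp(regs, stack):
    regs["eax"] = (regs["eax"] + regs["ecx"]) & 0xFFFFFFFF
    regs["ebp"] = stack.popleft()  # consumes one stack slot the chain must pad


GADGETS = {POP_EAX: pop_eax, POP_ECX: pop_ecx, ADD_EAX_ECX: add_eax_ecx_pop_ebp}


def run_chain(payload):
    """payload[0] is the overwritten return address; later items sit above it."""
    regs = {"eax": 0, "ecx": 0, "ebp": 0}
    stack = deque(payload)
    while stack:
        addr = stack.popleft()  # what ret does: pop the next address into EIP
        if addr == HALT:
            break
        if addr not in GADGETS:
            raise RuntimeError(f"crash: ret into non-gadget address {addr:#x}")
        GADGETS[addr](regs, stack)
    return regs


if __name__ == "__main__":
    good = [POP_EAX, 5, POP_ECX, 7, ADD_EAX_ECX, 0x41414141, POP_ECX, 1, HALT]
    print(run_chain(good))
```

Removing the 0x41414141 filler makes pop ebp swallow the next gadget address. The following ret then lands on the value 1, which is exactly the "instructions that would otherwise get in the way" problem.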

The trick is to chain these gadgets together to call a memory protection function such as VirtualProtect, which marks the memory holding your shellcode (usually the stack) as executable. Execution then continues into the shellcode via a jmp esp or equivalent gadget. Tools like mona.py can be used to find ROP gadgets in general, or to generate complete ROP chains.
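The simplest form of this layout, closer to a single ret2libc-style call than a long chain, is sketched below for 32-bit Windows. VirtualProtect uses the stdcall convention, so returning into it makes the next stack slot its return address. The four slots after that become its arguments (lpAddress, dwSize, flNewProtect, lpflOldProtect). On return, VirtualProtect pops those arguments, so ESP ends up pointing at the shellcode and the jmp esp used as its return address runs it. Every address here is a hypothetical placeholder. The sketch also assumes the stack address is known in advance and ignores null bytes. Real chains (including the ones mona.py typically generates) instead build the arguments in registers at runtime and push them with pushad.

```python
import struct

# All addresses below are hypothetical placeholders for illustration.
OFFSET_TO_RET = 260              # distance from buffer start to saved return address
VIRTUALPROTECT = 0x7C801AD4      # assumed fixed address of kernel32!VirtualProtect
JMP_ESP = 0x625011AF             # assumed jmp esp in a non-ASLR module
STACK_ADDR = 0x0012F000          # assumed address of the page holding the shellcode
WRITABLE = 0x62504000            # assumed writable address for lpflOldProtect
PAGE_EXECUTE_READWRITE = 0x40    # Windows memory-protection constant
SHELLCODE = b"\x90" * 16 + b"\xcc"


def p32(value: int) -> bytes:
    """Pack a 32-bit little-endian value."""
    return struct.pack("<I", value)


def virtualprotect_payload() -> bytes:
    frame = [
        VIRTUALPROTECT,          # overwritten return address: ret lands in VirtualProtect
        JMP_ESP,                 # VirtualProtect's own return address
        STACK_ADDR,              # lpAddress
        0x1000,                  # dwSize: one 4 kB page
        PAGE_EXECUTE_READWRITE,  # flNewProtect
        WRITABLE,                # lpflOldProtect
    ]
    # After VirtualProtect pops its 4 arguments, ESP points at SHELLCODE.
    return b"A" * OFFSET_TO_RET + b"".join(p32(v) for v in frame) + SHELLCODE
```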

ASLR
There are a few ways to bypass ASLR:

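  • Direct RET overwrite - Processes with ASLR often still load some modules that are not ASLR-enabled, so those modules sit at fixed addresses. You can overwrite the return address with a jmp esp from one of them and run your shellcode as before.
  • Partial EIP overwrite - Only overwrite the low-order bytes of the saved return address. On Windows, ASLR randomizes only the upper bytes of a module's base address, so the low two bytes of an address inside the module are predictable. This lets you redirect execution to a target within the same 64 KB region without needing a non-ASLR module. Alternatively, use a reliable information disclosure from the stack to learn what the real return address is, then use it to calculate your target.
  • NOP spray - Create a big block of NOPs followed by shellcode (typically by spraying the heap) to increase the chance that a jump lands in it. Difficult, but possible even when all modules are ASLR-enabled. Won't work if DEP is switched on, though, since the sprayed memory is not executable.
  • Bruteforce - If a failed attempt doesn't make the program crash, you can retry with different target addresses until one works. On 32-bit Windows a module's base has only 8 bits of randomization, so there are only 256 candidates.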
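The partial-overwrite and bruteforce bullets both come down to simple arithmetic, sketched here in Python. On little-endian x86, the low-order bytes of the saved return address come first in memory. Overwriting just two of them therefore changes the target within the same 64 KB region while keeping the randomized high bytes. The second function gives the average number of tries for the bruteforce case. The example address 0x6A2B1234 is made up.

```python
import struct


def partial_overwrite(saved_ret: int, new_low: bytes) -> int:
    """Value of a little-endian 32-bit return address after only its first
    len(new_low) bytes in memory (its low-order bytes) are overwritten."""
    raw = bytearray(struct.pack("<I", saved_ret))
    raw[: len(new_low)] = new_low
    return struct.unpack("<I", bytes(raw))[0]


def expected_guesses(positions: int, rerandomized: bool) -> float:
    """Average number of attempts to hit one of `positions` equally likely bases.

    If the layout stays fixed between attempts, each wrong guess is ruled out
    (mean (n + 1) / 2 tries). If it is re-randomized every time, each try
    succeeds with probability 1 / n (geometric distribution, mean n).
    """
    return float(positions) if rerandomized else (positions + 1) / 2


if __name__ == "__main__":
    print(hex(partial_overwrite(0x6A2B1234, b"\x78\x56")))
    print(expected_guesses(256, rerandomized=False), expected_guesses(256, rerandomized=True))
```

Whatever the randomized high half is, the overwritten address always ends in 0x5678, which is why only the low two bytes need to be known.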

Canaries and the other mitigations discussed here do not prevent the overflow; they only try to cope with the consequences of an overflow which has already happened. The canary tries to detect an overflow which overwrote the return address in a stack frame. DEP goes one step further: it assumes that the return address has been overwritten and followed, and it restricts the areas where execution can jump to. ASLR goes yet one step further: it "shuffles around" the areas where execution is allowed, so the attacker does not know in advance where to jump.

Historically, buffer overflows were exploited to overwrite the return address in the stack, so as to make execution jump into the very data which had been used to overflow the buffer. The canary tries to detect that before the jump happens, and DEP is used to make the stack space non-executable. DEP also works when overflowing buffers in the heap. The canary is useful only for stack buffer overflows, but the heap can contain buffers as well, along with sensitive data to overwrite, such as function pointers (especially in OOP languages such as C++). To work around DEP and the canary, attackers began to look for overflows which allow overwriting function pointers. This makes execution jump into standard library code, which is necessarily "there" and also necessarily executable. That's why ASLR was invented: to make such games harder. ASLR can still be defeated by being lucky. ASLR must maintain page alignment (4 kB on x86) within a not-too-large address space (typically less than 2 GB for user code on 32-bit x86). So there are not that many places where the target code may be: about half a million (2 GB / 4 kB = 524,288). Depending on the attack context and how often the attacker's script can try, this can be too few for comfort.
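The "about half a million" figure and the role of retry rate can be checked with a few lines of Python. The 2 GB and 4 kB values come from the paragraph above. The success formula assumes the layout is re-randomized independently before each attempt, which is a simplifying assumption.

```python
PAGE_SIZE = 4 * 1024             # 4 kB page alignment on x86
ADDRESS_SPACE = 2 * 1024 ** 3    # about 2 GB of user address space on 32-bit x86

CANDIDATES = ADDRESS_SPACE // PAGE_SIZE  # possible page-aligned base addresses (2 ** 19)


def p_success(attempts, positions=CANDIDATES):
    """Chance that at least one of `attempts` guesses is right, assuming the
    layout is re-randomized independently before every attempt."""
    return 1 - (1 - 1 / positions) ** attempts


if __name__ == "__main__":
    print(CANDIDATES)
    for tries in (1_000, 100_000, CANDIDATES):
        print(tries, round(p_success(tries), 4))
```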

The important theme here is that canaries, DEP and ASLR do not defeat overflows themselves; they target the generic overflow exploit methods which have traditionally been employed. In any application, an overflow which overwrites non-pointer data can be as deadly as a remote shell exploit (e.g., imagine an overflow which modifies a string field called "authenticated_user_name"). The arms race between attackers and defenders is becoming too specialized and, in my opinion, increasingly misses the point. In general, it is much better to never allow the overflow to take place, i.e. to stop the offending thread or process before any byte is written outside the target buffer. That's what happens with almost any decent memory-safe programming language (Java, C#, VB.NET, Python, Ruby, JavaScript on Node.js, OCaml, PHP... the choice is large).
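A toy Python illustration of the "authenticated_user_name" example follows. Two fixed-size fields sit back to back in one byte buffer, a hypothetical layout standing in for a C struct. The unchecked copy silently rewrites the neighbouring field without touching any return address or code pointer, so neither a canary nor DEP/ASLR has anything to notice. The checked copy refuses the write before any byte lands outside the field, as a bounds-checked language would. Record, NAME_LEN and both setter names are illustrative.

```python
NAME_LEN = 8  # hypothetical size of the name field


class Record:
    """Toy C-style struct: char name[8]; char authenticated_user_name[8];
    both stored back to back in one flat byte buffer."""

    def __init__(self) -> None:
        self.raw = bytearray(b"\0" * NAME_LEN + b"guest".ljust(NAME_LEN, b"\0"))

    @property
    def authenticated_user_name(self) -> bytes:
        return bytes(self.raw[NAME_LEN:]).rstrip(b"\0")

    def set_name_unchecked(self, data: bytes) -> None:
        # Like strcpy/memcpy into name: nothing stops the write at NAME_LEN.
        # (The toy struct only has 16 bytes of "memory", so the model stops there.)
        for i, byte in enumerate(data[: len(self.raw)]):
            self.raw[i] = byte

    def set_name_checked(self, data: bytes) -> None:
        # What a memory-safe language does: refuse before writing out of bounds.
        if len(data) > NAME_LEN:
            raise IndexError("write outside name refused")
        self.raw[: len(data)] = data


if __name__ == "__main__":
    r = Record()
    r.set_name_unchecked(b"A" * NAME_LEN + b"admin\0")
    print(r.authenticated_user_name)
```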


The basic level of protection is ASLR + DEP.

If you don't use both of those, then there are many powerful techniques for exploiting a buffer overrun (e.g., return-oriented programming, heap spraying, repeated guessing). For instance, DEP alone can be defeated using return-oriented programming, and ASLR alone can be defeated using heap spraying and repeated attempts.

However, if the target uses both ASLR + DEP, exploitation becomes significantly harder. On their own, the techniques mentioned above are not sufficient. ROP needs to know where its gadgets are, which ASLR hides, and a heap spray needs its sprayed data to be executable, which DEP prevents. ASLR + DEP are like a one-two punch that make the attacker's life much harder. Defeating the combination is not impossible, but it takes much more cleverness.

My favorite example of a method for defeating ASLR + DEP is explained in the slide deck Interpreter Exploitation: Pointer Inference and JIT Spraying. There, the author describes how he exploited a memory-safety error in Flash. He used properties of the Flash JIT to arrange memory in a way that let him mount a code-injection attack despite the presence of ASLR + DEP. Recall that a JIT is a just-in-time compiler; the Flash JIT compiles Flash (ActionScript) bytecode to native code. That native code is stored somewhere in memory, and the JIT marks it executable, so DEP does not stop it from running. The author found a way to write Flash bytecode that, when compiled, produced a sequence of bytes embedding his malicious shellcode (offset by a byte). He then used heap spraying techniques to ensure that there were many copies of this code in memory. Finally, he exploited the memory-safety bug to make the program jump to another address. Because of ASLR this was like jumping to a random address, but the many copies ensured that with high probability the jump would land in his shellcode. In this way, he bypassed both ASLR and DEP: a nifty feat.
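The "offset by a byte" idea is commonly described as follows. The attacker writes a long expression of XORs with chosen 32-bit constants. If the JIT compiles each "^ constant" into xor eax, imm32 (opcode 0x35 followed by the 4 immediate bytes), then decoding the same bytes one byte later yields 3 attacker-chosen bytes per constant. The fourth byte, 0x3C, combines with the next 0x35 into a harmless cmp al, 0x35. The Python sketch below rests on a simplifying assumption: that the JIT emits exactly this back-to-back sequence. The real Flash JIT output is not reproduced here, and constants_for and naive_jit are illustrative names.

```python
XOR_EAX_IMM32 = 0x35  # x86 opcode: xor eax, imm32 (followed by 4 immediate bytes)
CMP_AL_IMM8 = 0x3C    # x86 opcode: cmp al, imm8 (followed by 1 immediate byte)


def constants_for(chunks):
    """32-bit constants for the script's  a ^ c1 ^ c2 ^ ...  expression.
    Each carries 3 attacker-chosen bytes plus 0x3C as its most significant byte."""
    consts = []
    for chunk in chunks:
        if len(chunk) != 3:
            raise ValueError("each chunk must be exactly 3 bytes of whole instructions")
        consts.append(int.from_bytes(chunk + bytes([CMP_AL_IMM8]), "little"))
    return consts


def naive_jit(constants):
    """Assumed JIT output: one  xor eax, imm32  per constant, back to back."""
    return b"".join(bytes([XOR_EAX_IMM32]) + c.to_bytes(4, "little") for c in constants)


if __name__ == "__main__":
    # Hidden payload: nop; nop; nop  then  xor eax, eax; nop
    code = naive_jit(constants_for([b"\x90\x90\x90", b"\x31\xc0\x90"]))
    print(code.hex())      # from offset 0: two harmless xor eax, imm32 instructions
    print(code[1:].hex())  # from offset 1: payload bytes, each 3C 35 pair is cmp al, 0x35
```

The attacker's bytes must form complete instructions within each 3-byte chunk, since nothing may spill into the 0x3C filler.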

One last note: ASLR is much more effective on 64-bit architectures. On 32-bit architectures, ASLR can often be defeated simply by making multiple attempts. There just aren't enough bits of address space to introduce much randomness, so the attacker's chance of succeeding by dumb luck remains too high. For the strongest defense, use a 64-bit platform.
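A back-of-the-envelope comparison in Python, using the same page-alignment argument as earlier: the upper bound on randomization is the number of page-aligned positions in the user address space. The 31-bit and 47-bit user-space widths are assumptions (2 GB on 32-bit, the lower canonical half on x86-64 with 4-level paging). Real implementations randomize fewer bits than this bound; the 8-bit row corresponds to the 256 candidates mentioned for 32-bit Windows modules.

```python
import math

PAGE_BITS = 12  # 4 kB pages: the low 12 bits of a page-aligned base are fixed


def max_entropy_bits(user_space_bits):
    """Upper bound on randomization bits for a page-aligned base address."""
    return user_space_bits - PAGE_BITS


def attempts_for(probability, bits):
    """Re-randomized attempts needed to succeed with at least `probability`
    when each attempt hits with chance 1 / 2**bits."""
    return math.ceil(math.log1p(-probability) / math.log1p(-1 / 2 ** bits))


if __name__ == "__main__":
    for label, bits in [("32-bit Windows image base", 8),
                        ("32-bit, 2 GB bound", max_entropy_bits(31)),
                        ("x86-64, 47-bit bound", max_entropy_bits(47))]:
        print(f"{label}: {bits} bits, ~{attempts_for(0.5, bits):,} attempts for a 50% chance")
```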