Lecture 2 – Building a program (Digital Systems)

Copyright © 2024 J. M. Spivey
Jump to navigation Jump to search

Memory map

Memory map

[2.1] The other piece of information needed to design programs for the micro:bit is the layout of the memory. As the diagram shows, there are three elements, all within the same address space:

  • 256kB of Flash ROM, with addresses from 0x0000 0000 to 0x0003 ffff. This is where programs downloaded to the board are stored. Significantly, address zero is part of the Flash, because it is there that the interrupt vectors are stored that, among other things, determine where the program starts.
  • 16kB of RAM, with addresses from 0x2000 0000 to 0x2000 3fff. This is where all the variables for programs are stored: both the global variables that are not part of any subroutine, and the stack frames that contain local variables for each subroutine activation.
  • I/O registers in the range from 0x4000 0000 upward. Specific locations in this range correspond to particular I/O devices. For example, storing a character at address UART_TXD = 0x4000 251c will have the effect of transmitting it over the serial port, and storing a 32-bit value at GPIO_OUT = 0x5000 0504 has the effect of illuminating some or all of the LEDs on the board.

It's possible for the contents of Flash to be modified under program control, but we shall not use that feature.

Naturally enough, information about the layout of memory must form part of any programs that run on the machine. In our code, it is reflected in two places: the linker script NRF51822.ld contains the addresses and sizes of the RAM and ROM, and instructions about what parts of the program go where; and the header file hardware.h contains the numeric addresses of I/O device registers like UART_TXD and GPIO_OUT. This information comes from the data sheet for the nRF51822 chip.

To think about: why was the nRF51822 designed with so little RAM?


Other microcontrollers adopt two alternatives to this single-address-space model:
  • Sometimes I/O devices are accessed with special instructions, and have their own address space of numbered 'ports'. This is not really such a significant difference, because performing I/O still amounts to selecting a port by its address and performing a read or write action.
  • Sometimes the processor has its program in a different address space from its data. This is an attractive option for microcontrollers, where the program is usually fixed and can be stored in a ROM separate from the RAM that is used for data. It's called a 'Harvard archtecture', in contrast with the 'von Neumann architecture' with a single memory. If the two address spaces are separate, then separate hardware can be used to access the ROM and the RAM, and it can access both simultaneously without fear of interference between the two. The disadvantage is that special instructions are usually needed to access, e.g., tables of constant data held in the ROM.
More complex processors with cache memory between the processor and RAM often adopt a 'modified Harvard archicture' where there are independent caches, concurrently accessible, for code and data, but with a single, uniform RAM behind them.

Building a program

Source code: lab1-asm

Lab one for this course lets you write small subroutines in Thumb assembly language and test them, using a main program that prompts for two numbers, calls your subroutine with the two numbers as arguments, then prints the arguments and result.

As you will see in the demo, the arguments and result are shown both in decimal and in hexadecimal; you can enter numbers in hexadecimal too by preceding them with the usual 0x prefix.

[2.2] This program is built from your one subroutine written in assembly language, with the rest of the program written in C for convenience. (At some point, we'll make a program written entirely written in assembly language just to demonstrate that it's possible.) Let's begin by looking at the entire contents of the source file add.s that defines the function foo. The lines starting with @ are comments.

@ This file is written in the modern 'unified' syntax for Thumb instructions:
        .syntax unified

@ This file defines a symbol foo that can be referenced in other modules        
        .global foo

@ The instructions should be assembled into the text segment, which goes
@ into the ROM of the micro:bit

@ Entry point for the function foo
@ ----------------
@ Two parameters are in registers r0 and r1

        adds r0, r0, r1          @ One crucial instruction

@ Result is now in register r0
@ ----------------
@ Return to the caller
        bx lr

[2.3] The one instruction that matters is the line reading adds r0, r0, r1. Let's assemble the program:

$ arm-none-eabi-as add.s -o add.o

This command takes the contents of file add.s, runs them through the ARM assembler, and puts the resulting binary code in the file add.o. We can see this code by dis-assembling the file add.o, using the utility objdump.

$ arm-none-eabi-objdump -d add.o
00000000 <foo>:
   0:  1840            adds    r0, r0, r1
   2:  4770            bx      lr

This reveals that the instruction adds r0, r0, r1 is represented by the 16 bit value written as hexadecimal 0x1840.

The program also contains four files of C code, and we will need to translate those also into binary form using the compiler arm-none-eabi-gcc. The easy way to do that is to invoke the command make, because the whole procedure for building the program has been described in a Makefile:

$ make
arm-none-eabi-gcc -mcpu=cortex-m0 -mthumb -g -O -c main.c -o main.o
arm-none-eabi-gcc -mcpu=cortex-m0 -mthumb -g -O -c lib.c -o lib.o
arm-none-eabi-gcc -mcpu=cortex-m0 -mthumb -g -O -c startup.c -o startup.o

Notice that the C compiler is being asked to generate Thumb code for the Cortex-M0 core that is present in the micro:bit; it is also being asked to include debugging information in its output (-g), and to optimise the object code a bit (-O). We don't use higher levels of optimisation (-O2 or -Os, because they make the object code harder to understand and interfere with running it under a debugger.)

What are these files of C code anyway? Well, main.c contains the main program, with the loop that prompts for two numbers, passes them to your foo subroutine, then prints the result. It also contains a simple driver for the micro:bit's serial port so that the program can talk to a host PC via USB and minicom. The file lib.c contains a simple implementation of the formatted output function printf that we shall use in most of our programs. Finally, the file startup.c contains the code that runs when the micro:bit starts ("comes out of reset"), setting things up and then calling the main program.

Compiling and linking

[2.4] Having done all this compiling and assembling, we have four binary files with names ending in .o. To make them into one program, we need to concatenate the code from all these files, then fix them up so that references from one file to another (and within a single file too) use the right address. All this is done by the linker ld, or more accurately arm-none-eabi-ld. Often, we invoke the linker via the C compiler gcc, because that lets gcc add its own libraries into the mix; but so that we can see all that is happening, the Makefile contains a command to invoke ld directly:

arm-none-eabi-ld add.o main.o lib.o startup.o \
    /usr/lib/gcc/arm-none-eabi/5.4.1/armv6-m/libgcc.a \
    -o add.elf -Map add.map -T nRF51822.ld

This links together the various .o files, and also a library libgcc.a that contains routines that the C compiler relies on to translate certain C constructs – in particular, the integer division that is used in converting numbers for printing. The output file add.elf is another file of object code in the same format as the .o files. Another output from the linker is a report add.map that shows the layout of memory, and that layout is determined partially by the linker script nRF51922.ld that describes the memory map of the chip and what each memory area should be used for. You'll see that the Makefile next invokes a program size to report on the size of the resulting program, so we can see how much of the memory has been used: not much in this case.

The file add.elf contains the binary code for the program, but sadly it is in the wrong format for downloading to the board. So the final step of building described in the Makefile converts the code into "Intel hex" format, which is what the board expects to receive over USB.

arm-none-eabi-objcopy -O ihex add.elf add.hex

Building is now done, and we can copy the file add.hex to the board. On my machine, I can type

$ cp add.hex /media/mike/MICROBIT

and the job is done. (Something like

$ cp add.hex /run/media/u19abc/MICROBIT

is needed on the Software Lab machines.)

Although it's not necessary for running the program, it's fun to check that the binary instructions 0x1840 and 0x4770 appear somewhere in the code. If you look in add.hex, you will find that line 13 is


and that contains the two fragments 4018 and 7047, which rearrange to give 1840 and 4770. (Why is the rearrangement neeed? You can find a specification for Intel hex format on Wikipedia.)

Alternatively, you can make a pure binary image of the program with the command

$ arm-none-eabi-objcopy -O binary add.elf add.bin

and then look at a hexadecimal dump of that file, divided into 16-bit words:

$ hexdump add.bin
00000c0 1840 4770 4b07 681b 2b00 d103 4a06 6813

(Even on a 32-bit machine, tradition dictates that hexdump splits the file into 16-bit chunks.) As you can see, the two instructions 0x1840 and 0x4770 appear at address 0xc0 in the image.

Other instructions

For the single adds instruction that appears in func.s, we can substitute others.

  • There is an instruction subs that subtracts instead of adding. We quickly find that (one of) the conventions for representing numbers as bit patterns allow negative numbers to be represented also.
subs r0, r0, r1
  • Since subtraction is not commutative, it makes a difference if we specify the inputs in the opposite order
subs r0, r1, r0
  • The Nordic chip has the optional multiplier, so it implements the muls instruction, but Thumb code imposes the restriction that the output must be the same as one of the inputs. You can see this from the Reference Manual page for muls
muls r0, r0, r1
  • We can use several instructions to compute more complex expressions. Here we compute a * (b + 3) by first putting b + 3 in register r2.
adds r2, r1, #3
muls r0, r0, r2
  • If we try a * 5 + b, then we find there is no form of the muls instruction that contains a constant. So we write instead
movs r2, #5
muls r0, r0, r2
adds r0, r0, r1

Thumb code restricts what sensible operations actually exist as instructions. At first sight, this is irksome and requires us to memorise lots of silly rules. But with a bit of experience, we find that most instructions we want to write can actually be written, and for most purposes we can give over the generation of instructions to a compiler, which is better than us at encoding and applying the rules.

  • For an expression like (a + b) * (a - b), we will need to compute at least one of the operands of the multiplication into a register other than r0 and r1. Registers r2 and r3 are freely available for this purpose, so let's choose r3.
subs r3, r0, r1
adds r0, r0, r1
muls r0, r0, r3

Experience with the debugger shows that when func is called, register r2 and r5 contain copies of b. The ABI – that is, the conventions about what registers can be used where – allows us to change the value in r2, as we did in a * (b + 3) example. What if we naughtily change the value in r5 instead?

movs r5, #5
muls r0, r0, r5
adds r0, r0, r1

Ah! It changes the value of b that is printed by the main program:

a = 10                                                                          
b = 1                                                                           
func(10, 5) = 51                                                                
func(0xa, 0x5) = 0x33                                                           
6 cycles, 0.375 microsec

We see that the main program was saving the value of b in r5 across the call to func(a, b) so that it could print it afterwards. (The value of a is saved in memory, presumably). By overwriting r5 we have spoiled that. For a leaf routine like func that calls no others, only registers r0 to r3 can be changed unless we take steps to restore them before returning. When we start to write subroutines that call others, we will be able to exploit this by ourselves keeping values in r4 to r7 that we expect to be preserved y those other subroutines. So there is a kind of contract, imposed by convention not be the hardware, that ensures that subroutines can work together harmoniously. The contract covers the fact that arguments are passed in r0 to r3 and the return address in lr, the result is returned in r0, and r4 to r7 (actually, r8 to r11 too) must be preserved. Subroutines that need more registers than r0 to r3 must take steps to save and restore those other registers that it uses. Subroutines that call others must at least preserve the value of lr they receive so that they can return properly. Mechanisms for all of this will emerge later in the course.

Instruction encodings

At first sight, it seems that the rules for what instructions can be encoded in the Thumb format will be irksome in programming, but in practice they are not a great burden. This is partly because the statistically commonest instructions tend to have an encoding, thanks to the skill of the ARM designers, but also because a lot of our work with assembly language programs is in reading them and understanding what they do, rather than writing them by hand. A compiler can easily be made to follow the rules blindly and produce legal and efficient programs.

In detail, additions involving the low registers r0 to r7 come in several encodings. First, there's the three-register version

adds r0, r1, r2

that is encoded like this:

Adds-rrr format.png

If the first two registers mentioned in the instruction are the same, as in adds r0, r0, r1, then the assembler lets us write the instruction as adds r0, r1 instead, but that doesn't affect the binary code that it generates, which is the same in both cases.

For adding a constant to the contents of a register, there are two ways of encoding the instruction. One encoding allows the result to be put in a different register from the operand.

Adds-rri format.png

Let's call that Form A. As you can see, there are only three bits to encode the constant, so the allowable range is from 0 to 7. Form B requires the source and destination registers to be the same, but has space for an 8-bit constant with a value from 0 to 255.

Adds-ri format.png

The assembler lets us write the same instruction either as adds r0, #1 or as adds r0, r0, #1, producing the same binary code in either case. If you look back at your own programs, you will probably find that by far the most common form of addition is the statement

count = count + 1;

and if count lives in a register, this is easily encoded using either Form A or Form B; the assembler happens to use Form B in this case.

Subtract instructions subs between low registers can be encoded in exactly the same variety of ways as adds instructions. As regards the other instructions we have used, move instructions have two variations, depending on whether the source is a register or a constant. There is a register-to-register form:

Movs-rr format.png

Actually, this turns out to be a special case of the lsls instruction that shifts the register contents left by a number of bits, with the number being zero, so the instruction movs r0, r1 can also be written lsls r0, r1, #0. We will have more to say about shift instructions later.

The form of movs that puts a constant in a register allows any constant from 0 to 255 in an 8-bit field.

Movs-ri format.png

For many programs, we will need constants that don't fit in 8 bits; they require a special treatment that will be introduced later.

Unlike adds and subs, the multiply instruction muls exists only in a form where it operates between registers, and requires the output register to be the same as one of the input registers.

Muls-rr format.png

Assembly language syntax

As well as the two instructions adds r0, r0, r1 and bx lr, the assembly language file func.s contains several other elements. These can be copied without much change into most other assembly language programs we write, but for completeness I will explain them here.

All the lines that begin with an at-sign @ are comments, ignored by the assembler in translating the program. The last part of a line containing another element can also be a comment prefixed by @: this is helpful because most lines of assembly language deserve a comment explaining what is going on in the program.

Other lines contain directives that affect the way the assembler translates the program.

    .syntax unified

This tells the assembler that the remainder of the file is written using the most recent, 'unified', conventions for ARM code, where instructions are written the same way whether they are translated into Thumb code or native ARM code – and not the earlier, 'divided', conventions that were different for different chips.

    .global func

This makes the label func accessible to other modules in the program: it's necessary so that the main program that prompts for inputs and prints the results can call our subroutine.


This causes the code that follows to be placed in the text segment of the program, which the linker will later load into the Flash memory of the micro:bit, as is appropriate for the instructions that make up the program.


These two lines begin a subroutine named func that will be assembled into Thumb code. The .thumb_func directive labels the subroutine as being in Thumb code, and this sometimes affects the way it is called. Usually, the directive can be omitted, but for those few times when it's needed, it's simplest to include it always. The instructions that make up the subroutine func follow.


What is the file /usr/lib/gcc/arm-none-eabi/5.4.1/armv6-m/libgcc.a that is mentioned when a program is linked?

It's a library archive (extension .a) containing subroutines that may be called from code compiled with gcc. For example, the program in Lab 1 contains C a subroutine (xtoa in lib.c) for converting numbers from binary to decimal or hexadecimal for printing, and that subroutine contains the statement x = x / base. Since the chip has no divide instruction, gcc translates the statement into a call to a subroutine with the obscure name __aeabi_uidiv, and that subroutine is provided in the library archive. Exercise 1.6 asks you to write a similar subroutine for yourself.

If you're prepared to use that integer division subroutine from the gcc library, why not use the library version of printf too, instead of writing your own?

Using the library version of printf pulls in a lot of other stuff – for example, since the standard printf can format floating-point numbers, including it also drags in lots of subroutines that implement floating-point arithmetic in software. That's OK, but (a) I wanted to keep our programs small for simplicity, and (b) although we are not likely to fill up the 256kB of code space on the nRF51822, on other embedded platforms it's wise to keep a close eye on the amount of library code that is included in the program but not really used. Added to that, the version of printf in the standard library can call malloc, and we don't want that.

Lecture 3