Lecture 1 – Microcontrollers and embedded programming (Digital Systems)

From Spivey's Corner
Jump to: navigation, search


[1.1] In this course, we'll be learning about computer hardware and low-level software. This term, we will start with machine-code programming, learning only enough about the hardware to understand how to program it: then we'll work our way up to programming in a high level language, and using a very simple operating system. Next term, we will dig further down and begin with logic gates and transistors, working our way upwards again until we reach the level of machine instructions and complete the circle. Anything below the level of transistors is magic to us.

This term and next term

The programming we will do, and the machines we will use, are typical of embedded applications, where a computer is used as part of some other product, perhaps without the owner even knowing there is a computer inside. Any device you own that has any kind of display, or buttons to push, probably has a microcontroller inside, rather than any kind of specific digital electronics, and the reason is simple. Microcontrollers – that is, microprocessors with some ROM and RAM integerated on the chip – can be bought today for 50p each or less,[1] and it's much easier to design a board with a microcontroller on it and program it later than to design and build a custom logic circuit.

micro:bit system architecture

[1.2] We will program the BBC micro:bit, a small printed circuit board that contains an ARM-based microcontroller and some other components.

  • As well as the microcontroller, the board has a magnetometer and an accelerometer, connected via an "inter-integrated circuit" (I2C) bus.
  • The board has some LEDs and buttons that provide inputs and outputs.
  • There's also a second microcontroller on board that looks after the USB connection, providing a programming and debugging interface, and allowing access over USB to the serial port (UART) of the main microcontroller.

[1.3] The microcontroller is a Nordic Semiconductor nRF51822.

  • It contains an ARM-based processor core.
  • Also some on-chip RAM (16kB!!!) and ROM (256kB), with the ROM programmable from a host computer.
  • The chip has peripheral devices like GPIO (for the buttons and lights), an I2C interface, and a UART.
  • The chief selling point of the Nordic chips is that they also integrate the radio electronics for Bluetooth, but we probably won't be using that.

The processor core is (almost) the tiniest ARM, a Cortex-M0.

  • It has the same set of 16 registers as any other ARM32, and a datapath that implements the same operations.
  • The instruction set (Thumb) provides a compact 16-bit encoding of the most common instructions.
  • There is an interrupt controller (NVIC) that lets the processor respond to external events.
  • Because of the needs of Bluetooth, the datapath has the snazzy single-cycle multiplier option.

The micro:bit comes with a software stack that is based on the MBED platform supported by ARM, with subroutine libraries that hide the details of programming the hardware I/O devices. On top of that, programmers at Lancaster University have provided a library of drivers for the peripheral devices present on the board, and that provides the basis for various languages that are used for programming in schools, including MicroPython, JavaScript, and something like Scratch. We will ignore all that and program the board ourselves, starting from the bare metal.


Microcontrollers with a 32-bit core are becoming more common, but the market is still dominated by 8-bit alternatives such as the AVR (as used in Arduino) and the PIC – both electrically more robust, but significantly less pleasant to program. As we'll see, the embedded ARM chips have an architecture that is simple, regular, and well suited as a target for compilers. Other 32-bit microcontroller families also exist, such as the PIC-32 family based on the MIPS architecture. One level up are the "System on a chip" products like the ARM processors used in mobile phones and the Raspberry Pi, and the MIPS-based devices that are commonly used in Wifi routers: these typically have hundreds of megabytes of RAM, and run a (possibly slimmed-down) version of a full operating system. By way of contrast, the operating system we shall develop for the micro:bit is really tiny.

Programming the micro:bit[edit]

Cortex-M0 registers

[1.4] To start programming the micro:bit at the machine level, we will need to know what registers it has, because registers hold the inputs and outputs of each instruction. Like every 32-bit ARM (including the bigger ARM chips used on the Raspberry Pi), the micro:bit has 16 registers. According to the conventions used on Cortex-M0, register r0 to r7 are general-purpose, and can be used for any kind of value. The group r0 to r3 play a slightly different rôle in subroutine calls from the group r4 to r7, but we'll come to that later.

Registers r8 to r12 are not used much, because as we'll see, the instruction set makes it hard to access them. I don't think we'll use them at all in the programs we write. The remaining three registers all have special jobs:

  • sp is the stack pointer, and contains the address of the lowest occupied memory cell on the subroutine stack.
  • lr is the link register, and contains the address where the current subroutine will return.
  • pc is the program counter, and contains the address of the next instruction to be executed.

There's a seventeenth register psr, also with a special purpose, in fact several purposes attached to different bits. Four of the bits, the comparison flags N, Z, C and V, influence the outcome of conditional branch instructions in a way that we shall study soon. Other bits have meanings that become important when we think of using interrupt-based control or writing an operating system: more of that later.

[1.5] We can begin to see how these registers are used by considering a program with just one instruction. Suppose that the 16-bit memory location at address 192 contains the bit pattern 0001 1000 0100 0000. We could write that in hexadecimal[2] as 0x1840. In other contexts, it might represent the integer 6208, or an orange colour so dark as to be almost black. But here it represents an instruction that would be written in assembly language as adds r0, r0, r1:

0001100 001 000 000
adds    r1  r0  r0

(The fact that the bit-fields of the instruction appear in a different order from the register names in the written instruction doesn't at all matter, provided the assembler program and the hardware agree on the ordering.) When this instruction is executed, the machine adds together the integer quantities that it finds in registers r0 and r1, and puts the result into register r0, replacing whatever value was stored there before.

For example, if the machine is in a state where

 pc = 192     r0 = 23     r1 = 34     r2 = 96     ...     lr = 661     nzcv = 0010

then the next state of the machine will be

 pc = 194     r0 = 57     r1 = 34     r2 = 96     ...     lr = 661     nzcv = 0000

and incidentally (that's the meaning of the s in adds) the NZCV flags have all been set to 0. As you can see, the addition of r0 and r1 has been done, and the result is in r0. Also, the program counter has been increased by 2, because that is the size in bytes of the instruction. (On the Cortex-M0, nearly all instructions are 2 bytes long).

[1.6] As it happens, the next instruction at address 194 is 0x4770, and that decodes as bx lr, an instruction that reloads the program counter pc with the value held in lr, which is the address of the next instruction after the one that called this subroutine.

 pc = 660     r0 = 57     r1 = 34     r2 = 96     ...     lr = 661     nzcv = 0000

A detail: for compatibility with the larger processors, the lr register records in its least significant bit the fact that the processor is in Thumb mode (16 bit instructions): the value is 661 = 0x295 = 0000001010010101. This bit is cleared when the value is loaded into the pc, and the address of the next instruction executed is 0x294; if the 1 bit is not present in lr, then an exception occurs – with the lab programs, the result is the Seven Stars of Death.

At address 0x294 in our program is code compiled from C that outputs the result of the addition over the serial port.

Decoding chart for Thumb instructions

[1.7] You can decode these numeric instructions by consulting the Architecture Reference Manual for the Cortex-M0. As a reminder, I like to keep by me a handy chart (click the preview above) showing all the instruction encodings on one page. You can find the adds instruction in orange on chart [A], and the bx instruction in blue on chart [B].

Instruction decoding

[1.8] On bigger chips in the ARM family (like the ARM11 that is used in the Raspberry Pi), ordinary instructions are 32 bits long, and the more compact 16 bit instructions that are used on the micro:bit are a seldom-used option: why worry about code size when you have 1GB of memory? The 32-bit instruction set is very regular, and lets you express instructions that add or subtract or multiply numbers from any two of the 16 registers and put the result in a third one. This makes the instruction set easy to understand, and the regularity makes the process of writing a compiler easier too. The 16-bit instructions are more restrictive because, as shown above, the fields in an adds instruction for specifying registers are only 3 bits long, and that makes only registers r0 to r7 usable.

As the diagram shows, the bigger chips have two instruction decoders, one for 32-bit native instructions and another for 16-bit Thumb instructions, and there is a multiplexer (the long oval shape) controlled by the mode bit, actually a bit in the psr register, to choose which is used. Both decoders produce the same signals for controlling the datapath that actually executes the instructions, so that the instructions that are expressible in the Thumb encoding agree exactly in their effect with certain 32-bit instructions.

On the micro:bit, the 32-bit decoder is missing, so all instructions must be expressed in the 16-bit encoding, and the mode bit must always be 1. It might be better to say there was no mode bit at all, except for the fact that you can try to set it to 0, and that causes a crash and (with our lab software) shows the Seven Stars of Death on the LEDs.


The Cortex-M0 shares some attributes typical of RISC machines:
  • The large(-ish) set of uniform registers. By way of contrast, early microprocessors had only very few registers, and each had a specific purpose: only the A register could be used for arithmetic, only the X register used for indexed addressing, etc.
  • Arithmetic instructions that operate between registers. The most common instructions (add, sub, cmp) can operate with any two source registers, and write their result (if any) in a third register. There is limited support for embedding small constant operands in the instruction, and separate provision of some kind for loading an arbitary constant into a register.
  • Access to memory uses separate load and store instructions, supporting a limited set of simple addressing modes: in this case, register plus register, and register plus small constant.

The most unusual feature of the ARM ISA is that execution of every instruction – not just branches – can be made dependent on the condition codes. The Thumb encoding gets rid of this, allowing only branches to be conditional.

The simplicity and uniformity that is typical of RISC instruction sets is compromised a bit by the complexity of the Thumb encoding. In theory, the absolute uniformity of a RISC instruction set makes things easy for a compiler; in practice, the restrictions of the Thumb encoding make little difference, because most common operations have a compact encoding, and the restrictions are easy enough to incorporate in a compiler that generates code by following a set of explicit rules.


How does the control unit know to interpret the last nine bits of an adds instruction 0x1840 as the names of three registers?

The leading few bits of the instruction, up to 10 of them, determine which instruction encoding applies – adds ra, rb, rc in this case, determined by the leading seven bits. This determines the interpretation of the rest of the instruction. In this case the remaining nine bits are fed by the control unit to the datapath in three groups of three so as to determine which registers take part in the addition.

0001100 001 000 000     adds r0, r0, r1
adds    r1  r0  r0

There's another instruction encoding where a three-bit constant is fed to the datapath as the second operand instead of a register.

0001110 011 001 000     adds r0, r1, #3
adds    3   r1  r0

The control unit behaves differently in the two cases because the opcode is different. The colourful decoding chart enables us to deduce what instruction encoding applies to any string of 16 bits, but it doesn't specify exactly how the remaining bits are interpreted: in this case, it tells us that nine bits of the instruction are three register names or two registers and a three-bit constant, but not what order they appear. For that, you need to look at the appropriate page in the Architecture Reference Manual.

When programming, I mostly find the chart useful as a reminder of what encodings exist: for example, the instructions

adds r0, r1, #3


adds r0, r0, #27

are legal, but

adds r0, r1, #27

is not, because there is no way to encode it.

Lecture 2

  1. https://uk.farnell.com/stmicroelectronics/stm32f030f4p6tr/mcu-32bit-cortex-m0-48mhz-tssop/dp/2432084
  2. In this course, I'll always write hexadecimal constants using the C notation 0x...

A single integrated circuit that contains a microprocessor together with some memory (usually both RAM for dynamic state and ROM for storing a persistent program) and peripheral interfaces.

(Read-Only Memory). A form of storage whose contents are non-volatile (are not lost when the power is off) but cannot be changed under program control. Modern ROM is usually EEPROM – Electrically Erasable Programmable Read Only Memory, and can be changed electrically, and even under control of a program running on the microcontroller, but using special peripheral registers and not the normal store instructions. Flash memory is a modern, super-compact implementation of EEPROM, but for our purposes it does exactly the same job. We will modify the contents of the micro:bit's flash memory by downloading programs, but we will probably not be writing programs that change the contents of the flash memory.

(Universal Asynchronous Receiver/Transmitter). A peripheral interface that is able to send and receive characters serially, commonly used in the past for communication between a computer and an attached terminal. It is commonly used in duplex mode, with the transmitter of one device connected to the receiver of the other with one wire, and the receiver of the one connected to transmitter of the other with a different wire. The asynchronous part of the name refers to the fact that the transmitter and receiver on each wire do not share a common clock, but rely instead on the signalling protocol and precise timing to achieve synchronisation.

(General-Purpose Input/Output). A peripheral interface that provides direct access to pins of the microcontroller chip. Pins may be configured as inputs or outputs, and interrupts may be associated with state changes on certain input pins. On the micro:bit, the LEDs and pushbuttons are connected to GPIO pins.

An alternative instruction encoding for the ARM in which each instruction is encoded in 16 rather than 32 bits. The advantage is compact code, the disadvantage that only a selection of instructions can be encoded, and only the first 8 registers are easily accessible. In Cortex-M microcontrollers, the Thumb encoding is the only one provided.

(Nested Vector Interrupt Controller). An ARM processor component that is able to assign priorities to external interrupts (as opposed to those generated by internally by the processor) and control the servicing of interrupts. As the name indicates, it is able to cope with nested interrupts, where servicing of one interrupt is itself interrupted by another with higher priority. Note that, according to ARM conventions, higher priorities are indicated by lower numbers.

A register sp that holds the address of the most recent occupied word of the subroutine stack. On ARM, as on most recent processors, the subroutine stack grows downwards, so that the sp holds the lowest address of any occupied work on the stack.

On ARM processors, a register (r14) in which the program counter value is saved by the instructions bl and blx that call a subroutine. The subroutine can return by branching to this address with the instruction bx lr, or can save the value on the stack (with push {..., lr}) and later return by restoring the same value back into the program counter (with pop {..., pc}).

A register that contains the address of the next instruction to be executed. Because of pipelining, on ARM Cortex-M machines, reading the program counter yields a value that is 4 bytes greater than the address of the current instruction.

A symbolic representation of the machine code for a program.

Four bits, N, Z, V and C, in the processor status word that indicate the result of a comparison or other arithmetic operation. Briefly, N indicates whether the result of the operation was negative, Z indicates whether it was zero, C is the value of the carry-out bit from the ALU, and V indicates whether the operation overflowed, yielding a result that was different in sign from what could be predicted from the inputs to the operation. A comparison is treated like a subtraction as far as setting the condition codes is concerned. After the condition codes have been set, a subsequent conditional branch instruction can test them, and make a branch decision based on a boolean combination of their values. All ten arithmetic comparisons (equal, not-equal, and less-than, less-than-or-equal, greater-than, and greater-than-or-equal for both signed and unsigned representations) can be represented in this way. When a process is interrupted, the condition codes must be saved and restored as part of the processor state, in case the interrupt came between a comparison and a subsequent conditional branch.