Lecture 1 – Microcontrollers and embedded programming (Digital Systems)
Introduction
[1.1] In this course, we'll be learning about computer hardware and low-level software. This term, we will start with machine-code programming, learning only enough about the hardware to understand how to program it: then we'll work our way up to programming in a high level language, and using a very simple operating system. Next term, we will dig further down and begin with logic gates and transistors, working our way upwards again until we reach the level of machine instructions and complete the circle. Anything below the level of transistors is magic to us.
The programming we will do, and the machines we will use, are typical of embedded applications, where a computer is used as part of some other product, perhaps without the owner even knowing there is a computer inside. Any device you own that has any kind of display, or buttons to push, probably has a microcontroller inside, rather than any kind of specific digital electronics, and the reason is simple. Microcontrollers – that is, microprocessors with some ROM and RAM integrated on the chip – can be bought today for 50p each or less,[1] and it's much easier to design a board with a microcontroller on it and program it later than to design and build a custom logic circuit.
[1.2] We will program the BBC micro:bit, a small printed circuit board that contains an ARM-based microcontroller and some other components.
- As well as the microcontroller, the board has a magnetometer and an accelerometer, connected via an "inter-integrated circuit" (I2C) bus.
- The board has some LEDs and buttons that provide inputs and outputs.
- There's also a second microcontroller on board that looks after the USB connection, providing a programming and debugging interface, and allowing access over USB to the serial port (UART) of the main microcontroller.
[1.3] The microcontroller is a Nordic Semiconductor nRF51822.
- It contains an ARM-based processor core.
- Also some on-chip RAM (16kB!!!) and ROM (256kB), with the ROM programmable from a host computer.
- The chip has peripheral devices like GPIO (for the buttons and lights), an I2C interface, and a UART.
- The chief selling point of the Nordic chips is that they also integrate the radio electronics for Bluetooth, but we probably won't be using that.
The processor core is (almost) the tiniest ARM, a Cortex-M0.
- It has the same set of 16 registers as any other ARM32, and a datapath that implements the same operations.
- The instruction set (Thumb) provides a compact 16-bit encoding of the most common instructions.
- There is an interrupt controller (NVIC) that lets the processor respond to external events.
- Because of the needs of Bluetooth, the datapath has the snazzy single-cycle multiplier option.
The micro:bit comes with a software stack that is based on the MBED platform supported by ARM, with subroutine libraries that hide the details of programming the hardware I/O devices. On top of that, programmers at Lancaster University have provided a library of drivers for the peripheral devices present on the board, and that provides the basis for various languages that are used for programming in schools, including MicroPython, JavaScript, and something like Scratch. We will ignore all that and program the board ourselves, starting from the bare metal.
Context
Microcontrollers with a 32-bit core are becoming more common, but the market is still dominated by 8-bit alternatives such as the AVR (as used in Arduino) and the PIC – both electrically more robust, but significantly less pleasant to program. As we'll see, the embedded ARM chips have an architecture that is simple, regular, and well suited as a target for compilers. Other 32-bit microcontroller families also exist, such as the PIC-32 family based on the MIPS architecture. One level up are the "System on a chip" products like the ARM processors used in mobile phones and the Raspberry Pi, and the MIPS-based devices that are commonly used in Wifi routers: these typically have hundreds of megabytes of RAM, and run a (possibly slimmed-down) version of a full operating system. By way of contrast, the operating system we shall develop for the micro:bit is really tiny.
Programming the micro:bit
[1.4] To start programming the micro:bit at the machine level, we will need to know what registers it has, because registers hold the inputs and outputs of each instruction. Like every 32-bit ARM (including the bigger ARM chips used on the Raspberry Pi), the micro:bit has 16 registers. According to the conventions used on Cortex-M0, registers r0 to r7 are general-purpose, and can be used for any kind of value. The group r0 to r3 plays a slightly different rôle in subroutine calls from the group r4 to r7, but we'll come to that later.
Registers r8 to r12 are not used much, because as we'll see, the instruction set makes it hard to access them. I don't think we'll use them at all in the programs we write. The remaining three registers all have special jobs:
- sp is the stack pointer, and contains the address of the lowest occupied memory cell on the subroutine stack.
- lr is the link register, and contains the address where the current subroutine will return.
- pc is the program counter, and contains the address of the next instruction to be executed.
There's a seventeenth register psr, also with a special purpose, in fact several purposes attached to different bits. Four of the bits, the comparison flags N, Z, C and V, influence the outcome of conditional branch instructions in a way that we shall study soon. Other bits have meanings that become important when we think of using interrupt-based control or writing an operating system: more of that later.
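As a small illustration of what it means for flags to live in individual bits of psr, the following C sketch pulls the four comparison flags out of a 32-bit status word. It relies on the standard ARM layout, which the text above does not spell out: N, Z, C and V occupy bits 31, 30, 29 and 28 respectively. The names struct flags and get_flags are invented for this example.

```c
#include <stdint.h>

/* The four comparison flags, each stored as 0 or 1. */
struct flags {
    unsigned n, z, c, v;
};

/* Extract N, Z, C, V from a status word, assuming the usual ARM
   layout with N in bit 31, Z in bit 30, C in bit 29, V in bit 28. */
struct flags get_flags(uint32_t psr) {
    struct flags f;
    f.n = (psr >> 31) & 1;
    f.z = (psr >> 30) & 1;
    f.c = (psr >> 29) & 1;
    f.v = (psr >> 28) & 1;
    return f;
}
```

For instance, a status word of 0x20000000 has only the C flag set, which these notes would write as nzcv = 0010.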
[1.5] We can begin to see how these registers are used by considering a program with just one instruction. Suppose that the 16-bit memory location at address 192 contains the bit pattern 0001 1000 0100 0000. We could write that in hexadecimal[2] as 0x1840. In other contexts, it might represent the integer 6208, or an orange colour so dark as to be almost black. But here it represents an instruction that would be written in assembly language as adds r0, r0, r1:
    0001100  001  000  000
    adds     r1   r0   r0
(The fact that the bit-fields of the instruction appear in a different order from the register names in the written instruction doesn't at all matter, provided the assembler program and the hardware agree on the ordering.) When this instruction is executed, the machine adds together the integer quantities that it finds in registers r0 and r1, and puts the result into register r0, replacing whatever value was stored there before.
For example, if the machine is in a state where
pc = 192 r0 = 23 r1 = 34 r2 = 96 ... lr = 661 nzcv = 0010
then the next state of the machine will be
pc = 194 r0 = 57 r1 = 34 r2 = 96 ... lr = 661 nzcv = 0000
and incidentally (that's the meaning of the s in adds) the NZCV flags have been updated to reflect the result, in this case all becoming 0. As you can see, the addition of r0 and r1 has been done, and the result is in r0. Also, the program counter has been increased by 2, because that is the size in bytes of the instruction. (On the Cortex-M0, nearly all instructions are 2 bytes long.)
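The state change just described can be mimicked by a few lines of C. This is only a sketch under stated assumptions: struct machine and step_adds are invented names, the pc field holds the address of the instruction about to run (as in the state listings, which is a simplification of this model), and the flag rules are the standard ARM ones for addition rather than anything spelled out so far in these notes.

```c
#include <stdint.h>
#include <stdbool.h>

/* A toy model of the machine state (hypothetical, for illustration). */
struct machine {
    uint32_t r[16];        /* r[13] is sp, r[14] is lr, r[15] is pc */
    unsigned n, z, c, v;   /* comparison flags, each 0 or 1 */
};

/* Execute one instruction if it is a register-form adds; otherwise
   leave the state untouched and return false. */
bool step_adds(struct machine *m, uint16_t instr) {
    if ((instr >> 9) != 0x0c)                 /* opcode bits 0001100 */
        return false;
    unsigned rm = (instr >> 6) & 7;           /* second source */
    unsigned rn = (instr >> 3) & 7;           /* first source */
    unsigned rd = instr & 7;                  /* destination */
    uint32_t a = m->r[rn], b = m->r[rm];
    uint32_t sum = a + b;                     /* wraps modulo 2^32 */
    m->r[rd] = sum;
    m->n = sum >> 31;                         /* result negative as signed */
    m->z = (sum == 0);                        /* result is zero */
    m->c = (sum < a);                         /* carry out of bit 31 */
    m->v = ((~(a ^ b) & (a ^ sum)) >> 31) & 1; /* signed overflow */
    m->r[15] += 2;                            /* instruction is 2 bytes long */
    return true;
}
```

Starting from r0 = 23, r1 = 34, pc = 192 and nzcv = 0010, a call with the instruction 0x1840 leaves r0 = 57, pc = 194 and all four flags clear, matching the two states shown above.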
[1.6] As it happens, the next instruction at address 194 is 0x4770, and that decodes as bx lr, an instruction that reloads the program counter pc with the value held in lr, which is the address of the next instruction after the one that called this subroutine.
pc = 660 r0 = 57 r1 = 34 r2 = 96 ... lr = 661 nzcv = 0000
A detail: for compatibility with the larger processors, the lr register records in its least significant bit the fact that the processor is in Thumb mode (16-bit instructions): the value is 661 = 0x295 = 0000001010010101. This bit is cleared when the value is loaded into the pc, and the address of the next instruction executed is 0x294; if the 1 bit is not present in lr, then an exception occurs – with the lab programs, the result is the Seven Stars of Death.
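The rule for bx can be put as a short C function. It is a sketch only: bx_target is an invented name, and returning false stands in for the exception that the real processor raises when the Thumb bit is missing.

```c
#include <stdint.h>
#include <stdbool.h>

/* Compute where bx would branch to, given the register value.
   Returns false (leaving *new_pc alone) if bit 0 is clear, which is
   the case where the hardware takes an exception instead. */
bool bx_target(uint32_t reg, uint32_t *new_pc) {
    if ((reg & 1) == 0)
        return false;
    *new_pc = reg & ~(uint32_t)1;   /* clear the Thumb bit */
    return true;
}
```

With reg = 661 the function stores 660, that is 0x294; with reg = 660 it reports failure.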
At address 0x294 in our program is code compiled from C that outputs the result of the addition over the serial port.
[1.7] You can decode these numeric instructions by consulting the Architecture Reference Manual for the Cortex-M0. As a reminder, I like to keep by me a handy chart showing all the instruction encodings on one page. You can find the adds instruction in orange on chart [A], and the bx instruction in blue on chart [B].
[1.8] On bigger chips in the ARM family (like the ARM11 that is used in the Raspberry Pi), ordinary instructions are 32 bits long, and the more compact 16-bit instructions that are used on the micro:bit are a seldom-used option: why worry about code size when you have 1GB of memory? The 32-bit instruction set is very regular, and lets you express instructions that add or subtract or multiply numbers from any two of the 16 registers and put the result in a third one. This makes the instruction set easy to understand, and the regularity makes the process of writing a compiler easier too. The 16-bit instructions are more restrictive because, as shown above, the fields in an adds instruction for specifying registers are only 3 bits long, and that makes only registers r0 to r7 usable.
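To make the effect of the 3-bit fields concrete, here is a hedged sketch of how an assembler might build the register form of adds. The function name encode_adds and the convention of returning -1 for an unencodable instruction are inventions for this example; the field layout is the one shown earlier for 0x1840.

```c
/* Build the 16-bit encoding of adds rd, rn, rm, or return -1 if any
   register number does not fit in a 3-bit field (that is, r8 and up). */
int encode_adds(unsigned rd, unsigned rn, unsigned rm) {
    if (rd > 7 || rn > 7 || rm > 7)
        return -1;
    /* opcode 0001100, then rm, rn, rd in that order */
    return (int)((0x0cu << 9) | (rm << 6) | (rn << 3) | rd);
}
```

Calling encode_adds(0, 0, 1) gives back 0x1840, while any call that mentions r8 fails, because there is simply no bit pattern available for it in this encoding.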
As the diagram shows, the bigger chips have two instruction decoders, one for 32-bit native instructions and another for 16-bit Thumb instructions, and there is a multiplexer (the long oval shape) controlled by the mode bit, actually a bit in the psr register, to choose which is used. Both decoders produce the same signals for controlling the datapath that actually executes the instructions, so that the instructions that are expressible in the Thumb encoding agree exactly in their effect with certain 32-bit instructions.
On the micro:bit, the 32-bit decoder is missing, so all instructions must be expressed in the 16-bit encoding, and the mode bit must always be 1. It might be better to say there was no mode bit at all, except for the fact that you can try to set it to 0, and that causes a crash and (with our lab software) shows the Seven Stars of Death on the LEDs.
Context
The Cortex-M0 shares some attributes typical of RISC machines:
- The large(-ish) set of uniform registers. By way of contrast, early microprocessors had only very few registers, and each had a specific purpose: only the A register could be used for arithmetic, only the X register used for indexed addressing, etc.
- Arithmetic instructions that operate between registers. The most common instructions (add, sub, cmp) can operate with any two source registers, and write their result (if any) in a third register. There is limited support for embedding small constant operands in the instruction, and separate provision of some kind for loading an arbitrary constant into a register.
- Access to memory uses separate load and store instructions, supporting a limited set of simple addressing modes: in this case, register plus register, and register plus small constant.
The most unusual feature of the ARM ISA is that execution of every instruction – not just branches – can be made dependent on the condition codes. The Thumb encoding gets rid of this, allowing only branches to be conditional.
The simplicity and uniformity that is typical of RISC instruction sets is compromised a bit by the complexity of the Thumb encoding. In theory, the absolute uniformity of a RISC instruction set makes things easy for a compiler; in practice, the restrictions of the Thumb encoding make little difference, because most common operations have a compact encoding, and the restrictions are easy enough to incorporate in a compiler that generates code by following a set of explicit rules.
In the next few lectures, we will explore programming the micro:bit at the machine level, slowly increasing the range of machine features our programs will use:
- first, we'll use instructions that ask for arithmetic and logical operations on values that are stored in the registers of the machine, using them in straight-line programs.
- then, we'll add branch instructions that let us express conditional execution and loops, so that programs can express more than a single, limited chain of calculations.
- next, we'll introduce subroutines, or rather, we'll write assembly language programs with several subroutines, with one able to call another.
- finally, we add the ability to load and store values in the random-access memory of the machine, so that programs can work on more data than will fit in the small set of registers.
Two points about this sequence of tours of the machine:
- First, we're not studying these things because of any need to use assembly language to write actual programs. In the olden days, compilers were weaker, and sometimes programmers would use assembly language to speed up small but vital parts of their programs. Those days are past, and now compilers are able to generate perfectly good code without our help – in fact, code that is often better than we could write by hand, because they are able to deal with tedious details that would overwhelm our patience. Also, it used to be that some assembly language was needed as an interface between the hardware and code compiled from a high-level language, or perhaps to initialise the machine when it first woke up. Those days are gone too, and in Lab zero, our very first program will be written entirely in C, with no assembly-language parts at all. It's a strength of the ARM design that this is possible.
- The second point: we are not going to start programming from scratch. In the good old days, you could sit at the console of the machine, toggle in a program in binary, then execute it step by step, stopping between each instruction and the next to examine the contents of registers. Those days are long gone, alas, and what we will do instead is use a main program written in C to feed values to an assembly-language subroutine (written by us) and print the results it returns. We won't study until much later the techniques that main program uses to do input and output, but will just take for granted that it does its job. You have to start somewhere!
Questions
How does the processor know how many instruction bits to use as the opcode?
This answer must have two parts: (i) what algorithm can be used to decode an instruction? and (ii) how can the algorithm be implemented as an electronic circuit? I'll leave part (ii) for next term: for now it's enough to see that there's a pencil-and-paper algorithm that solves the problem. And that's simple – use the rainbow chart. Look up the first seven bits of the instruction in table [A]. If the result is a coloured region, that is the instruction, and what instruction it is determines the detailed way the rest of the bits are interpreted. (If the coloured region is more than one cell, then some of the first seven bits will contribute to the detail too.) If table [A] gives a reference to another table, then take a few more bits of the instruction and look them up in that table.
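The pencil-and-paper procedure can also be written down as code. The sketch below is not a transcription of the rainbow chart: it covers only a handful of encodings, taken from the ARM manual, and the names enum encoding and classify are invented. It does show the two-stage shape of the lookup, where the first seven bits either settle the matter or send you on to look at more bits.

```c
#include <stdint.h>

/* A few instruction encodings, labelled in the style of the chart. */
enum encoding {
    ENC_ADDS_R, ENC_SUBS_R, ENC_ADDS_I3, ENC_SUBS_I3,
    ENC_MOV_HI, ENC_BX, ENC_BLX,
    ENC_OTHER                       /* not covered by this sketch */
};

enum encoding classify(uint16_t instr) {
    unsigned top7 = instr >> 9;     /* the first seven bits */
    switch (top7) {
    case 0x0c: return ENC_ADDS_R;   /* 0001100 */
    case 0x0d: return ENC_SUBS_R;   /* 0001101 */
    case 0x0e: return ENC_ADDS_I3;  /* 0001110 */
    case 0x0f: return ENC_SUBS_I3;  /* 0001111 */
    case 0x23:                      /* 0100011: look at two more bits */
        switch ((instr >> 7) & 3) {
        case 2:  return ENC_BX;     /* 0100011 10 */
        case 3:  return ENC_BLX;    /* 0100011 11 */
        default: return ENC_MOV_HI; /* 0100011 0x */
        }
    default:
        return ENC_OTHER;
    }
}
```

Here classify(0x1840) is settled by the first seven bits alone, whereas classify(0x4770) needs the second stage before it can answer ENC_BX.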
Note that there are several entries in the rainbow chart that are labelled with a mnemonic like adds, and they're distinguished by my invented notations adds r (meaning add between registers), adds i3 (meaning add with a 3-bit immediate field), etc. All are generated from assembly language instructions beginning adds, and the assembler knows how to choose the encoding based on the pattern of operands. Each of these variants (more or less) has its own page in the ARM Architecture Reference Manual.
How does the control unit know to interpret the last nine bits of an adds instruction 0x1840 as the names of three registers?
The leading few bits of the instruction, up to 10 of them, determine which instruction encoding applies – adds ra, rb, rc in this case, determined by the leading seven bits. This determines the interpretation of the rest of the instruction. In this case the remaining nine bits are fed by the control unit to the datapath in three groups of three so as to determine which registers take part in the addition.
    0001100  001  000  000    adds r0, r0, r1
    adds     r1   r0   r0
There's another instruction encoding where a three-bit constant is fed to the datapath as the second operand instead of a register.
    0001110  011  001  000    adds r0, r1, #3
    adds     3    r1   r0
The control unit behaves differently in the two cases because the opcode is different. The colourful decoding chart enables us to deduce what instruction encoding applies to any string of 16 bits, but it doesn't specify exactly how the remaining bits are interpreted: in this case, it tells us that nine bits of the instruction are three register names or two registers and a three-bit constant, but not in what order they appear. For that, you need to look at the appropriate page in the Architecture Reference Manual.
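The dependence of the last nine bits on the opcode can be sketched in C as follows. The struct and function names are invented for illustration, and the field order (second operand, then first source, then destination) is the one given in the reference manual for these two encodings.

```c
#include <stdint.h>
#include <stdbool.h>

/* Operands of an adds instruction in either of the two forms. */
struct adds_operands {
    unsigned rd, rn;        /* destination and first source register */
    unsigned second;        /* a register number, or a constant 0..7 */
    bool second_is_const;   /* which of the two 'second' means */
};

/* Decode adds r (opcode 0001100) or adds i3 (opcode 0001110).
   Returns false for any other instruction. */
bool decode_adds(uint16_t instr, struct adds_operands *out) {
    unsigned opcode = instr >> 9;
    if (opcode != 0x0c && opcode != 0x0e)
        return false;
    out->second = (instr >> 6) & 7;   /* same bits, two meanings */
    out->rn = (instr >> 3) & 7;
    out->rd = instr & 7;
    out->second_is_const = (opcode == 0x0e);
    return true;
}
```

The same three bits come out as register r1 for 0x1840 and as the constant 3 for 0x1cc8 (the bit pattern of adds r0, r1, #3); only the opcode tells the two apart.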
When programming, I mostly find the chart useful as a reminder of what encodings exist: for example, the instructions
adds r0, r1, #3
and
adds r0, r0, #27
are legal, but
adds r0, r1, #27
is not, because there is no way to encode it, and you have to write movs r0, r1; adds r0, r0, #27 instead.
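The reasoning behind those three examples can be captured in a small predicate. This is a sketch, with an invented function name, and it assumes a detail from the reference manual that is not stated above: the two-operand form, where source and destination are the same register, carries an 8-bit constant (0 to 255), while the three-operand form carries only a 3-bit one (0 to 7).

```c
#include <stdbool.h>

/* Can adds rd, rn, #imm be written as a single 16-bit instruction?
   Considers only the two immediate forms discussed in the text. */
bool adds_imm_encodable(unsigned rd, unsigned rn, unsigned imm) {
    if (rd > 7 || rn > 7)
        return false;               /* only r0 to r7 have 3-bit names */
    if (imm <= 7)
        return true;                /* 3-bit constant, any two registers */
    return rd == rn && imm <= 255;  /* 8-bit constant, same register */
}
```

It accepts adds r0, r1, #3 and adds r0, r0, #27 but rejects adds r0, r1, #27, which is why that last one has to be split into two instructions.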
[1] https://uk.farnell.com/stmicroelectronics/stm32f030f4p6tr/mcu-32bit-cortex-m0-48mhz-tssop/dp/2432084
[2] In this course, I'll always write hexadecimal constants using the C notation 0x...