Lab zero – getting started

Follow the instructions for building, loading and running a simple program that echoes input sent over the serial port

The purpose of this lab exercise is to get started with using the micro:bit and the software toolchain that supports building programs for it. The actions needed to prepare each program are spelled out in a Makefile, which can be interpreted by the unix program make to build the program automatically. You can do this from the unix command line, and upload the resulting binary program by copying it with a shell command to a virtual disk drive that represents the micro:bit's memory. You can get started with programming the micro:bit by building and running a simple program that lets you connect to the micro:bit over a serial interface, then echoes the characters that you type. Like all the programs we will work with, this one depends on no machine-specific library code, so all the details of how the machine is programmed are explicit.

To carry out the instructions, you will need:

A BBC micro:bit, which we will supply.
A USB cable, with a full-size 'Type A' plug on one end and a 'Micro B' plug on the other. This we ask you to supply, because you probably have several spare ones already – they are the kind often used with mobile phones. The cable doesn't need to be very long, but it does need both power and data wires in it. Some cables (often the ones supplied for charging bike lights, in my experience) have power wires only, and they are useless for our purposes. Our micro:bits will be powered over USB, and will also communicate with the host computer over USB for downloading programs, for sending and receiving characters on the serial port, and for connection with debugging software running on the host.

If you already have a preferred editor, especially one that is able to build the program being edited by invoking make, then you can use it to edit the provided programs and make micro:bit programs of your own. I developed the programs using Emacs myself, which can invoke make in an editor window, parse any resulting error messages, and show the lines of source code where the errors occurred. Other editors can do the same, but if you find yourself reading the error messages from the C compiler and counting lines in the source file, then you are using precious brain cells to do something a machine can do better.

If you don't already have an editor you like, then I suggest using Geany for this course. Geany is a simple, open-source editor and IDE that comes with most Linux distributions and is installed by default on the Raspberry Pi. I've prepared a version of Geany that can understand the syntax of ARM assembly language, and set up project files for each lab that contain the commands needed to build the programs (using make behind the scenes), to upload them to the micro:bit, and if needed to start the GDB debugger. Things on the lab machines will be set up so that you just need to double-click on one of these project files in the file manager to open the project in the Geany editor.

If you are using the lab machines, then the toolchain should already be installed for you; otherwise, there's a page with installation instructions for Linux systems.

Getting the sources

You should begin by making a copy of the Mercurial repository containing the lab materials.

$ hg clone https://spivey.oriel.ox.ac.uk/hg/digisys

The repository contains independent subdirectories for the various labs, and the materials for this lab are in the lab0-echo subdirectory. You can browse the repository contents by opening https://spivey.oriel.ox.ac.uk/hg/digisys in a web browser.

We won't be making much use in this course of the power of Mercurial to track changes across multiple versions and multiple files and directories, but in future courses like Compilers you will be required to make modifications, check them in to a version control system, and submit a report on your changes. I'm recommending Mercurial for version control because it's easier to learn than Git and just as powerful – I use it for all my own work. If you want to know more about it, there's a nice tutorial online written by Joel Spolsky.

It seems necessary in today's world to believe one or other of the propositions, "Git is better than Mercurial," or "Mercurial is better than Git." If you are already a Git aficionado, then you can use instead the command,

$ git clone https://spivey.oriel.ox.ac.uk/git/digisys.git

Having cloned the repository, if you want to use Geany then you will need to configure it for editing ARM assembly language, and you will need to generate Geany project files for each lab. (These project files are machine-specific and not suitable for checking in to version control.) To do this, change to the digisys directory and invoke the shell script setup/install:

$ cd digisys
$ setup/install

This permanently installs settings under $HOME/.local/share and $HOME/.config/geany, and creates files lab0-echo/lab0.geany, etc. in the directory for each lab. The command needs to be run only once, not once for each session.

The source code you need for this lab exercise is in the subdirectory lab0-echo of the course materials. The following files are provided.

`Makefile`	Build script
`echo.c`	Program source
`startup.c`	Startup code
`hardware.h`	Header file with layout of I/O registers
`nRF51822.ld`	Linker script
`lab0.geany`	Geany project file
`debug`	Shell script for starting debugger

Remarkably for an embedded program, all the code is in a high-level language, and there is no assembly-language code. That this is possible is a nice feature of the Cortex-M platform.

Compiling the program

Once you have obtained a copy of these files, all you should need to do to build the program from the command line is to change to the lab0-echo subdirectory and give the command make:

$ cd lab0-echo
$ make

The individual commands shown below will be executed automatically. I will spell them out so that you know what they do, but after you have built a few programs, you will no doubt be content to let them whizz by without paying much attention – unless the build grinds to a halt with an error message, that is.

The first command to be executed is this:

arm-none-eabi-gcc -mcpu=cortex-m0 -mthumb -O -g -Wall -ffreestanding -c echo.c -o echo.o

It uses a C compiler, arm-none-eabi-gcc, to translate the source file echo.c into object code in the file echo.o. This compiler is a cross compiler, running on an Intel machine, but generating code for an embedded ARM chip: the none in its name indicates that the code will run with no underlying operating system. The flags given on the command line determine details of the translation process.

-mcpu=cortex-m0 -mthumb: generate code using the Thumb instruction set supported by the ARM model present on the micro:bit.
-O: optimise the object code a little.
-g: include debugging information in the output.
-Wall: warn about all dubious C constructs found in the program.
-ffreestanding: this program is self-contained, so don't make some common assumptions about its environment.
-c: compile the C code into binary machine language, but don't put together an executable image.
-o echo.o: put the binary code in the file echo.o.

Next, make also compiles the file startup.c in a similar way.

arm-none-eabi-gcc -mcpu=cortex-m0 -mthumb -O -g  -Wall -ffreestanding -c startup.c -o startup.o

The file startup.c contains the very first code that runs when the microcontroller starts. It is written in C, but it uses several non-portable constructions, and few of the usual assumptions about the behaviour of C programs apply.

With both of the source files translated into object code, it is now time to link them together, forming a file echo.elf that contains the complete, binary form of the program. This is done by invoking the C compiler again, but this time providing the two files echo.o and startup.o as inputs.

arm-none-eabi-gcc -mcpu=cortex-m0 -mthumb -O -g -Wall -ffreestanding  -T nRF51822.ld -nostdlib \
    echo.o startup.o -lgcc -o echo.elf -Wl,-Map,echo.map

Again, the detailed behaviour of this command is determined by the long sequence of flags. The new ones are as follows.

-T nRF51822.ld: use the linker script in the file nRF51822.ld. This script describes the layout of the on-chip memory of the micro:bit: 256K of flash memory at address 0, and 16K of RAM at address 0x20000000. It also says how to lay out the various segments of the program: the executable code .text and string constants .rodata in flash, and the initialised data .data and uninitialised globals .bss in RAM.
-nostdlib: the usual startup code and libraries for a C program are omitted, because we are supplying our own.
-lgcc: the C compiler's own library is searched for functions (such as out-of-line code for integer division) that the program needs.
-Wl,-Map,echo.map: a map of the layout of storage is written to the file echo.map
-o echo.elf: the output goes into a file echo.elf that has the same format as the .o files prepared earlier, but now contains a complete program.

Many of these flags can be used unchanged in building other programs in the course, and it is good to know why they are there. Faced with the problem of fitting an application into a tiny amount of memory, embedded programmers become intensely interested in storage layouts and linker scripts.

We are nearing the end of the process. The next command just prints out the size of the resulting program.

arm-none-eabi-size echo.elf
  text	   data	    bss	    dec	    hex	filename
  1268	      0	     84	   1352	    548	echo.elf

Here we see that the program has 1268 bytes of code, no initialised storage for data, and 84 bytes of uninitialised space (actually, it is initialised to zero) for global variables. In the echo program, this consists almost entirely of an 80-byte buffer for a line of keyboard input.
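To relate the columns printed by the size command to C source, here is a small sketch of which kinds of declaration usually end up in which section. The names greeting, counter and linebuf are made up for illustration and are not taken from echo.c; the comments describe the usual behaviour of the toolchain rather than anything specific to this lab's linker script.

```c
#include <stddef.h>

/* Constant data: read-only, so it can stay in flash (.rodata).
   The size command normally counts it under "text". */
static const char greeting[] = "Hello micro:world";

/* Initialised variable: lives in RAM (.data), with its initial value
   kept in flash and copied across before the main program runs. */
static int counter = 42;

/* Uninitialised global: counted under "bss"; it takes up RAM but no
   space in the program image, and is set to zero at startup. */
static char linebuf[80];

/* Executable code: placed in flash (.text). */
size_t greeting_length(void)
{
    return sizeof(greeting) - 1;    /* length without the final NUL */
}

int bump(void)
{
    return ++counter;
}

char first_of_buffer(void)
{
    return linebuf[0];
}
```

A program containing only linebuf as a global would show 0 under data and roughly 80 under bss, much like the figures above.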

The final stage prepares the binary object code in another format, ready to be downloaded to the micro:bit board.

arm-none-eabi-objcopy -O ihex echo.elf echo.hex

The file echo.elf is a binary file, containing the object code and a lot of debugging information, whereas echo.hex is actually a text file, containing just the object code encoded as long hexadecimal strings, a format that the loading mechanism on the micro:bit understands.
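If you open echo.hex in an editor, each line is one Intel HEX record: a colon, a byte count, a 16-bit address, a record type, the data bytes, and a final checksum byte, all written as pairs of hex digits. The checksum is chosen so that all the bytes of the record add up to zero modulo 256. The following sketch computes it on the host; the function name ihex_checksum and the sample record are illustrative and have nothing to do with the contents of echo.hex.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes of a sample data record, without the leading ':' and without
   the checksum: count 03, address 0030, type 00, data 02 33 7A. */
static const uint8_t example_record[7] = {
    0x03, 0x00, 0x30, 0x00, 0x02, 0x33, 0x7A
};

/* Return the checksum byte for a record: the two's complement of the
   low 8 bits of the sum of all the preceding bytes. */
uint8_t ihex_checksum(const uint8_t *bytes, size_t n)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += bytes[i];            /* wraps modulo 256 */
    return (uint8_t)(0u - sum);
}
```

For the sample record the bytes sum to 0xE2, so the checksum is 0x1E and the full line would read :0300300002337A1E.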

Running the program

If you plug in a micro:bit, it will appear as a USB drive on your computer, and you can copy the file echo.hex to it, either by dragging and dropping from the file manager with the mouse, or by using a shell command:

$ cp echo.hex /run/media/mike/MICROBIT/

(with mike replaced by your own username, typically in the format u19xyz). The yellow LED on the micro:bit will flash briefly, then the program will start to run. Note that the USB drive appears to have a couple of files on it – one a text file giving the version number of the board and its embedded software, another an HTML file with a link to the micro:bit website. Any files you drag and drop there do not appear as files on the drive, however: they are instantly consumed by the flash loader and not stored as files.

The echo program reads and writes the serial interface of the micro:bit, which appears as a terminal device /dev/ttyACM0 on the Linux machine. To connect with this device, it's convenient to use a program called minicom on the Linux machine. Start a shell window, then type the command,^[1]

$ minicom -D /dev/ttyACM0 -b 9600

After connecting, you should press the reset button on the micro:bit to start the program again. You should see the message Hello micro:world, followed by > as a prompt. Type characters at the prompt: they will be echoed, and you can use the backspace key to make corrections. When you press Return, the line you typed will be repeated, and then a new prompt appears.
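The line-editing behaviour just described can be modelled on the host without any hardware. The sketch below is not the code from echo.c: the function read_line, the constant LINE_SIZE, and the choice to erase a character by sending backspace, space, backspace are all assumptions made for illustration. It takes the keystrokes as a string, builds the edited line, and records what would be sent back to the terminal.

```c
#include <stddef.h>

#define LINE_SIZE 80    /* assumed capacity, including the final NUL */

/* Consume keystrokes from keys up to a Return ('\r') or the end of the
   string. The edited line is stored in line (capacity LINE_SIZE) and
   the characters to echo are stored in echo, which the caller must
   make at least 3 * strlen(keys) + 1 bytes long. Returns the length
   of the edited line. */
size_t read_line(const char *keys, char *line, char *echo)
{
    size_t n = 0, e = 0;

    for (; *keys != '\0' && *keys != '\r'; keys++) {
        char ch = *keys;
        if (ch == '\b' || ch == 0x7f) {
            /* Backspace or delete: drop the last character, if any */
            if (n > 0) {
                n--;
                echo[e++] = '\b';
                echo[e++] = ' ';
                echo[e++] = '\b';
            }
        } else if (n < LINE_SIZE - 1) {
            /* Ordinary character: store it and echo it */
            line[n++] = ch;
            echo[e++] = ch;
        }
    }

    line[n] = '\0';
    echo[e] = '\0';
    return n;
}
```

Typing c, a, x, backspace, t and then Return leaves the three-character line "cat"; characters typed once the buffer is full are silently dropped.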

Using Geany

The instructions above tell you how to compile, upload and run a micro:bit program from the command line, but all the same actions can be performed from the Geany editor. To open the program as a project within Geany, use the file manager to look for the file lab0-echo/lab0.geany and double-click on it. This should launch Geany with the file echo.c initially open, and the Build menu filled with appropriate actions for the project. Specifically, when we choose Build>Make in a moment to compile the program, Geany will use the Makefile provided, and therefore invoke the cross-compiler arm-none-eabi-gcc rather than the native C compiler that is called just plain gcc. (Don't try opening individual files of C code with Geany rather than opening the project, or the setup will be wrong.)

To build and run the program from within Geany:

Choose Build>Make to compile the program. Geany will invoke make, and the same steps will happen as were listed earlier. The commands and any error messages will appear in a separate pane at the bottom of Geany's window, and afterwards Geany will analyse the error messages and highlight corresponding lines in the source file.
Choose Build>Upload to upload the program to a plugged-in micro:bit.
Choose Build>Minicom to launch a new window running minicom to talk to the micro:bit. You can leave this window open as long as you like, or close it when you have finished interacting with the running program.

Whether using Geany or your own choice of editor, you can now go on to try some assembly language programming in Lab one and come back to using GDB later, or you can try connecting to the micro:bit with the symbolic debugger GDB as described in the next section.

Using a debugger

The USB interface between the host computer and the micro:bit serves three purposes: it enables us to upload programs to the board, it lets us interact with a program running on the board over the micro:bit's serial port, and thirdly it allows a debugger running on the host to monitor and control the execution of the program on the micro:bit.

Here's how to run the program under control of a debugger, so as to execute it step by step. You will use two terminal windows for this experiment – one to connect to the micro:bit using minicom as before, a second one to run the GNU debugger gdb. Instructions are given here to start a debugging session from the command line, but the same effect can be achieved by choosing Build>Minicom and Build>Debug from within Geany. However you start the debugger, you may like to enlarge its window to show more lines, particularly if you want to use the multi-panel interface described later.

1. Plug in the micro:bit, and open a terminal window to run minicom, as before. Check that you can type characters and have them echoed by the program.

2. Open another terminal window, change to the lab0-echo directory, and run the shell script ./debug echo.elf. This script first starts an adapter program pyocd that can talk to the micro:bit over the USB link, then connects to the adapter with the interactive debugger GDB. You should see some messages along the following lines:

$ ./debug echo.elf
...
0x00000150 in serial_getc () at echo.c:28
28     while (! UART_RXDRDY) { }
(gdb)

The debugger has stopped the program wherever it was: as you might expect, the program is sitting in a tight loop, waiting for a character to be typed on the keyboard. As we'll learn much later, UART_RXDRDY is a register in the serial interface whose value indicates whether a character has been received, and line 28 is a loop that tests the register repeatedly until a character arrives.
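The shape of such a polling loop can be tried out on an ordinary computer by replacing the hardware register with a plain variable. In the sketch below only the name UART_RXDRDY and the loop itself come from the debugger output above; UART_RXD, the sim_ variables and the helper sim_receive are stand-ins invented for this example, and the real definitions in hardware.h will look different.

```c
/* Host-side stand-ins for memory-mapped registers (hypothetical).
   They are volatile so that the compiler re-reads them on every
   iteration of the loop, as it must for real device registers. */
static volatile unsigned sim_rxdrdy;    /* "a character has arrived" flag */
static volatile unsigned sim_rxd;       /* the received character */

#define UART_RXDRDY sim_rxdrdy
#define UART_RXD    sim_rxd

/* Pretend that the serial hardware has just received ch. */
void sim_receive(char ch)
{
    sim_rxd = (unsigned char)ch;
    sim_rxdrdy = 1;
}

/* Wait for a character, then fetch it and clear the flag so that the
   next call waits for a fresh one. */
int serial_getc(void)
{
    while (! UART_RXDRDY) { }
    int ch = (int)UART_RXD;
    UART_RXDRDY = 0;
    return ch;
}
```

On the host, sim_receive must be called before serial_getc, since nothing else will ever set the flag; on the micro:bit it is the hardware that sets it when a character arrives.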

Sadly, the debugger doesn't have a sophisticated graphical user interface, and we will interact with it using text commands. At this point, you can either continue with the program from where it is, or you can use the command

(gdb) monitor reset

to start it again from the very beginning. After doing so, you can use

(gdb) advance init

to run the startup code and skip to the start of the main program init.

When the program is stopped, you can use

(gdb) cont

to continue running it, and press Ctrl-C to stop it again. Alternatively you can use commands like

(gdb) step

and

(gdb) next

to run the program line by line: step steps into function calls, and next steps over them.

Although GDB does not have a GUI, it can produce a display of what is happening in the program by using a multi-panel "Text User Interface". Before activating it, you may like to stretch the terminal window vertically so it occupies most of the height of your display. By using the command

(gdb) layout split

you can enter a mode where the display is split into several panes, with one showing the C source and another showing the disassembled object code. The command

(gdb) layout regs

then switches to show the registers and object code. It then becomes interesting to give the command

(gdb) stepi

in place of step and execute the program one instruction at a time, watching the register contents as you do so.

GDB provides many other commands that enable you to set breakpoints where execution will stop, to display the contents of variables, decode the subroutine stack, and even change values and alter the program's flow of control.

When you have finished, you can give the command

(gdb) quit

to leave GDB; this also shuts down the adapter program.

Starting the debugger manually

If an attempt to start the debugger doesn't work, using the debug script either from the command line or from the Build>Debug menu item in Geany, then there are several things to try.

Before trying other things, make sure you have an up-to-date version of the lab software, including the debug script. To update, give the commands

$ hg pull

$ hg up

These will pull down any changes from the Mercurial server, then update your working copy. (If you are using Git instead of Mercurial, you have already accepted responsibility for helping yourself here.)

If you have made previous attempts to use the debugger, then defunct instances of the debugging adapter process pyocd may be hanging around and blocking access to the micro:bit board. Get rid of them by giving the shell command

$ killall pyocd

If there are no pyocd processes hanging around, then this command does no harm, so you may as well try it in every case. You might then like to try again with starting the debugger using the debug script. Previous versions of the debug script were likely to leave such processes around if the window was closed abruptly rather than exiting the debugger with the quit command; this should be fixed in the latest version.

If the debug script still doesn't work, then use killall pyocd again and follow the instructions below for starting both the debugger and the pyocd adapter separately by hand.

To start things manually, you will need three shell windows. In one, run minicom so you can see that the micro:bit is printing on its serial port. In another window we will run the adapter program pyocd that interfaces the debugger and the micro:bit over USB, and in the third window we will run the interactive debugger GDB itself.

In the second window, give the shell commands

$ killall pyocd

$ pyocd gdbserver -t nrf51

The first of these kills defunct instances of pyocd, and the second starts a fresh one, fulfilling the rôle of a server for debugger access, and expecting to find a board with an nRF51 series chip on it. The pyocd program should produce a slew of messages, ending with one to the effect that it is now listening on port 3333.

In the third window, change to the directory containing the program, then start GDB. On the lab machines, the relevant version of GDB is called gdb-arm; on other machines, try gdb-multiarch or arm-none-eabi-gdb instead.

$ cd digisys/lab0-echo

$ gdb-arm echo.elf

We tell GDB that we want to debug the program echo.elf, naming the binary file containing debugging information, rather than the downloadable file echo.hex. GDB will read the debugging information and show its prompt.

(gdb) target remote :3333

At this point, we tell GDB that it will be communicating with the debug adapter over a socket; the syntax :3333 refers to network port 3333 on this machine (because the portion in front of the colon is empty). When you give this command, you should see responses both from GDB and in the window where pyocd is running, then the GDB prompt should show where the program is stopped, and away we go. Happy debugging!

[1] If you are lucky, this setup will agree with the default, and you can type just minicom.

Lab one – assembly language

Implement various arithmetic operations in assembly language

This lab is built around a program, mostly written in C, that calls an assembly language subroutine to perform an arithmetic operation. As supplied, there are two versions of the subroutine – one that uses the machine's adds instruction to add two numbers, and another that contains a simple but slow loop that performs multiplication. Your task is to add other variations, such as subtraction (easy), a faster multiplication algorithm (moderate), or a fast division algorithm (tough).

Ideally, this lab requires a micro:bit. If you can't get hold of one, but do have a Linux machine of some kind, then it's possible to experiment with assembly language programming using the emulation software QEMU. See a later section of these instructions for details.

The lab1-asm directory of the lab materials contains the following files, some of them the same as the corresponding files seen before:

`Makefile`	Build script
`fmain.c`	Main program
`func.s`	Subroutine – single add instruction
`mul1.s`	Subroutine – simple multiplication loop
`fac.s`	Factorials with `mult` as a subroutine
`bank.s`	"Bank accounts" with a static array
`hardware.h`	Header file with layout of I/O registers
`lib.c, lib.h`	Library with implementation of `printf`
`startup.c`	Startup code
`nRF51822.ld`	Linker script
`debug`	Shell script for starting debugger
`qmain.c`	Alternative main program for use with QEMU

In particular, fmain.c is the main program, written in C, with a loop that prompts for two unsigned integers x and y, then calls a function func(x, y) and prints the result. There's no reason why this function couldn't be written in C, but instead two versions written in assembly language are provided, as a starting point for experiments in programming at the machine level.

To build an initial program, just use the command

$ make

as usual (or use Build>Make within Geany). As a default, this includes the subroutine from func.s that uses one instruction to add its two arguments, and produces a binary file func.hex. You can load the program into the micro:bit by dragging and dropping, or by using the command

$ cp func.hex /media/mike/MICROBIT

Now start minicom, reset the micro:bit, and expect an interaction like this one:

Hello micro:world!
Gimme a number: 3
Gimme a number: 4
func(3, 4) = 7
func(0x3, 0x4) = 0x7

Note that positive and negative decimal numbers are allowed as input, and also numbers written in hexadecimal with a leading "0x". The result returned by func(x, y) is interpreted both as a signed number printed in decimal, and as an unsigned number printed in hexadecimal. You are welcome to modify the main program if you wish to change this behaviour.
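The two ways of printing the result reflect the fact that a register holds just 32 bits, and it is up to us whether to read them as signed or unsigned. The sketch below models this in C for a subtracting version of func; the names sub32 and as_signed are made up for this example and do not appear in the lab sources.

```c
#include <stdint.h>

/* What a one-instruction subtracting version of func would compute:
   32-bit subtraction that wraps around modulo 2^32. */
uint32_t sub32(uint32_t x, uint32_t y)
{
    return x - y;
}

/* Read the same 32 bits as a signed two's-complement number. ISO C
   leaves this conversion implementation-defined for values of 2^31
   and above; gcc and clang keep the bit pattern unchanged. */
int32_t as_signed(uint32_t bits)
{
    return (int32_t)bits;
}
```

With arguments 2 and 5 the bit pattern is 0xfffffffd, which reads as -3 when taken as signed and as 4294967293 when taken as unsigned.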

A second implementation of the function func(x, y) is provided in the file mul1.s: it computes the product of x and y using a simple loop. You can build a program containing this definition of func with the command,

$ make mul1.hex

and load it into the micro:bit with

$ cp mul1.hex /media/mike/MICROBIT

or a similar command. (Within Geany, with the mul1.s source file open, choose Build>Make me to build the program, and Build>Upload me to upload it.) The subroutine works well for simple examples like 2 * 3 = 6, but you will find that one of 10000000 * 2 and 2 * 10000000 is much slower than the other. You may also notice that the main program lights one of the LEDs on the micro:bit before calling func, and switches it off again when func returns, so that you can see how much time the subroutine is taking. We can use the LED signal together with an oscilloscope to get a precise timing for the subroutine, and you can try this in the lab if you like.
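To see why the order of the arguments matters, it helps to have a C model of multiplication by repeated addition. The sketch below is an assumption about the general shape of such a loop, not a transcription of mul1.s; in particular, it chooses y as the loop counter, and the supplied subroutine may well count on the other argument.

```c
#include <stdint.h>

/* Multiply by adding x to an accumulator y times. The running time is
   proportional to y and does not depend on x at all. */
uint32_t mul_loop(uint32_t x, uint32_t y)
{
    uint32_t z = 0;
    while (y != 0) {
        z += x;
        y--;
    }
    return z;
}
```

In this model mul_loop(10000000, 2) goes round the loop twice, but mul_loop(2, 10000000) goes round ten million times, though both return the same product.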

Two additional implementations of a subroutine called func are provided, drawing from examples in the lectures.

In fac.s, the subroutine func(x, y) returns the factorial of x, ignoring y. The subroutine calls another subroutine mult to do the multiplications needed to calculate the factorial.
In bank.s, a static array of 10 integers is allocated, much as if it were declared in C with

static int account[10];

The subroutine func(x, y) increments account[x] by y and returns its new value; the values in the array are remembered from one invocation of func to the next, as is normal for a static array.
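For comparison, here is roughly what these two subroutines would look like if written in C. This is a sketch based only on the descriptions above: the names fac_func and bank_func are invented so that both can live in one file, and the body of mult is a guess (repeated addition), since all we are told is that fac.s calls a subroutine of that name.

```c
/* Helper in the spirit of the mult subroutine: repeated addition. */
static unsigned mult(unsigned x, unsigned y)
{
    unsigned z = 0;
    while (y != 0) {
        z += x;
        y--;
    }
    return z;
}

/* Model of fac.s: return the factorial of x, ignoring y. */
unsigned fac_func(unsigned x, unsigned y)
{
    (void)y;                    /* second argument is unused */
    unsigned f = 1;
    while (x > 0) {
        f = mult(f, x);
        x--;
    }
    return f;
}

/* Model of bank.s: a static array that keeps its contents between
   calls. As in the description, there is no bounds check, so x must
   lie between 0 and 9. */
static int account[10];

int bank_func(int x, int y)
{
    account[x] += y;
    return account[x];
}
```

Calling bank_func(3, 5) and then bank_func(3, 2) returns 5 and then 7, because the first deposit is remembered.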

You can make your own implementation of the function func as follows:

Copy an existing implementation to get the structure straight:

$ cp func.s sub.s

Now edit the copy sub.s to replace the adds instruction with something appropriate to your wishes – perhaps a subs instruction.
Use the following command to assemble your code and link it into a binary file.

$ make sub.hex

Load the resulting file into the micro:bit.

$ cp sub.hex /media/mike/MICROBIT

With Geany, as before, you can choose Build>Make me and Build>Upload me.

In writing your own versions of func, you should note the calling conventions of the ARM chip: failing to obey them may make the program go haywire. In particular:

The arguments x and y arrive in registers r0 and r1.
When the subroutine returns, whatever is left in register r0 is taken as the result.
The subroutine may modify the contents of registers r0 and r1, and also the contents of r2 and r3.
Unless it takes special steps to preserve their values, the subroutine should not modify the contents of other registers such as r4 to r7. It may be that the main program is keeping nothing special in those registers, in which case trashing them will do no harm, but that's not something we should rely on.
The subroutine body should not mess with the stack pointer sp, or the link register lr. The link register contains the code address to which the subroutine will return via the final instruction bx lr, and overwriting this will result in chaos.

Tasks

Different participants in the course will have different amounts of experience with low-level programming, so you should choose whichever of the following tasks you find both possible and illuminating. Each requires you to produce an assembly language file containing a definition of the function func. For the first few, you can work most easily by tinkering with the supplied file func.s, but for later tasks you may like to make a new file to preserve your work.

Try replacing the adds instruction with a different instruction – such as the subs instruction that subtracts instead of adding. Explain to yourself why a subtraction such as 2 - 5 seems to give a large positive result when it is interpreted as an unsigned number.
Explore other ALU operations such as the bitwise logical instructions ands, orrs and eors, or the shifts and rotates lsls, lsrs, asrs and rors.
Use shifts and adds to write a function that multiplies one of its arguments by a small constant, such as 10.
Write a function that multiplies two numbers using the built-in multiply instruction muls, and compare it for speed with the supplied code. Note that the Nordic chip includes the optional single-cycle multiplier, but other instances of Cortex-M0 may have a slower multiplier or (I believe) none at all.
Write a faster software implementation of multiplication, using a log-time algorithm.
Write a simple implementation of unsigned integer division.
Write an implementation of unsigned division that runs in a reasonable time.

Some of these tasks we will look at in the lectures; others may be mentioned on a problem sheet. I don't think the overlap matters much, because different settings are good for focussing on different things. You can fruitfully discuss with your tutor the range of algorithms you might use for one of the tasks, but it's pointless to spend tutorial time on the details of assembly language syntax. Such details are not important for the exam, but it helps understanding if you gain some sense of mastery of the technicalities.
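Before writing assembly language for the multiplication and division tasks, it can help to have a reference model in C to test against. The functions below are one possible set of algorithms, offered as a sketch and not as the intended solutions; the names times10, mul_shift_add and udiv32 are invented for this example. Each loop body uses only shifts, adds, subtracts and comparisons, so it translates fairly directly into Thumb instructions.

```c
#include <stdint.h>

/* Multiply by 10 using shifts and one add: 10x = 8x + 2x. */
uint32_t times10(uint32_t x)
{
    return (x << 3) + (x << 1);
}

/* Shift-and-add multiplication: examine y one bit at a time, so the
   loop runs at most 32 times whatever the arguments. */
uint32_t mul_shift_add(uint32_t x, uint32_t y)
{
    uint32_t z = 0;
    while (y != 0) {
        if (y & 1)
            z += x;             /* this bit of y contributes x * 2^k */
        x <<= 1;
        y >>= 1;
    }
    return z;
}

/* Shift-and-subtract (restoring) division: bring down the bits of n
   one at a time, subtracting d whenever it fits. The divisor d must
   be non-zero. Returns the quotient; the remainder is left in r. */
uint32_t udiv32(uint32_t n, uint32_t d)
{
    uint32_t q = 0, r = 0;
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1);
        if (r >= d) {
            r -= d;
            q |= (uint32_t)1 << i;
        }
    }
    return q;
}
```

The multiplication loop takes time proportional to the number of bits in y, and the division loop always runs exactly 32 times, so neither shows the dramatic slowdown of a loop that counts up to one of its arguments.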

Factorials are often used as an example of recursion, though a bad one because they can be computed with a simple loop. Rewrite the factorial program to use recursion and see how much worse it gets.
Factorials provide one way of computing binomial coefficients since choose(n, r) = n! / (r! (n-r)!); others include filling in Pascal's triangle row by row, or using a recurrence such as

choose(n, r) = n/r * choose(n-1, r-1)

or

choose(n, r) = (n-r+1)/r * choose(n, r-1)

with suitable boundary conditions. Implement one or more of these methods, perhaps using the unsigned division routine you implemented earlier. Which method is the best, in terms of both speed and freedom from overflow?
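As a reference model for one of these methods, the sketch below unrolls the second recurrence into a loop, starting from the boundary condition choose(n, 0) = 1. It is one possible arrangement and makes no claim to be the best; the convention of returning 0 when r exceeds n is an assumption added here.

```c
#include <stdint.h>

/* Binomial coefficient by the recurrence
       choose(n, k) = (n-k+1)/k * choose(n, k-1),
   applied for k = 1, 2, ..., r. The multiplication is done before the
   division: the product c * (n-k+1) is always an exact multiple of k,
   whereas (n-k+1)/k on its own usually is not a whole number. */
uint32_t choose(uint32_t n, uint32_t r)
{
    if (r > n)
        return 0;

    uint32_t c = 1;             /* choose(n, 0) */
    for (uint32_t k = 1; k <= r; k++)
        c = c * (n - k + 1) / k;
    return c;
}
```

The intermediate product can overflow 32 bits even when the final answer would fit, which is one of the things to weigh up when comparing the methods.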

Most of the new instructions (and derestricted versions of old ones) are encoded in 32 bits instead of 16 bits, with the first 16-bit half of the instruction coming from a region of the decoding chart that makes it illegal when read as a 16-bit Thumb instruction. As well as new instructions like the signed and unsigned divide instructions sdiv and udiv, you will find that arithmetic instructions can use all 16 registers, not just registers r0 to r7, they exist in variants without the s suffix that don't set the condition codes, and restrictions on the values of immediate constants are relaxed, so that you can write the instruction adds r0, r1, #100. Note that the "Thumb-2" code with mixed 16-bit and 32-bit instructions is different from the "Native" code where all instructions are 32 bits long, and the encoding is quite a bit simpler to decipher.

From one point of view, it's not worth becoming obsessive about trying to stick with the 16-bit instructions, because the 32-bit instructions are almost as fast, and a 32-bit instruction is certainly no slower than the pair of 16-bit instructions that might replace it. If you (or a compiler) write code that favours the low registers for simple tasks, then many of the instructions you want will happen to have a 16-bit encoding, and the assembler will automatically produce good binary code for your program. From another point of view, when learning about the machine we can notice that the 16-bit subset gives a good guide to the operations that are important: for example, the existence of 16-bit instructions for loads and stores with addressing relative to the stack pointer gives a clue that these operations are important for access to local variables in a subroutine.

As far as the exam is concerned, don't waste time trying to memorise the details of what instructions will fit in 16 bits and what fit in 32 bits, or trying to learn the full repertoire of 32-bit instructions. The exam paper will have the usual summary of Thumb code, and anything additional you need to know will be stated in the question. You will not be penalised for writing sensible instructions just because they happen not to have an encoding on one processor or another.

An alternative way of testing

The bigger ARM processors (like the one used in the Raspberry Pi) can execute code in Thumb mode – and there the fuss about the bottom bit of a code address being 1 for Thumb mode has some point to it. Nobody who programs the Pi seems to bother with Thumb mode, however, because with 1GB of memory or more, why worry about code size?

Nevertheless, this compatibility with bigger processors gives a convenient way to test fragments of Thumb assembly language, and the Makefile for this lab exercise is set up to support it. The trick is to compile a main program into native ARM code, assemble the test function into Thumb code, then link them together so that the Thumb code can be called from the ARM code. The resulting binary will then run on the emulator qemu-arm. Try this:

$ make q-func.elf
$ qemu-arm q-func.elf 23 34
func(23, 34) = 57
func(0x17, 0x22) = 0x39

The main program test.c is compiled into native ARM code, and expects to run under Linux. It looks for two numbers as command-line arguments, then calls the subroutine func that is separately assembled into Thumb code. It's important to include the .thumb_func directive in front of every function, or the assembler will fail to assemble it into Thumb code, or it will be invoked without switching the machine to Thumb state, and things will not go well. The main advantage of this way of testing is that the command to invoke qemu-arm can be part of a shell script, and then we can set up automated testing. That's something we'd also want to set up for code running on the micro:bit in a larger project.
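The text suggests driving the tests from a shell script, but the same idea can be sketched in C as a table of test cases. What follows is only an illustration, not the contents of test.c: the type test_case and the function run_tests are made-up names, and the C version of func is a placeholder that simply adds its arguments, which is an assumption based on the sample run above.

```c
#include <stdio.h>

/* Placeholder for the Thumb routine: in the lab, func is assembled
   separately.  This C version just adds, matching the sample run
   (an assumption about what func computes). */
int func(int a, int b) {
    return a + b;
}

/* One test case: two arguments and the expected result (hypothetical) */
struct test_case {
    int a, b, expect;
};

/* Run each case, report any mismatch, and return the number of failures */
int run_tests(const struct test_case *tests, int n) {
    int failures = 0;
    for (int i = 0; i < n; i++) {
        int got = func(tests[i].a, tests[i].b);
        if (got != tests[i].expect) {
            printf("FAIL: func(%d, %d) = %d, expected %d\n",
                   tests[i].a, tests[i].b, got, tests[i].expect);
            failures++;
        }
    }
    return failures;
}
```

A main program could call run_tests on a fixed table and use the failure count as its exit status, so that a script or Makefile rule can tell whether all the tests passed.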

In order to use this method on your own machine, you will need to install packages to support compiling code for bigger ARM processors that runs under Linux, and also to install the QEMU emulator to run the result (if you don't have a Raspberry Pi). On Debian and derivatives, something like

$ sudo apt-get install qemu-user gcc-arm-linux-gnueabihf

ought to be sufficient.

Connecting an oscilloscope

The main program from fmain.c measures the time taken to call the func() subroutine using a hardware clock integrated into the micro:bit's processor chip, but you may like to use an oscilloscope or logic analyser pod to measure the same time independently. The main program lights an LED before calling the subroutine and switches the LED off when it returns. We can get external access to an I/O line that is used to control the LED, and by that means time how long it is lit – even if that time is too short for the flash to be visible. A suitable signal is accessible on pin P3 of the board.

To connect the scope or logic analyser, it's convenient to plug the micro:bit into an edge connector breakout board that has header pins for each contact on the edge connector.

Connect the crocodile clip of the scope probe to ground, marked "0V" on the breakout board. Several pins at that end of the board are grounded, so there's little risk of shorts. Connect the probe tip to pin 3.
To capture the single event of the LED lighting, we need to inhibit the scope from always updating the trace, which it does by default so as not to appear frozen. Use the trigger menu to change the trigger mode from Auto to Normal. This should be shown in the top line of the display. (You can write to the manufacturer to ask why Normal is not the default, but don't expect a reply.)
Now set the horizontal scale to 1 microsec/division, and the vertical scale to 1 V/div on channel 1. Set the trigger level to about 1.5 V (it's not critical).
At this point, if you run your program, the scope should succeed in capturing the timing pulse.
Enable a measurement of the pulse length by activating the Measure menu, choosing Time, scrolling down to +Width by turning the GP knob, and pressing the knob to confirm.
As you adjust the trace, you'll see a blank where the previous acquisition contains no data. Run the program again to fill in the gaps.

Alternatively, you can connect one of the cheap logic analyser pods and use the PulseView software on the host PC to capture traces.

Connect the ground wire (white) of the logic analyser to "0V" on the micro:bit, and channel 0 of the logic analyser (black wire) to pin 3.
In PulseView, enable only channel 0, and set it to trigger on a rising edge. Set a pre-trigger interval of (say) 5% so as to capture some data before the trigger event.
Set PulseView to acquire samples at a rate of 16MHz. Since this is the same as the clock frequency of the micro:bit, we cannot hope to measure the length of each pulse with perfect accuracy, and there is always the danger that the signal will be captured as high for one cycle too few or too many. PulseView starts to show individual samples once you zoom in far enough to see them.
To capture data for a short run, it's necessary to find by experiment a procedure for resetting the micro:bit and starting the logic analyser, so that it is not triggered by noise surrounding the reset. Try pressing reset and holding it, then starting the capture, then releasing the reset button.

The poor resolution of the logic analyser is slightly disappointing, but inevitable given that the pod contains a microcontroller (what else?) built with the same technology as the micro:bit. The oscilloscope, by way of contrast, has special acquisition circuitry using multiple A-to-D converters each with a precisely defined capture interval. This is expensive but capable of acquiring up to 2 gigasamples per second. We can make better use of the logic analyser by getting it to time longer events, such as a program that does multiple calculations or a loop that runs tens or hundreds of times. A single cycle difference in each iteration will then add up to a measurable change in the runtime.

Whether you use the scope or the logic analyser, one measurement of your program gives little information, but you can repeat the same calculation multiple times to see whether the pulse length is consistent, and then try changing parameters to see how they affect the result. On the V1 board, the observed change will be a whole number of clock cycles, each lasting 62.5 nanosec.
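To turn a pulse width read from the scope into a cycle count, divide by the 62.5 nanosec clock period. The helpers below are a small sketch of that arithmetic; the names pulse_cycles and cycles_ns are made up for the illustration and are not part of the lab code.

```c
/* Clock frequency of the V1 micro:bit in MHz, as stated in the text */
#define CLOCK_MHZ 16

/* Convert a measured pulse width in nanoseconds to the nearest whole
   number of clock cycles (hypothetical helper) */
unsigned pulse_cycles(double width_ns) {
    return (unsigned) (width_ns * CLOCK_MHZ / 1000.0 + 0.5);
}

/* Convert a cycle count back to a time in nanoseconds */
double cycles_ns(unsigned cycles) {
    return cycles * 1000.0 / CLOCK_MHZ;
}
```

For example, a scope reading of 1310 nanosec rounds to 21 cycles, and 21 cycles last exactly 1312.5 nanosec, so the reading is consistent with a whole number of cycles to within the measurement error.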

Alternatively, you could rewrite the driver fmain.c so that it prompts for inputs once, then repeats the calculation forever, producing a repetitive sequence of timing pulses that can be shown on the scope. Reset the program to use different inputs.

Lab two – general purpose I/O

Enhance an electronic Valentine's card to respond to button presses

This lab begins with a program (written entirely in C) that displays a beating heart pattern on the micro:bit's LEDs: it might be an electronic Valentine's card. Your task is to enhance the program so that it shows different patterns when the buttons are pressed.

The lab2-heart directory of the lab materials contains the following files, some of them the same as the corresponding files seen before:

`Makefile`	Build script
`heart.c`	Main program
`hardware.h`	Header file with layout of I/O registers
`startup.c`	Startup code
`nRF51822.ld`	Linker script
`heart-intr.c`	Interrupt-driven static heart program
`blinky.s`	Pure assembly language program for blinking LED

The file heart.c contains all the code specific to this program. Using the addresses of hardware registers that are given in the header file hardware.h, it configures for output those GPIO pins that are connected to the LED matrix, and for input the two pins that are connected to the buttons. Then it enters a nested loop, where the outer loop (in the main program init) shows two images in alternation, a big heart and a little one, with the little heart shown twice for brief periods in each cycle, giving the impression of a beating heart.

Schematic for LED array and buttons

There's another loop (in function show) that looks after the display of each image. On the micro:bit, it's possible to light a single LED by enabling its row and column, or all the LEDs by simultaneously enabling all the rows and all the columns. But to show a specific image, it's necessary to show it one row at a time, multiplexing between the rows fast enough for the flashing to be lost in persistence of vision. To show each row, we activate that row and the columns for the LEDs in that row that should be lit, and then pause for a while before moving on to the next row. Things are complicated for the programmer by the fact that, although the LEDs are arranged physically in a 5 x 5 array, they are wired up in a slightly chaotic pattern to make 3 'logical' rows with 9 LEDs in each (and two LEDs missing from one of the rows). Each image is represented in the program by an array of three integers, giving the value that must be set in the I/O register to display each of the rows.
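The multiplexing loop might look roughly like the sketch below. It is written so that it runs on a host machine, which means several things are stand-ins rather than the real code from heart.c: the variable gpio_out takes the place of the GPIO output register, delay just counts its calls, and the name ROW_TIME and the exact signature of show are assumptions. The three constants are the ones for the filled heart given in the Details section.

```c
/* Stand-in for the GPIO output register, so the sketch runs on a host */
static unsigned gpio_out;

/* Counts calls in place of a real delay (hypothetical instrumentation) */
static unsigned pauses = 0;

static void delay(unsigned usec) {
    (void) usec;                /* the host version does not really wait */
    pauses++;
}

#define ROW_TIME 5000           /* time per row in microseconds (assumed name) */

/* The filled heart: one register value per logical row */
static const unsigned heart[3] = { 0x28f0, 0x5e00, 0x8060 };

/* Display an image for n passes over the three logical rows */
void show(const unsigned *img, int n) {
    while (n-- > 0) {
        for (int row = 0; row < 3; row++) {
            gpio_out = img[row];    /* select the row and its lit columns */
            delay(ROW_TIME);        /* leave it lit before moving on */
        }
    }
}
```

With 5 msec per row, one pass over the image takes 15 msec, so a call such as show(heart, 10) would keep the image up for about 150 msec on the real hardware.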

After setting the GPIO lines to display one of the rows of an image, the program enters an innermost loop (in function delay) that simply does nothing for a while, until it's time to move on to the next pattern. The delay loop has been written with a carefully chosen number of nop instructions (which do nothing but take one cycle) in its body, so that each iteration of the loop takes 8 cycles, or 500ns on a 16MHz machine. The delay in microseconds is doubled before entering the loop.
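A delay loop of this kind can be sketched as follows. The count of three nop instructions is an assumption: it gives 8 cycles per iteration only if the compiler turns the rest of the loop into a decrement and a conditional branch, so the real code should be checked against the compiler output. The helper delay_iterations is a made-up name that isolates the doubling step. The timing holds only on the 16MHz micro:bit; on a host machine the code compiles and runs, but the delay is meaningless.

```c
/* One do-nothing instruction; "nop" is accepted by GCC and Clang for
   both ARM and x86, so the sketch also compiles on a host machine */
#define nop() __asm__ volatile ("nop")

/* Number of loop iterations for a delay in microseconds: each iteration
   is meant to take 500ns, so there are two per microsecond */
unsigned delay_iterations(unsigned usec) {
    return 2 * usec;
}

/* Busy-wait for roughly usec microseconds on a 16MHz Cortex-M0.
   Three nops is an assumed count, chosen so that nops + decrement +
   taken branch might add up to 8 cycles per iteration. */
void delay(unsigned usec) {
    unsigned n = delay_iterations(usec);
    while (n > 0) {
        nop(); nop(); nop();
        n--;
    }
}
```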

A delay loop like this works fine in a simple program, but it commits the processor to be doing nothing useful while the delay is counting down. In more complex programs, there will be other work to do, and it will be unacceptable to waste time in a delay loop when the processor could be doing something useful (or even mining Bitcoin!). We will study later the means (interrupts) to allow this, but you are welcome to enhance this program also to use a timer interrupt instead of a delay loop.

Tasks

There are various ways you can experiment with this program. For one thing, it's instructive to make the inner loop delay for longer, so that the multiplexing between rows is no longer hidden by persistence of vision.

The main task is to make the program interactive, so that the pattern on the display changes when either button is pressed – from a big heart that flashes to a small heart to a hollow heart that flashes to a filled heart. You will need to make the program sense whether a button is pressed, and determine the bit patterns needed to display the empty heart. Think carefully about the effect you want: should the new patterns appear immediately, or at the beginning of the next heart-beat?

Another possibility is to make patterns on the display fade in and out, by still devoting 5msec to each row in each iteration, but actually illuminating the LEDs (or some of the LEDs) for only part of that time. Each LED is either fully on or fully off, but if it is on for only a fraction of the time, it will appear dimmer.

Details

In order to design the hollow heart pattern, you'll need to know what each GPIO output bit means. On the V1 micro:bit, there are twelve bits that matter, three to select a row, and nine to select which LEDs in that row are illuminated. The bottom 16 bits of the output register are laid out like this:

r3 r2 r1 c9  c8 c7 c6 c5  c4 c3 c2 c1  0  0  0  0

The bottom four bits aren't used, but the other twelve bits correspond to the rows and columns. The logical arrangement of LEDs is shown in the diagram above. To show the filled-in heart pattern, we want to light 2.4, 2.5, 3.4, 3.5, 3.6, 3.7, 3.8, 2.2, 1.9, 2.3, 3.9, 2.1, 1.7, 1.6, 1.5, 3.1. To light an LED, we must put a 1 in the right row, and a 0 in the right column, because the cathodes of the LEDs are connected to the column bits. So we get the pattern

0  0  1  0   1  0  0  0   1  1  1  1   0  0  0  0  =  0x28f0
0  1  0  1   1  1  1  0   0  0  0  0   0  0  0  0  =  0x5e00
1  0  0  0   0  0  0  0   0  1  1  0   0  0  0  0  =  0x8060

and these are the constants embedded in the program.
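The same calculation can be written as a few lines of C, which is a useful check on constants worked out by hand. This is a sketch under the bit layout shown above, with c1 at bit 4 and r1 at bit 13; the names COL_BIT, ROW_BIT, ALL_COLS and row_pattern are invented for the illustration and do not come from hardware.h.

```c
/* Bit positions in the GPIO output register, read off the layout above:
   column c1 is bit 4 and row r1 is bit 13 */
#define COL_BIT(c) (1u << (3 + (c)))    /* c in 1..9 */
#define ROW_BIT(r) (1u << (12 + (r)))   /* r in 1..3 */
#define ALL_COLS 0x1ff0u                /* bits 4..12: all nine columns */

/* Compute the register value that lights the listed columns of one
   logical row (hypothetical helper) */
unsigned row_pattern(int row, const int *lit, int nlit) {
    unsigned cols = ALL_COLS;           /* all columns high: nothing lit */
    for (int i = 0; i < nlit; i++)
        cols &= ~COL_BIT(lit[i]);       /* a 0 in a column lights the LED */
    return ROW_BIT(row) | cols;
}
```

For the filled heart, row 1 lights columns 5, 6, 7 and 9, and row_pattern(1, ...) with that list gives 0x28f0, agreeing with the first constant above.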

The good news is that there is actually little need to work out these constants by hand, because the header file hardware.h contains a sneaky macro IMAGE that allows us to write the definition of heart as

const unsigned heart[] =
    IMAGE(0,1,0,1,0,
          1,1,1,1,1,
          1,1,1,1,1,
          0,1,1,1,0,
          0,0,1,0,0);

The resulting list of expressions is extremely complicated, but the C compiler is able to reduce each expression to the right single 32-bit constant.

The program already contains code to initialise the pins connected to the two buttons as inputs: they are pins 17 and 26, which hardware.h identifies with the symbolic constants BUTTON_A and BUTTON_B. To test whether each button is pressed, you need to look at the correct bits in the value read from GPIO_IN, which can be selected using the masks BIT(BUTTON_A) = 0x20000 and BIT(BUTTON_B) = 0x4000000. As the circuit diagram shows, the buttons are connected between the pin and ground with a pullup resistor. That means the input bit will be 1 when the button is not pressed, and 0 when it is pressed. (The macro BIT is also defined in hardware.h so that BIT(x) = (1 << x).)
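Because the buttons are active-low, the test for a pressed button is that the masked bit is zero. A minimal sketch follows: the function button_pressed is a made-up name, and it takes the value read from GPIO_IN as a parameter so that it can be tried on a host machine. The definitions of BIT, BUTTON_A and BUTTON_B repeat what the text says about hardware.h.

```c
/* As described in the text: BIT(x) = (1 << x), with the buttons on
   pins 17 and 26 */
#define BIT(x) (1 << (x))
#define BUTTON_A 17
#define BUTTON_B 26

/* Test a button, given a value read from GPIO_IN.  The pullup makes the
   bit 1 when the button is up, so a 0 bit means it is pressed. */
int button_pressed(unsigned gpio_in, int pin) {
    return (gpio_in & BIT(pin)) == 0;
}
```

On the micro:bit, the call would be something like button_pressed(GPIO_IN, BUTTON_A), reading the register afresh each time the button is tested.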

Bonus programs

heart-intr.c

The program in heart-intr.c is interrupt-driven, and displays a static heart pattern without using delay loops. Use

make heart-intr.hex

to generate a downloadable file. One of the problems on Sheet 3 asks about enhancing a program like this to show a beating heart.

blinky.s

Almost all of the programs in the course rely on the code in startup.c to initialise the micro:bit when it comes out of reset. The assembly language file blinky.s avoids this, and contains all parts of a complete program that blinks one of the LEDs. Use

make blinky.hex

to generate a downloadable file.

The program establishes values for just the first two elements of the vector table, giving the initial values of the stack pointer and the program counter; since it enables no interrupts, the remaining vectors need not appear. The program contains a subroutine with a delay loop, and a main program that initialises the relevant GPIO pins as outputs, then uses the delay subroutine to flash the central LED.

Lab three – interrupts

Investigate a program that uses interrupts to overlap computing a list of primes with printing it

This lab begins with a pair of programs (compiled from the same source file) that output a list of primes on the serial port. One of the programs uses polling to wait for the serial port to be ready before transmitting each character; the other buffers the characters waiting to be output, and uses interrupts to send each character when the port is ready.

The lab3-primes directory of the lab materials contains the following files:

`Makefile`	Build script
`primes-poll.c`	Main program that uses polling
`primes-intr.c`	Main program that uses interrupts
`hardware.h`	Header file with layout of I/O registers
`lib.c, lib.h`	Library with implementation of `printf`
`startup.c`	Startup code
`nRF51822.ld`	Linker script
`lab3.geany`	Geany project file

Typing make as usual (or selecting Build>Make in Geany) will build two versions of the primes program: in primes-poll.hex is the version that uses polling, and in primes-intr.hex is the interrupt-driven version. The implementation of the function serial_putc that does the work of printf is different in the two programs, and the interrupt-driven program has an additional function with the special name uart_handler that the hardware calls when a UART interrupt is triggered.
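The interplay between serial_putc and uart_handler in the interrupt-driven version can be sketched as a circular buffer. This is a host-side simulation under stated assumptions, not the code from primes-intr.c: uart_send, sent, nsent, NBUF and the buffer variables are invented stand-ins for the hardware and the real data structures, the interrupt calls are empty stubs, and serial_putc here returns 0 when the buffer is full, whereas a real driver would presumably wait for space.

```c
#define NBUF 64                 /* buffer size, a power of two (assumed) */

static char txbuf[NBUF];        /* circular buffer of waiting characters */
static unsigned bufcnt = 0;     /* number of characters in the buffer */
static unsigned bufin = 0;      /* index where the next one is stored */
static unsigned bufout = 0;     /* index of the next one to send */
static int txidle = 1;          /* true when the transmitter is idle */

/* Stand-ins for the hardware, so the sketch runs on a host machine */
static char sent[256];          /* record of characters "transmitted" */
static unsigned nsent = 0;

static void uart_send(char ch) {
    if (nsent < sizeof sent) sent[nsent++] = ch;
}

static void intr_disable(void) { }  /* would mask interrupts on the chip */
static void intr_enable(void) { }   /* would unmask them again */

/* On the real machine, called by the hardware when the UART has
   finished sending a character */
void uart_handler(void) {
    if (bufcnt == 0) {
        txidle = 1;             /* nothing waiting: transmitter goes idle */
    } else {
        uart_send(txbuf[bufout]);
        bufout = (bufout + 1) % NBUF;
        bufcnt--;
    }
}

/* Queue a character for output; returns 1 if accepted, 0 if full */
int serial_putc(char ch) {
    int ok = 1;
    intr_disable();             /* critical section: shared variables */
    if (txidle) {
        uart_send(ch);          /* transmitter free: send immediately */
        txidle = 0;
    } else if (bufcnt < NBUF) {
        txbuf[bufin] = ch;      /* otherwise leave it for the handler */
        bufin = (bufin + 1) % NBUF;
        bufcnt++;
    } else {
        ok = 0;
    }
    intr_enable();
    return ok;
}
```

In the simulation, each call to uart_handler plays the part of one interrupt: after three calls to serial_putc, only the first character has gone out, and three calls to the handler then send the other two and mark the transmitter idle.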

One of the LEDs on the micro:bit is turned on while the program is running and printing the first 500 primes, and turned off at the end. You can time the program with a watch, or wire the board up to an oscilloscope or logic analyser to get a more accurate timing.

Tasks

Modify primes-poll.c so that transmission of each character completes before serial_putc returns, rather than before transmitting the character on the next call. Does this have a measurable effect on the running time?
How small can you make the transmit buffer and still have the interrupt-driven version primes-intr.c work? Does a very small buffer adversely affect the running time?
Add code to monitor the maximum number of characters stored in the transmit buffer, and print it at the end. Try increasing the buffer size to larger powers of two – you should be able to use values up to 8192 – and see if the whole buffer is ever filled.
Add code to turn on an LED when the program is searching for the next prime, and turn it off when it is printing a prime it has found. Use a scope or logic analyser on the LED and the serial line to visualise the overlap between thinking and printing.
What happens to the running time of both programs if we modify them to print not the first 500 primes, but the first 500 primes that are more than 1 000 000 or 10 000 000?
If the calls to intr_disable and intr_enable in serial_putc are removed, does the program continue to work? Can you persuade it to go wrong? What if the critical section is reduced to cover only the else part of the conditional if (txidle)?
When an interrupt occurs, the register values are saved on the stack. This ought not to affect the functioning of properly written code that has been correctly translated: but can you write some sneaky code to detect that memory just beyond the top of the stack is changing in an unpredictable way? Hint: the loop in serial_putc ought to experience some interrupts.

More demanding:

Find out from Chapter 21 of the hardware reference manual for the nRF51822 how to configure the random number generator. Write a driver based on the hints given in Problem Sheet 3, and write a program that generates and prints a histogram showing the distribution of random values. Note: symbolic constants for the device addresses of the RNG are in the latest revision of the source file hardware.h.
A bizarre challenge: find out if serial transmission can be implemented by bit-banging. This will mean configuring the correct pin as a GPIO output, and using delay loops to generate the RS-232 waveform with the correct timing. Use of an oscilloscope or logic analyser will be essential to get the timing right.

Lab four – micro:bian

Experiment with an embedded operating system that supports concurrent processes communicating by messages

This lab introduces micro:bian, a very simple embedded operating system kernel. The directory lab4-microbian contains the following files:

`Makefile`	Build script
`hardware.h`	Header file with layout of I/O registers
`lib.c, lib.h`	Library with implementation of `printf`
`startup.c`	Startup code
`nRF51822.ld`	Linker script
`microbian.c, microbian.h`	Operating system
`mpx-m0.s`	Context switch code for Cortex-M0
Device drivers:
`serial.c`	Serial port
`timer.c`	Timer
`i2c.c`	I²C bus (incl. accelerometer)
`radio.c`	2.4GHz radio
Example programs:
`ex-heart.c`	Heart and primes
`ex-echo.c`	Echo lines from keyboard
`ex-race.c`	Relative process speeds
`ex-today.c`	Mutual exclusion
`ex-level.c`	Accelerometer-based spirit level
`ex-remote.c`	Remote button presses

There are several example programs for you to experiment with:

heart is the ultimate version of the electronic Valentine's card, containing independent, concurrent processes for displaying the beating heart and printing the romantic list of prime numbers on the serial port.
echo is a simple test program for the UART driver. You can type lines of text on the keyboard, with echoing and line editing, and they are printed back when you press Return.
race is a demonstration of scheduling uncertainty: one process increments a counter while another periodically prints its value. The precise sequence of values printed depends on when the processes are scheduled.
today is an exercise in mutual exclusion: two politicians repeatedly spout their slogans, but they cannot be understood unless an interviewer intervenes to make them take turns.
level is an electronic spirit level. It uses the I²C bus to talk to the accelerometer chip on the micro:bit, and it displays a single moving pixel that responds when the board is tilted.
remote is a radio-based remote control. If two or more micro:bits in the same room are running the program, then pressing button A or B on one of them will cause all the others to display A or B respectively.

To support these programs, the operating system kernel (in microbian.c) is augmented with drivers for the UART (in serial.c), a system timer (timer.c), the I²C bus that links the processor to the on-board accelerometer and magnetometer (i2c.c), and the 2.4GHz packet radio integrated into the microcontroller (radio.c). For simplicity, the header file microbian.h declares in one place the routines provided by all these modules.

Typing make or choosing Build>Make as usual compiles all the example programs into hex files ready for download to the micro:bit. With one of the example programs open, you can choose Build>Upload me to upload the corresponding hex file to the micro:bit.

Tasks

Heart: Try removing the system call that gives the display process a higher priority than the primes process. Then start searching for primes at 1000000 or 10000000 instead of 2. Observe the results, then reinstate the priority. Why should the display process have a higher priority than the primes process?
Race: Run the program and observe the output. Then try swapping the two calls to start() in init(). Why does this affect the action of the program? Although unpredictable in advance, the results printed are actually consistent from run to run: why is that? Try adding a driver for the RNG (see below), and see if just doing that introduces enough randomness to make the results change from run to run.
Today: Run the program and observe the output. Then introduce an interviewer process that makes the two politicians speak in turn. One solution has each politician passing its slogans to the interviewer; another has the interviewer giving permission to a politician to speak until they indicate they have finished.
Make a driver process for the hardware random number generator. Write a program that shows random dice rolls on the display whenever a button is pressed.
Alternatively, there is an onboard sensor that measures the temperature of the processor die, giving an answer in quarters of a degree Celsius. It generates an interrupt when data is ready, but then suspends itself until started again. Write a device driver for it.
Construct a test program to measure the time taken to send and receive a message as the length of an output pulse. Experiment to find the combination of circumstances that makes this quickest: does sending the message with sendrec help, and why?
Design an interesting multi-person application that uses the radio to communicate. As configured, the radio module can broadcast packets containing up to 32 bytes of payload. For point-to-point communication, you could embed a destination address in each packet, and have each micro:bit ignore messages that were not addressed to it. One idea is to implement the chain reaction game.

Documentation

For micro:bian itself: there is a page describing the kinds of processes and messages supported by micro:bian, and another page with unix-style manual pages for each system call.
For the device drivers: another page describes the facilities provided by each device driver.

Typically, a device driver supports two interfaces: one based on messages sent to the driver task, and responses that are sent back; and another that consists of a collection of functions that client processes can call, with each such function constructing one or more messages on the stack of the calling process, then sending them to the device driver task. Both are described on the linked page.

[1] If you are lucky, this setup will agree with the default, and you can type just minicom.


Lab manual for Digital Systems
