Frequently asked questions (Digital Systems)

From Spivey's Corner
Jump to: navigation, search

New questions[edit]

If you have a question, perhaps it is answered here – or maybe you can find help in the growing glossary. Feel free to add headwords to the glossary so Mike can fill in the definitions, or to add questions below.

Can you write the question within triple single quotes like this?

Then I will write the answer afterwards. Don't worry about formatting words like micro:bit properly – I will copy-edit your question when I add the answer.

You may prefer Mercurial, but I'm dedicated to Git. How can I stay true to myself?

In place of

$ hg clone

at the start of Lab 0, you can use the command

$ git clone

instead, and hack away to your hearts content without compromising your principles. (This feature is brought to you by the superior flexibility of Mercurial.)

Shouldn't the 'move' instruction really be called 'copy'?

Probably so: mov r0, r1 puts in r0 the same number that is held in r1, without changing the contents of r1.

Arising from lectures[edit]

These items are links to questions answered at the end of some lectures.

Lecture 1: How does the processor know how many instruction bits to use as the opcode?

Lecture 1: How does the control unit know to interpret the last nine bits of an adds instruction 0x1840 as the names of three registers?

Lecture 2: What is the file /usr/lib/gcc/arm-none-eabi/5.4.1/armv6-m/libgcc.a that is mentioned when a program is linked?

Lecture 2: If you're prepared to use that integer division subroutine from the gcc library, why not use the library version of printf too, instead of writing your own?

Lecture 3: Will we need to memorise Thumb instructions for the exam?

Lecture 3: If a subroutine expects its return address in the lr register, and one subroutine A calls another subroutine B, won't the call to B overwrite the return address of A, so that A no longer knows where to return?

Lecture 4: Is overflow possible in unsigned subtraction, and how can we test for it?

Lecture 5: What's the difference between a subroutine, a procedure and a function?

Lecture 8: What does the assignment statement GPIO_OUT = 0x4000 look like in assembly language?

Lecture 18: In the lectures, you used only p-type transistors in the upper part of a CMOS gate and the corresponding n-type transistors in the lower part. Can we mix them up in order to make a circuit that implements (a ∨ ¬b) ?

Lecture 19: What is the semantics of your C-like hardware description language?

Lecture 19: Aren't sequential circuits equivalent to finite-state automata?

Lecture 21: What's the difference between decoded, derived and dynamic signals?

Lecture 21: In the lectures, you talk about processor designs containing lots of multiplexers, but some books talk about designs containing various busses. What's the difference?

Lab exercises[edit]

Something in the lab exercises isn't working. What should I do?

First of all, please come to the sessions! The lab exercises for this course are (deliberately and inevitably) a bit less cut-and-dried than those for other courses, and there is more scope to explore directions of your own. Today in the lab we had a team of two plundering my code for the Seven Stars of Death to make the LEDs flash in their own way. We were able to help them with that, and it was far more worthwhile than following my pre-conceived list of tasks.

Second, check for updates on the labs page. Some differences are inevitable between the environment in the Software Lab and what works on my machines, and we can't hope to pin down every last one in advance. And as you'll discover, it sometimes takes a lot of thought to work out a reason for the strange behaviours we often see. There's no operating system to intervene and stop your program with a helpful message; and even the seven stars of death only tell you that something is wrong.

When an update is published, there's a procedure for merging it into your working copy using Mercurial. We will help you with that in difficult cases, such as where you've already made one fix in the lab but you are merging in others.

I want to see the Seven Stars of Death. How can I make them appear?

Ok, I suppose we'd better get this over with. Any unexpected interrupt or exception will lead to the seven stars, and you can provoke a HardFault exception with a bx instruction where the target address is even, not having the Thumb bit set. So the two instructions

movs r0, #0
bx r0

ought to do it. You may prefer to find more creative ways of your own. In fact, you probably will.

The detailed mechanism is as follows: the file startup.c contains code that lays out the interrupt vectors for the program, including the vector associated with the HardFault exception. Unless these vectors are redefined elsewhere, they all point to a routine spin that turns off interrupts, (re-)initialises the GPIO pins, and then uses them to display the seven stars, flashing them forever using delay loops. The code is made simpler by the happy fact that the seven LEDs needed are all in the same row of the 3 x 9 wiring matrix. The code is deliberately mashed together in one function (rather than, say, using a subroutine delay), so that it doesn't depend on anything else. Many undefined behaviours of the chip evoke a HardFault exception, and attempting to clear the T bit in the psr using a bx instruction is one of them. The machine saves some registers on the stack, fetches the HardFault vector from address 12, and starts executing the routine at that address, which immediately calls spin.

I stop the echo program with the debugger, then try typing in the minicom window. Nothing happens; and even if I type cont to gdb, still nothing happens. Why?

The debugger can stop your program, but it can't stop other things from happening. In particular, if the UART (the interface connected to the serial port) receives a second character before your program has collected the first one, it signals an 'overrun error' and will do nothing more until the error is cleared. The echo program is written in the simplest way possible, assuming characters will arrive slowly enough for it to fetch each one before the next arrives, and it doesn't have provision for detecting or clearing such errors. When the program is stopped by the debugger, it hasn't a hope of keeping up with your typing. So once you've typed a second character in minicom while the program is stopped, all is lost until you reset the micro:bit. This is a perfect example to show that debuggers are of limited use in embedded systems development.

Does the lab software include drivers for the magnetometer and accelerometer on the micro:bit?

They are on the to-do list. It would be nice to add driver processes to Phōs, both of them clients of a process that drives the I2C bus. If you would like to implement this, (a) come and do a project with me, and (b) note the following. The I2C implementation on the micro:bit seems quite sensitive to noise, so it will be necessary (unlike, e.g., the very simple UART driver we're using) to detect error situations and try to correct them. In the case of the I2C bus, if a slave device is holding the SDA line low, this may entail bit-banging the clock line until the slave thinks the transaction is complete. Apparently, switching on and off LEDs on the micro:bit creates enough noise on the I2C lines that there's a big risk of hangups; one solution might be to have the I2C driver to ask the display driver process to hold off for the duration of an I2C transaction.

What is the file phos.a that is made in Lab 4?

The file phos.a is a library archive, containing copies of various files of object code such as phos.o and serial.o. The linker treats such libraries specially, linking in only those parts of them that are needed by the program it is linking. Thus, for example, Phōs programs that don't need the timer will be linked without including the driver process for it. Making and using the library simplifies the Makefile while keeping the resulting programs small.

How can I restore the original demo program to my micro:bit?

Download and flash the hex file you can find on the official activities page as 'BBC micro:bit Original Code' under 'You'll need'.

Assembly language[edit]

Why write integer constants in hexadecimal for low-level programming?

We often want to work out the binary bits that make up a constant, and hexadecimal notation makes it easy to do so. Each hexadecimal digit corresponds to four bits of the binary representation, and it's not too hard to memorise that bit pattern corresponding to each digit:

0  0000          8  1000
1  0001          9  1001
2  0010          a  1010
3  0011          b  1011
4  0100          c  1100
5  0101          d  1101
6  0110          e  1110
7  0111          f  1111

To convert a larger hexadecimal constant to binary, just concatenate the four-bit groups corresponding to each digit: for example,

0xcafe1812 = 1100 1010 1111 1110 0001 1000 0001 0010.

How can we convert between decimal and hexadecimal?

For quantities that fit in 16 bits or less, it's helpful to recall that the powers of 16 are 1, 16, 256, 4096. Then to convert, say, 1326 to hexadecimal, we note that the highest multiple of 256 that fits is 5 * 256 = 1280. We can set out the calculation as follows.

5 * 256  =  1280
2 * 16   =    32

The remainder, 46, can be expressed as 2 * 16 + 14, and we get 1326 = 5 * 256 + 2 * 16 + 14 * 1 = 0x52e. The final 'e' is a hexadecimal digit with a value of 14.

To convert the other way, we just calculate 0x52e = 5 * 256 + 2 * 16 + 14 = 1326.

What registers can a subroutine use?

It can freely use registers r0 to r3, destroying any values they had when the subroutine was called. The (first four) arguments of the subroutine arrive in those registers, but need not be kept there, and the result can be moved into r0 just before the subroutine returns. Note, however, that if this subroutine calls any others, they in their turn may destroy the values in all four of these registers.

In addition, a subroutine can use registers r4 to r7, provided it leaves them with the same values on exit that they had on entry. An easy way to ensure that is to push them on the stack with a push instruction at the routine start (together with the register lr), and pop them off again with a pop instruction at the end (together with the pc). Any subroutines that are called will also obey this convention, so important values can be left in these registers across calls.

What if a subroutine has more than 4 arguments?

In that case, arguments beyond the first four are passed on the stack. It's the caller's responsibility to put them there, following a layout that is fixed by the machine's calling convention, and then the subroutine itself can access them using load instructions. I don't think we'll need to write any subroutines with so many arguments (not in assembly language, anyway).

What do I do about labels if I have a big assembly language program with lots of functions and several loops? Can I re-use labels like loop and done in each function?

No, I'm afraid you can't – you can't have two labels with the same name in a single file of code. You can either prefix each label with the name of the enclosing function, as in foo_loop or baz_done, or you can make use of a system of numeric labels that do allow multiple definitions. There can be several labels written 2: in the file, and you refer to them using the notations 2b or 2f, meaning the next occurrence of label 2 going backwards (2b) or forwards (2f) in the file. That saves the effort of inventing names.

Compilers solve the problem in a different way, just inventing their own labels with names like .L392 and never reusing them. That's OK when nobody is expected to read the resulting assembly language.

When I try running my assembly language program with qemu-arm, I get this message: what does it mean?

qemu-arm: ... tb_lock: Assertion `!have_tb_lock' failed.

That's a symptom of the program running haywire, possibly because it has branched to a random location and started executing code there. A probable cause of that is returning from a subroutine with the stack in a bad state, so that the value loaded into the pc is not the correct return address.

What is this .thumb_func directive all about?

Most of the time, the assembler and linker manage to work out for themselves that assembly language code is written using the Thumb encoding, and all works sweetly. But just occasionally (e.g., when writing an interrupt handler in assembly language), it's necessary to mark a function explicitly as being written in Thumb code: in the case of an interrupt handler, so that the interrupt vector can be marked with a 1 bit in the LSB. If this is done properly, then the processor stays in Thumb mode when the interrupt vector is followed. Failing to do this results in an attempt to switch to native ARM mode, and an immediate HardFault exception: the program crashes irretrievably. This seems a bit silly in a device that doesn't support native ARM mode at all, but it is done in the name of software compatibility with bigger devices.

Given that the .thumb_func directive is occasionally needed (at times when it's difficult to determine whether it is actually necessary), and given that it is hard to debug the HardFault that occurs when a needed directive is omitted, it seems simplest just to make a habit of writing .thumb_func in front of every assembly-language function.

I wrote nop in my program, expecting to get the instruction encoded as 0xbf80, but the assembler produced 0x46c0 instead. Why?

Later versions of the ARM architecture implement a specific nop instruction with encoding 0xbf80. On earlier versions, the instruction encoded 0x46c0 is used instead. The rainbow chart decodes that as mov r8, r8, a form of move instruction that can access high registers and (crucially) does not write the condition codes. In practice, there's no significant difference between the two instructions: both last one cycle and have no effect.

What is the difference between the data segment and the bss segment?

The C compiler takes static variables (those that are declared outside any function) and assigns them to one of three segments that are differently treated by the linker. Plain variables such as the array a declared by

int a[10];

are assigned to the bss segment. When the program starts, such variables have no special initial values, or rather, their initial values are zero, false, null. Other variables may be given specific initial values: for example, the array b declared by

int b[10] = { 3, 1, 4, 1, 5, 9, 2, 6, 5, 4 };

These variables are allocated in the data segment, and unlike the bss segment, the program output by the linker contains an image of their initial values. The third segment used for static data is the rodata segment, used for 'variables' that are declared as constant, such as the array c declared by

const int c[10] = { 2, 7, 1, 8, 2, 8, 1, 8, 2, 8 };

These variables do not need space in the RAM, and can be kept in the (more spacious) ROM instead. Such constant tables are common in embedded programs: for example, a lunar lander game might contain a table of the sine of each whole number of degrees from 0 to 360, to save time computing them when the angle of a thruster changes.

Ok, fair enough: but how can I declare similar variables in assembly language?

You can use the directives .data or .bss or .segment rodata to select a segment, a .align directive to pad for alignment, then either .word or .space directives to allocate space. Like this:

    .align 2
    .space 40

    .align 2
    .word 3, 1, 4, 1, 5, 9, 2, 6, 5, 4

    .segment rodata
    .align 2
    .word 2, 7, 1, 8, 2, 8, 1, 8, 2, 8

The directives .data and .bss are just abbreviations for .segment data and .segment bss. If you want a scalar variable instead of an array, just reduce the amount of space from 40 to 4, or give only one word of initialisation data:

    .align 2
    .space 4

    .align 2
    .word 314159

Ok, now I see how to declare global variables. How can I use them?

Let's try writing in assembly language the function that we could express in C as follows.

unsigned g = 4;

unsigned foo(unsigned x, unsigned y) {
    unsigned z = g;

    g = x;

    return y + z;

As you can see, this uses a global variable to remember the argument x from one invocation of foo to the next, with the remembered value being 4 initially. In the following interaction, the values printed are 20+4 = 24 and 8+10 = 18, and the memory contains 6 for the next invocation.

Gimme a number: 10
Gimme a number: 20
foo(10, 20) = 24
Gimme a number: 6
Gimme a number: 8
foo(6, 8) = 18

Without further ado, let's try to express the same thing in assembly language. First, we allocate space in the data segment for the global variable.

    .align 2
    .word 4

In the function foo, we need to get access to g, and to do this, we plant the address of g somewhere in the text segment, and use pc-relative addressing to load the address into a register.

    ldr r3, [pc, #8]
    ldr r2, [r3]
    str r0, [r3]
    add r0, r1, r2
    bx lr
    .align 2
    .word g

The instruction ldr r2, [pc, #8] loads the address g of the global variable into r2, by loading it from a convenient place aaa in the text segment. Then, with the address g in a register, we can use a second ldr instruction to load its value, which sits in the data segment in RAM. The point is that there are two load instructions, one to get the address of the global variable, and the second to get its value. The address is a constant, and can be stored in the ROM at a place where we can address it using pc-relative addressing. The value is variable, and is stored in RAM, at an address that we cannot write as part of the code stream. The address in r3 is also used to write the global variable with the first argument of the function from r0 – then we add together the second argument and the fetched value of g to give the result, putting it in r0 before returning.

We could write the instruction ldr r2, [pc, #8] as ldr r2, aaa to get the assembler to compute the offset 8 for us. Or, better still, write ldr r2, =g to get the assembler to find a 'convenient place' for us, and leave it to put the constant at the end of its output to the text segment, or put the hint .pool after the bx lr (and delete the rest of the source shown aboive from that point on. However we express the source file, the resulting object program is the same: it has a word in RAM, another word in ROM that contains the address of the first word, and a pc-relative load instruction that addresses the second word.

If we want to plant constants using .word directives, or use the .pool directive to give the assembler a hint where to put its own table of literals, where is a safe place to put them?

If we use .word to put constants (such as addresses) into the text segment to access them with pc-relative addressing, then we need to be sure that the machine will not attempt to execute these constants as if they were code. For example, we could put them immediately after the bx lr or pop {pc} that ends a function, because we can be sure the processor will never try to fetch and execute instructions from there. We can put .pool directives in similar places to give the assembler a hint it can place its tables of literals there. If we don't put in these hints, then the assembler will build one big table of literals and put it after the last function in the source file. Usually this is sufficient, because our fragments of assembly language will be small.

At some stage, you promised a micro:bit program written entirely in assembly language: where is it?

The source for (a coming version of) Lab2 contains a program blinky.s that blinks the central LED. Say make blinky.hex to obtain a downloadable hex file.

C programming[edit]

What's the difference between the expressions m.m_type and m->m_type?

The first of these is used when m is a structure (record) declared with message m, so that storage for an actual structure is allocated by the declaration; it names the m_type field of the structure.

The second is used when m is a pointer to a message declared elsewhere. We can declare such a pointer with message *m, and if msg is a message declared as in the first part of this answer, we can write &msg to get a pointer to it. The notation m->m_type selects the m_type field of the structure to which m points; it is equivalent to (*m).m_type, because *m is the structure to which m points, and we can use the notation from the first part to select its m_type field. The notations are equivalent in meaning and in efficiency, but m->m_type looks a bit neater.

Your C programs use types like int and unsigned and short. Wouldn't it be more portable to use types like int32_t and int32_t and int16_t that have a well-defined width on all platforms?

Yes, it would: it's just that I'm trying to keep down the number of obscure conventions that are used in the programs. On the ARM, it's safe to assume that int and unsigned are 32 bits.

Which is faster as a way of writing an infinite loop: while (1) ... or for (;;) ...?

This is one of many things connected with minor efficiency that really don't matter. If I were writing a compiler, I would ensure that both of these idioms resulted in exactly the same code, with the loop body followed by an unconditional branch from the bottom back to the top. If by accident gcc generates slightly less tidy code for one form, then we are unlikely to notice the difference. (Similar remarks apply to the idioms n++ and ++n when written as statements.)

According to the standard books about C, the main program should be called main, and yet in this course it is called init. Why is that?

In normal C, the main program ought to have the heading

int main(int argc, char **argv)

reflecting the fact the main receives two parameters, one being the number of command-line arguments (including the program name), and the other being an array of strings, each corresponding to one of the arguments. The return type is int, because the program may terminate by returning an integer, zero for success and some non-zero value for failure.

Our function init takes no parameters and delivers no result:

void init(void)

What's more, once we have an operating system on the chip, then unlike main, when init returns the program is not over, but concurrent execution of the processes in the program is only just beginning.

These differences might be enough to make a change of name desirable, but there are practical reasons too. One is that GCC (un)helpfully signals a warning if a function named main does not have the heading it expects; another is that it treats main specially in the output, assigning it to its own section in the object file so that a linker script can place it in the program image at a specific place. Both of these behaviours are unhelpful to us.

  • Actually, the warning is suppressed by specifying -ffreestanding, which also suppresses other warnings about the types of functions like sprintf. And the separate section is only enabled with the optimisation flag -Os or -O2, not at the -O level we will generally use, mostly to make it easier to interpret program behaviour under a debugger.

You have not explained the implementation of printf. How does it work?

The implementation of printf in any of our programs is split into two parts: the part that is independent of the output device, and renders the data as a sequence of characters, and the device-dependent part that takes each character and transmits it on the output device. The C code is made slightly complicated by two facts: printf takes a variable number of arguments, and the device-independent part is a higher-order function that takes the device-dependent part as a parameter.

Let's begin with the code that puts the two parts together:

void serial_printf(char *fmt, ...) {
    va_list va;
    va_start(va, fmt);
    do_print(serial_putc, fmt, va);

This function takes a variable number of arguments, so we can write a call such as

serial_printf("prime(%d) = %d\n", n, p);

where the one fixed argument – the format string – indicates that there are two additional arguments to come, both integers. The code for serial_printf gets a kind of pointer va to the variable part of the argument list, and passes this pointer, together with the device-dependent output function serial_putc and the format string fmt, to the library routine do_print. Variable argument lists are implemented in different ways on different machines. On the Cortex-M0, the compiler will store the arguments from r0--r3 into the stack frame together with the fifth and later arguments (if any), then use a pointer to the relevant portion of the stack frame for va. What happens on a particular machine is all hidden in the functions va_start and va_end.

I'll simplify the code for do_print a bit: the real version also supports the function sprintf that formats data like printf but saves the resulting string in a character array instead of printing it. Here's the function heading:

void do_print(void (*putch)(char), const char *fmt, va_list va) {

The C syntax is a bit obscure here, but it indicates that do_print takes three arguments:

  • A (pointer to) a function putch that accepts a character as argument and returns nothing.
  • A string fmt, represented as a character pointer.
  • The variable argument list va from do_print's caller.

The body of do_print uses these to do its job.

     char nbuf[NMAX];

     for (const char *p = fmt; *p != 0; p++) {
          if (*p == '%' && *(p+1) != '\0') {
               switch (*++p) {
               case 'c':
                    putch(va_arg(va, int));
               case 'd':
                    do_string(putch, itoa(va_arg(va, int), nbuf));
               case 's':
                    do_string(putch, va_arg(va, char *));
          } else {

As you can see, do_print iterates over the format string, using a pointer p to point to the character of that string that it is currently working on. I've shown the implementation of three format codes here: %c for printing a character, %d for printing a decimal integer, and %s for printing a string. Note that you can write %% to print a literal percent sign. The formatting loop works its way through the variable argument list by using the function va_arg (more likely, a macro), and outputs each character by calling the argument function putch. A helper function do_string is used to call putch on each character of a string, such as the one returned by the function itoa (the name is traditional) that converts an integer into a string.

We carefully avoid global variables so that printf can be called from multiple processes simultaneously – it is reentrant, to use the jargon. That's the reason for the array nbuf that is passed to itoa. None of this will be on the exam.

Code in the examples that prints messages on the terminal contains "\r\n" to move to a new line, but other examples of C code that I've seen use just "\n". Why is that?

On an old-fashioned terminal, and on the simulation provided by minicom, the carriage-return character 0x0d (written '\r' in C) is needed to move the cursor or print head back to the left of the page, and the line-feed character 0x0a (written '\n') scrolls up by a line. Normally, the C library or operating system is written so as to insert a '\r' in front of each '\n' output by a C program. But we are working without a library and without an operating system, so we must do this ourselves. (The serial driver in Phōs does insert the '\r' in its version of the serial_putc function.)

If I ask the C compiler to output assembly language (and turn off -g), then most of the file is intelligible, but these lines appear close to the start of the file. What do they mean?

.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1

The lines record the assumptions used by the compiler in generating code: things like whether there is a hardware floating point unit, whether unaligned memory access is allowed, etc. These assumptions will be recorded in a special section of the binary file output by the assembler, and the linker then checks that the assumptions are consistent between different files of code.

C provides bit-fields for access to parts of a memory word. Why don't we use them instead of using shifting and masking operations to program peripheral registers?

On the ARM, most peripheral registers must be accessed as a whole 32-bit word, and can't be used with the ldrb, strb, ldrh, strh instructions that access bytes or 16-bit halfwords. Also, it's sometimes the case that peripheral registers must be accessed exactly once in an operation, or some side-effect may be duplicated. The C compiler doesn't guarantee to respect these conventions when it generates code for bit-field access, so we avoid bit-fields and write (with the help of simplifying macros) explicit shifting and masking code instead. The resulting object code is no worse than would be compiled for bit-fields anyway.

What indentation style do you use for C?

My style is close to the one used in Kernighan and Ritchie, with the notable exception that I put the opening brace of a function on the same line as the function prototype. I don't insist on braces around a single statement controlled by if or for or while, except that the else part of an if must, for balance, have braces if the if part does, and } else { goes all on one line. Expressions should have parentheses whenever the interpretation is anything but obvious without them.

I don't accept the argument that without braces, accidental mistakes of indentation can mislead about the scope of an if or while – but I use an editor that reliably gives the correct indentation automatically.

Hardware details[edit]

What configuration is needed to use the hardware random number generator?

See Chapter 21 of the hardware reference manual for the nRF51822. The header file hardware.h (in the latest revision of Lab 3) defines symbolic constants such as RNG_VALUE for the device register addresses. If you make a first attempt, we will help you to get it working in the lab sessions.

Can an interrupt handler itself be interrupted?

The interrupt controller (NVIC) provides a priority scheme, so that the handler for a low-priority interrupt can be interrupted if a request for a higher-priority interrupt comes in: these 'nested' interrupts are the N in NVIC. But we shall not use this scheme, instead giving all interrupts the same priority. In Phōs, interrupt handlers will be very short, and just send a message to an attached driver process, so there is no need for nested interrupts. But whether nesting is allowed or not, it's clear that an interrupt handler cannot be interrupted by a second request the same interrupt.

What happens when the micro:bit boots?

The Nordic chip doesn't boot, in the sense that there is a small program that is responsible for loading a larger one. Instead, we could talk about it "coming out of reset". When the hardware Reset signal is deactivated (for example, by releasing the reset button), the processor fetches the word at address 0 and loads it into the stack pointer, and fetches the word at address 4 (the reset vector) and loads it into the program counter. That is the extent of the hardware's responsibility.

On the software side, the C compiler, the linker, the linker script NRF51822.ld and the C functions in startup.c work together to determine what happens next. The initial stack pointer is set to the top of RAM, and the reset vector points to the routine __reset in startup.c. The contents of the RAM at this point are meaningless, and may be random. Here's the code for __reset:

/* __reset -- the system starts here */
void __reset(void) {
     unsigned *p, *q;

     /* Make sure all RAM banks are powered on */
     POWER_RAMON |= BIT(0) | BIT(1);

     /* Copy data segment and zero out bss */
     p = __data_start; q = __etext;
     while (p < __data_end) *p++ = *q++;
     p = __bss_start;
     while (p < __bss_end) *p++ = 0;

     /* Initialise the scheduler */

     /* Let the program initialise itself */

     /* Start the scheduler */

     /* If there is no scheduler, spin */
     while (1) pause();

After ensuring (in case someone has switched them off) that all parts of the RAM are powered, this initialised the portion of the RAM that is devoted to static variables. The linker has made all the code for the program refer to locations in the bottom of the RAM, but the data which ought to appear there is not yet present. Instead, the linker has (folowing directions in the file NRF51822.ld) put an image of the data segment in the ROM just after the program code. It has also defined symbols like __data_start, __bss_start and __bss_end that we can use here. The code in __reset copies the image of the data segment word by word into the bottom of RAM, then zeroes out the bss segment. From this point on, the program can assume that static variables have their proper initial values.

There follow three calls to other subroutines. In the programs we write in the first part of the course, two of these subroutines, phos_init and phos_start do nothing, and init is the main program, responsible for all its activity. As you will see, if init ever returns, the program falls into an infinite loop, forever doing nothing; interrupt handlers remain active during this loop. Later in the course, we will start to include the Phōs operating system in our programs, and then phos_init will set up its data structures, the main program init will just create a number of operating system processes, and phos_start is the scheduler that interleaves execution of the processes, which work together to carry out the program's tasks. The scheduler never returns.

What is the difference between the I/O registers UART_INTEN, UART_INTENSET and UART_INTENCLR?

The UART_INTEN register shows which events in the UART can cause an interrupt: you can write a value to the register to enable or disable individual interrupt sources, and you can read it to find out which sources are actually enabled. The Nordic chips have a scheme – mostly for convenience – that allows individual bits in this register to be set or cleared without having to read the register, modify it, and write back the result. If you write a value to the UART_INTENSET register containing one or more 1 bits, then the corresponding bits in UART_INTEN will be set to 1, and the other bits will remain unchanged. Similarly, writing a pattern of 1 bits to UART_INTENCLR clears the corresponding bits of UART_INTEN and leaves the others unchanged. The general scheme is described in Section 10.1.2 of the nRF51822 Reference Manual.

About Phōs[edit]

Why can't one Phōs processes start another?

In short, because that would be a complication that would add little utility. The memory allocation in Phōs is deliberately simple, stack space for each process being nibbled away from one end of a single chunk of free space. That works well if a fixed number of processes are started when the program begins, but much less well if processes come and (by implication) go during the life of the system. If processes exit and restart, we will need to re-use the storage space of dead processes to find room for new ones; and unless we are prepared to move processes around, we then face the problem of fragmentation of storage – all too messy to contemplate. Given that the operating system is only going to support a fixed collection of processes, we may as well have those processes created as part of initialisation, before the scheduler takes control.

When a process makes a system call such as receive(ANY, &m), how does the operating system access the arguments?

The arguments are in registers r0 and r1 before the svc instruction that enters the operating system. The handler for the SVC interrupt gets the the stack pointer for the process after the context has been saved onto the stack. It can then reach into it (in a machine-dependent way) to discover what the values of r0 and r1 were at the time of the call. It also uses the saved pc value to look at the svc instruction, which contains an integer constant that identifies which system call – receive or send or another – was made. Given the address of the message buffer m, the operating system can read or write the memory belonging to the process in order to copy a message from sender to receiver.

What happens if a process loops forever, consuming CPU time, without ever calling send or receive?

Scheduling is non-preemptive, so once such a process is running, it will stay running until the next interrupt, and other processes will not run until then. Phōs is not really designed to share the processor fairly among processes that always want to run, because doing that really needs a clock interrupt to be directly connected to the scheduler in order to measure out time slices, and we'd like to make use of timers optional. The assumption is that CPU time is a plentiful resource, so that fairness in sharing it doesn't matter. In the heart example, it works well to give the process driving the display a higher priority than the one searching for primes, so that after a timer interrupt it get to run first.

If a Phōs process calls a subroutine (like primes calling serial_printf which subsequently calls serial_putc) and the subroutine calls send to send a message, how does the operating system know which process sent the message?

Phōs always knows which processs is currently running. The stack frames for primes and serial_printf and serial_putc are all part of the stack of that process. So when the call to send is made, the operating system knows that it is made on behalf of the PRIMES process. The same subroutine serial_putc could be called at another time by another process, and then is would generate a message from that process – whichever one was current at the time.

You can use receive(ANY, &msg) to receive a message from anywhere. Why not send(ANY, &msg) to send a message to any receiving process?

Practically speaking, because that's useless: we often want to write a server process that can get a request from any client, carry it out, then send a reply. This is supported by receive(ANY, &msg) followed by a reply sent to msg.m_sender. An attempt to send to any process at all is not likely to succeed, because the receiving process is quite likely not to be interested in the message. (Theoretically speaking, the dual of receive-from-any ought to be send-to-all, and that is no more likely to be a useful operation.)

Why is the stack space of a Phōs process filled with the value 3735928559 when the process starts?

This helps with finding out how much stack space the process is actually using: when Phōs prints a dump of the process table, it shows how much of the stack space of the process has been changed from this initial value, and we can assume that this is the total stack space that has been used, unless by some slim chance the last few words of the stack have been set to this magic value by the process itself. As to the choice of value, it appears in the Phōs source code in hexadecimal notation, which makes it seem more significant.

In Phōs, what happens if an interrupt is triggered when the attached process is not waiting to receive a message?

The process table entry for each process has space to record one pending interrupt message, so if an interrupt arrives and the process is not waiting for it, the message can be delivered later. The default interrupt handler, which converts interrupts to messages, also disables the interrupt that invoked it, because the interrupts on the micro:bit continue to be triggered if the cause is not removed, and removing the cause (such as an idle UART) requires hardware-specific actions from the driver process. So a second interrupt cannot happen until the driver process has run to re-enable the interrupt. If this behaviour is inadequate to the situation, it's possible to replace the default handler with another one for a specific device. It's a bit difficult to configure this without editing the Phōs source code, but the infrastructure is there to support it.

What happens if a device driver process does not finish responding to one interrupt message before another interrupt occurs?

As noted above, the default action of disabling the interrupt means that a second interrupt can't actually be requested. But if (for example) the interrupt comes from a timer, and interrupts are being lost causing the clock to run slowly, then the right thing is to move some work from the driver process to the interrupt handler, and modify it so that the interrupt remains enabled all the time.

In the UART driver process, code for input and code for output are intertwined. Wouldn't it be neater to untangle them and have separate driver processes for the two sides of the UART?

In some ways, yes. But the vital deciding factor in favour of a single driver is that the two sides of the UART share a single interrupt request, so interrupts for both sending and receiving turn into messages to a single process. We could use a more elaborate scheme where a little router process passed on interrupt messages to one or other of two driver processes, but we would be adding as much complexity as we could hope to remove. (A minor additional point is that echoing of input characters is easy to do if input and output share a driver process.)

If message-passing makes programming so much easier, why do widely-used embedded operating systems provide semaphores instead?

Programs written using semaphores have the potential to be faster than those written using message passing and server processes, because context switches from one process to another can be made much less frequent. For example, to buffer and print characters on the serial port, our message-based design requires about four context switches per character sent, in addition to the running of the UART interrupt handler. There is a context switch to send a request message to the UART driver process and put the character in the buffer, another to return to the main program, then two further context switches into and out of the driver process when a UART interrupt prompts it to send the character. By way of contrast, a semaphore-based solution might need no context switches at all, if characters are retrieved from the buffer and sent to the UART directly in the interrupt handler. This is because a down operation on an available semaphore only needs to disable interrupts for a moment to test and lock the semaphore, and need not block the invoking process. It's only when a process needs to block on a semaphore that a context switch is needed.

On the other hand, message-based concurrency is much easier to think about, and it makes it easier to avoid problems like priority inversion, where a high-priority process is prevented from making progress because it needs to lock a semaphore currently held by a low-priority process.

Why does Phōs not have (i) pre-emptive scheduling, (ii) nested interrupt handlers, (iii) an interruptible kernel?

In a simple, interrupt-driven system that is not overloaded, most of the time every process is blocked waiting for something to happen, and the idle process is 'active', in the weak sense that it is the running process, but it has put the processor to sleep with a wait-for-interrupt instruction (wfe on the ARM). When an interrupt arrives, processes wake up, there is a flurry of activity, and then the system settles down to sleep again, and the time needed for that to happen is bounded. If there is no response signal due from the system as a result of an interrupt that must be produced in less than this time-bound, then there is no need for pre-emptive scheduling (which does not change the total amount of work to be done, but just affects the order in which to tasks are carried out). And if we keep interrupt times short by doing no more than sending a message in each interrupt handler, then there will be little need to allow the running of one interrupt handler to be suspended in order to tun another. Likewise, if the time spent in delivering a message is short, it won't matter much if an interrupt is delayed until the kernel has finished.

There are well-developed theories of process scheduling that aim to ensure that complex sets of real-time deadlines are met using limited resources, and in circumstances where they are needed, these theories are really valuable. In simple cases, however, process scheduling is hard to get wrong, in the sense that if every process waiting for an event gets to run after the event has happened and before the system goes to sleep again, then everything will happen when it should and in good time. Phōs is designed with these modest goals in mind.

Why this Phōs operating system, and not another, industry standard real-time operating system such as FreeRTOS?

FreeRTOS is OK, and provides a rag-bag of facilities for multi-tasking, including processes, semaphores of different kinds, message queues, timers, and a lot more. It is quite complicated, and everything seems to have lots of options that interact in subtle ways. For the course, I wanted to find out how far we could get with a much simpler scheme, where there are no nested interrupts, possibly no preemptive scheduling, and the system is structured into a family of processes communicating by messages. As you will discover in Concurrent Programming next year, programming with semaphores is a complicated business; and next year will be soon enough for that. And I can't resist saying that the coding style of FreeRTOS gives me the creeps.

Where does the name Phōs come from?

Phōs (ΦΩΣ) is the Greek word for 'light', and also the first word in Greek of the ancient hymn known as Phos Hilaron (Hail, gladdening Light), particularly associated with John Keble, formerly Fellow of Oriel, in memory of whom Keble College was named by its founder, Edward Pusey (also at one time a Fellow of Oriel). There is also an echo of the motto of the University, Dominus illuminatio mea, which consists of the first words of Psalm 27: The Lord is my light, or κύριος φωτισμός μου in the Greek translation of the Hebrew Bible known as the Septuagint, where it is numbered as Psalm 26. The macron above the ō in transliteration is intended to show that the letter in Greek is omega rather than omicron, so that the vowel should be pronounced like those in 'photo', not as 'foss'. No allusion is intended to Phở, the Vietnamese noodle soup, though one can see a Phōs application as a collection of noodly threads swimming in a nutritous broth.

Digital electronics[edit]

What are all the numbers shown on an oscilloscope trace?

Here is a trace of a character being sent on a serial line.

RS232 waveform
RS232 waveform
  • In the top left, you can see (STOP) that acquisition of the signal has been stopped, presumably so that I can save a screenshot; at other times, the same label shows WAIT (when the scope is waiting for a trigger event) or TRIG'D (when an acquisition is in progress).
  • Next to it (100us/) we see that the horizontal scale is 100 microseconds per division. The clock period for the waveform can be seen from the trace to be a bit more than 100us, because the transitions fall a bit behind the grid of dotted divisions towards the right of the picture.
  • At the top right, we see that the oscilloscope's trigger is detecting a falling edge on channel 1 and set at a level of 1.72V, midway between the high of 3.3V and the low of 0V.
  • In the left margin, you can see the trigger level marked (T), as well as the 0V level for channel 1.
  • At the bottom left, you can see that channel 1 is DC-coupled, and has a scale of 1.00V per division; the difference between high and low is therefore seen on the trace to be about 3.3V.
  • On the black background just below the top margin, you can see at the left a T mark denoting the trigger point. I've adjusted this to appear towards the left of the display, so that we can see subsequent events, but as the outline at centre top shows, there's plenty more waveform captured in this single-shot mode before and after the trigger that we could see by scrolling.

Edit: here is a similar trace made with one of the new oscilloscopes in the lab.

RS232 waveform
RS232 waveform
  • In the top left, the legend shows that Channel 1 has a scale of 1V/div, while Channel 2 is off.
  • In the top right, the center of the display is delayed by 30.73ms from the trigger event (which is off the left of the screen). The horizontal scale is 200μs/div, and acquisition is stopped. The scope is triggered by a falling edge at a level of 1.6V.
  • In the bottom center, Channel 1 (yellow) has an offset of +1.98V from the bottom of the screen. The channel is DC couples, and the probe attenuates by a ratio of 10:1 (see below).
  • In the right-hand column, from top to bottom:
    • The scope is acquiring samples without any special processing (Normal) at a rate of 50 million samples per second. (Consequently, the narrowest, 100μs, pulses of the displayed waveform correspond to 5000 samples, giving plenty of data for zooming in.)
    • Repeating the info from below, both channels are DC coupled with 10:1 probes.
    • Waveform math is disabled (greyed out).
    • Reference signals are disabled (greyed out).

It's normal to use oscilloscopes with probes that contain an integral voltage divider, so that the voltage seen by the scope is 1/10 of the voltage at the probe. This raises the effective input impedance of the scope, most vitally reducing by an order of magnitude the extent to which the circuit being measured is loaded by the capacitance and inductance of the probe cable. After digitising the input signal, the scope multiplies the readings by a factor of 10 to compensate, so that true voltages are shown on the display.

Why do wired-or busses like I2C have the convention that signal lines are active-low?

Electrically, active-low busses are better in two ways: one, N-type MOSFETs are generally better behaved than P-types, and specifically tend to have a lower on resistance. This makes them better able to pull the signal line low crisply (when it may have significant capacitance) than a P-type could pull it high. In older systems an NPN transistor in an open-collector output stage had a similar advantage over a PNP transistor, and for the same physical reason – that the majority carriers in these transistors are electrons, which have a better motility than holes. The second reason for using active-low signalling is that a bus may cross between circuits that are using different power supplies, and ground is then used as a common reference, whereas the power rails may differ a little or a lot in voltage. An active-low signal can have threshold voltages defined with respect to ground, be pulled low by an open-drain output with short fall time, and draw little or no quiescent current through its pullup resistor.

Software tools[edit]

I keep getting the following message when I try to start minicom. What gives?

minicom: cannot open /dev/ttyACM0: Device or resource busy

I found the second answer to this Stack Overflow question helpful:, but wish they would do the decent thing and reinstate Monica.

Why are the C compiler and other tools called by names like arm-none-eabi-gcc and arm-none-eabi-objdump?

These long names are chosen to distinguish these cross-compiling tools from native tools like gcc, which on a Linux PC is a compiler that translates C into code for an Intel processor. The lengthy names make explicit that the compiler generates ARM code (arm) for a machine running no operating system (none), and uses a standard set of subroutine calling conventions called the Extended Application Binary Interface (eabi). This explicitness is needed, because a Linux machine might also have installed compilers for the Raspberry Pi (gcc-arm-linux-gnueabihf) and maybe AVR microcontrollers (gcc-avr) and the MIPS (mipsel-linux-gnu-gcc). As you can see, the naming conventions aren't quite consistent between GCC cross-compiler projects.


Why use the ARM? Why not the x86 or x86_64 we already have in our laptops?

I wanted us to program a machine where all the code was accessible to our inspection. That's easy to achieve on a microcontroller, and as simple as it can be on an ARM-based machine. Yes, we could program x86 machines in assembly language, though we would have to face some unavoidable complexity that has accrued during the 30 or more year history of the architecture. But we wouldn't easily get to program the bare machine and understand everything that happens.

Why C? Why not C++? Or why not Python/Java/Scala/Haskell/Some other higher level language?

Quite a lot of embedded programming is done in C++ these days, and in fact the official software support for the micro:bit is in C++. This gives a number of advantages: peripheral devices can be initialised by declaring a static instance of the driver class: PinOut led(PIN3); and C++'s overloading feature subsequently lets you write led = 1; and led = 0; to turn on and off the LED, even if the actual code needed is something more complicated, possibly involving a function call. This is all very well for applications programming, but it really gets in the way of understanding what is happening at the machine level.

Also, for C++, as for other high level languages, we would need to know how the high-level constructs are implemented in order to see how a high level program corresponds to a sequence of machine instructions. For C, this is more straightforward, and we can use C as a quick, clean and portable way of writing what we could also write in assembly language. And in any case, nobody's employment prospects were ever hurt by saying that they knew how to program in C.

Why bother with assembly language if we can write all our programs in C?

Several reasons: first, our aim here is to give a complete account, spanning all levels of abstraction, of the functioning of a small computer system. We're going to take for granted a bit the process of translating high level programs into machine code (more on that next year), but we do need to understand machine language as the interface between hardware and software. Second, when we come to implement an operating system, we will need to understand clearly what it means to save the state of one process and restore the state of another, and that can only be seen at the level of machine code. And third, even if writing assembly language is something you never need to do (especially for the ARM with Thumb code), it's a really useful skill to be able to read the assembly language output by a compiler to see what is happening. When you've used instrumentation to identify the speed-determining part of your big program and done your best to write it in high-level code that will run fast, it's good to be able to look at the compiler output and see if it is sensible.

Is it "twos' complement" or "two's complement" or "twos complement"?

It's "two's complement". To see why this is right, consider that the ones' complement of a number is that quantity that, when added to the number, results in a string of ones:


The two's complement is obtained by adding 1 to the ones complement: when this is added to the original number, the result is a single power of two:


So the two's complement of a number is that quantity that, added to the number, results in a power of two. You could make case for the reading "twos' complement" – the complement belonging to a string of 2s. (Cf. Knuth, Vol. 2, p. 203).

You keep saying the micro:bit has 16kB (kilobytes) of RAM. That means 16,000 bytes: isn't it true that there is actually 16KiB (kibibytes) of RAM, or 16,384 bytes?

Yes, that's right. You can collect your refund on the way out. Please note as you go that it's correct to put a thin space between a number and a following unit, as in 16 KiB, and that a thin space should also be used in place of a comma to separate thousands, as in 16 000 bytes, to avoid confusion with the use of a comma as a decimal marker in some countries. Glad we sorted that out.

What fonts do you use on your website?

On webfont-enabled browsers, running text is in Roboto Sans, and titles in Roboto Slab. Code samples are set in Roboto Mono, and the occasional scary word is set in Unifraktur Cook, just for fun.

Now that Scratch has subroutines (including recursion: is it finally a valid teaching tool?

Scratch does factorials

I'm not mad keen. Particularly worrying in this example is the use of global variables for everything, and the apparent lack of built-in provision for subroutines that return a result. I'm not sure of the purpose of the statement set <n> to <<n> - <1>>, but perhaps that's not really needed; certainly the statement <set <result> to <<1>*<result>> could be removed.

By the way, I think recursion is overrated for general programming, and there are many purposes for which we could use a cleaned-up Fortran – a language supporting (unnested) subroutines with parameters and local variables but no recursion. The book Software Tools by Brian Kernighan and P. J. Plauger makes that argument for this quite eloquently.

We'll write a few recursive subroutines early in the course as a demonstration that we can, and that they're supported by the stack mechanism we discuss and by the addressing possibilities of our machine. But in later parts of the course, for example in writing the Phōs operating system and programs based on it, there will be no need for recursive subrotuines. And with only 16KB of memory to play with, we'd better be sure any recursion doesn't go too deep.

An alternative instruction encoding for the ARM in which each instruction is encoded in 16 rather than 32 bits. The advantage is compact code, the disadvantage that only a selection of instructions can be encoded, and only the first 8 registers are easily accessible. In Cortex-M microcontrollers, the Thumb encoding is the only one provided.

(General-Purpose Input/Output). A peripheral interface that provides direct access to pins of the microcontroller chip. Pins may be configured as inputs or outputs, and interrupts may be associated with state changes on certain input pins. On the micro:bit, the LEDs and pushbuttons are connected to GPIO pins.

(Universal Asynchronous Receiver/Transmitter). A peripheral interface that is able to send and receive characters serially, commonly used in the past for communication between a computer and an attached terminal. It is commonly used in duplex mode, with the transmitter of one device connected to the receiver of the other with one wire, and the receiver of the one connected to transmitter of the other with a different wire. The asynchronous part of the name refers to the fact that the transmitter and receiver on each wire do not share a common clock, but rely instead on the signalling protocol and precise timing to achieve synchronisation.

The simulation of a complex protocol by setting the values on GPIO pins under direct program control, with delay loops for timing. For example, serial communications require timings that are more precise that can be achieved by bit-banging, unless the processor is doing nothing else.

(A near-synonym for ABI). The convention that determines where arguments for a subroutine are to be found, and where the result is returned.

A symbolic representation of the machine code for a program.

Four bits, N, Z, V and C, in the processor status word that indicate the result of a comparison or other arithmetic operation. Briefly, N indicates whether the result of the operation was negative, Z indicates whether it was zero, C is the value of the carry-out bit from the ALU, and V indicates whether the operation overflowed, yielding a result that was different in sign from what could be predicted from the inputs to the operation. A comparison is treated like a subtraction as far as setting the condition codes is concerned. After the condition codes have been set, a subsequent conditional branch instruction can test them, and make a branch decision based on a boolean combination of their values. All ten arithmetic comparisons (equal, not-equal, and less-than, less-than-or-equal, greater-than, and greater-than-or-equal for both signed and unsigned representations) can be represented in this way. When a process is interrupted, the condition codes must be saved and restored as part of the processor state, in case the interrupt came between a comparison and a subsequent conditional branch.

(Read-Only Memory). A form of storage whose contents are non-volatile (are not lost when the power is off) but cannot be changed under program control. Modern ROM is usually EEPROM – Electrically Erasable Programmable Read Only Memory, and can be changed electrically, and even under control of a program running on the microcontroller, but using special peripheral registers and not the normal store instructions. Flash memory is a modern, super-compact implementation of EEPROM, but for our purposes it does exactly the same job. We will modify the contents of the micro:bit's flash memory by downloading programs, but we will probably not be writing programs that change the contents of the flash memory.

A text, written in a specialised language, that describes the layout in memory of a program. Compilers typically divide their output into four named sections: text for the program code and embedded constants, data for statically allocated data that has a specified initial value other than zero, rodata for initialised data that is constant, and bss for data that is statically allocated but can initially be filled with zeroes. The linker script may add another section for the program's stack. For microcontrollers, a linker script is needed that puts the text and rodata in Flash, and lays out the RAM so that stack and bss are in separate areas, with the data copied into its own part of RAM from an image held in Flash.

(Nested Vector Interrupt Controller). An ARM processor component that is able to assign priorities to external interrupts (as opposed to those generated by internally by the processor) and control the servicing of interrupts. As the name indicates, it is able to cope with nested interrupts, where servicing of one interrupt is itself interrupted by another with higher priority. Note that, according to ARM conventions, higher priorities are indicated by lower numbers.

A register sp that holds the address of the most recent occupied word of the subroutine stack. On ARM, as on most recent processors, the subroutine stack grows downwards, so that the sp holds the lowest address of any occupied work on the stack.

A register that contains the address of the next instruction to be executed. Because of pipelining, on ARM Cortex-M machines, reading the program counter yields a value that is 4 bytes greater than the address of the current instruction.

A form of process scheduling where a process may be suspended when it has run for a certain time, or a process with higher priority becomes ready. Without pre-emptive scheduling, processes are only suspended when they wait for an event.

A gate output that consists of a single transistor connected between the output and ground. When the gate is active, it can sink current and pull the output low, but when it is inactive it does not prevent other outputs connected to the same wire from pulling it low. Open collector outputs are commonly used with a separate pull-up resistor connected between the wire and the positive power rail.

A gate output that consists of a single transistor connected between the output and ground. When the gate is active, it can sink current and pull the output low, but when it is inactive it does not prevent other outputs connected to the same wire from pulling it low. Open collector outputs are commonly used with a separate pull-up resistor connected between the wire and the positive power rail.

A single integrated circuit that contains a microprocessor together with some memory (usually both RAM for dynamic state and ROM for storing a persistent program) and peripheral interfaces.