Frequently asked questions (Digital Systems)

New questions

If you have a question, perhaps it is answered here – or maybe you can find help in the growing glossary. Feel free to add headwords to the glossary so Mike can fill in the definitions, or to add questions below.

Can you write the question within triple single quotes like this?

Then I will write the answer afterwards. Don't worry about formatting words like micro:bit properly – I will copy-edit your question when I add the answer.

What's the difference between decoded, derived and dynamic signals?

Decoded signals are those that appear in the decoding tables and are determined by the instruction's opcode; derived signals also depend on other parts of the instruction halfword, and dynamic signals depend on other parts of the machine state.

  • The decoded signals will be the same in every instance of an instruction – that is to say, two instructions that share the same opcode will have the same decoded signals. Note, however, that two instructions like mov r and mov i8 may share the same mnemonic in assembly language, but have different opcodes, and so be treated as different instructions by the hardware, with different decoded signals. In the simulator, these decoded signals form the members of the |Control| structure |ctrl| contained in one of the decoding tables, and have names like ctrl->cShiftAmt.
  • The derived signals will be the same whenever a particular instruction is executed, because they are determined by the bits of the instruction taken all together; for example, the instruction add r3, r1, r2 always writes its result to register r3, and so has cRegC = 3. In the simulator, these derived signals are outside the |Control| structure, but also have names like |cRegC| that start with a lower-case |c| followed by an upper-case letter.
  • The dynamic signals differ from one execution of the instruction to another, so that at one time that add instruction could write |7| to r3, and another time it could write |8|, and the value of the |result| signal would be different in the two cases.
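
To make the distinction concrete, here is a purely illustrative sketch of how the three kinds of signal might show up in a simulator like ours – the field positions and the helper field are my own inventions, not the simulator's actual code:

/* Decoded signals live in a table entry selected by the opcode */
struct Control { unsigned cShiftAmt; /* ... other decoded signals ... */ };
extern const struct Control *decode(unsigned instr);

/* field -- hypothetical helper: extract bits [hi:lo] of a halfword */
static unsigned field(unsigned instr, int hi, int lo) {
    return (instr >> lo) & ((1u << (hi - lo + 1)) - 1);
}

/* execute -- one step of an add instruction, showing all three kinds */
void execute(unsigned instr, unsigned regs[]) {
    const struct Control *ctrl = decode(instr);  /* decoded: fixed by the opcode */
    (void) ctrl;        /* a real simulator would consult e.g. ctrl->cShiftAmt here */
    unsigned cRegC = field(instr, 2, 0);         /* derived: fixed bits of the instruction */
    unsigned result = regs[field(instr, 5, 3)]
                    + regs[field(instr, 8, 6)];  /* dynamic: depends on machine state */
    regs[cRegC] = result;
}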

What is the semantics of your C-like hardware description language?

Semantics? You are taking it too seriously: it's just intended as a prompt to think about the behaviour of a sequential circuit as embodying a program.

But to answer the question, the most general form for a program that does make sense as a sequential circuit (with input a, state bits x and y, and output z) is this:

x = y = 0;
forever {
    x, y = f(x, y, a), g(x, y, a);
    z = h(x, y);
}

That contains a simultaneous assignment that sets new values for x and y based on their old values and the input, followed by an assignment that sets the output as a function of the new values of the state variables. This denotes a Moore machine in which the next-state function is <f, g> and the output function is h. The "forever" (no nesting allowed) means "at each clock edge".
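
If it helps, here is one way to read that notation as an ordinary C program – a simulation sketch, with f, g and h standing for whatever combinational functions you supply. The simultaneous assignment becomes a pair of temporaries, so that both f and g see the old values of the state.

#include <stdbool.h>

/* Combinational parts: any pure functions of the old state and input */
bool f(bool x, bool y, bool a);
bool g(bool x, bool y, bool a);
bool h(bool x, bool y);

/* simulate -- run the circuit for n clock cycles */
void simulate(const bool input[], bool output[], int n) {
    bool x = false, y = false;
    for (int t = 0; t < n; t++) {
        bool a = input[t];
        bool x1 = f(x, y, a), y1 = g(x, y, a);  /* compute both from old values */
        x = x1; y = y1;                         /* then update simultaneously */
        output[t] = h(x, y);                    /* output follows the new state */
    }
}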

Other "programs" such as this:

x = y = 0;
forever {
    if (a) x = !y; else y = x;
    z = x;
}

are to be re-construed in the simultaneous-assignment form shown above – in this case, by writing

x, y = (a ? !y : x), (a ? y : x);

so that if a is 1, then x gets set to !y and y is unchanged in each clock cycle, and if a is 0, then x is unchanged and y gets set to x. And we might design a shift register with y = x; x = a;, meaning x, y = a, x. Anything that helps as a stepping stone towards the equivalent simultaneous assignment is encouraged.

Come to think of it, the general form shown above doesn't describe the value of the output during the period before the first clock pulse, which is of course h(0, 0). So perhaps it would be better to make the syntax be

x = y = 0;
forever {
    z = h(x, y);
    pause;
    x, y = f(x, y, a), g(x, y, a);
}

But if the existing form has led us to think of the behaviour as a program and reduce it to a simultaneous assignment, then it has served its purpose.

What do the notations Rd, Rn, etc., mean when referring to ARM instructions?

What they mean in the ARM documentation is not entirely clear, because the bit-fields that are labelled with these notations vary from one instruction to another. Perhaps they refer to the roles played by the registers in the instruction, with Rd being the destination, Rn the first operand, Rm the second, and so on.

I have used these names (perhaps misguidedly) to refer to fixed fields of the instruction half-word:

  • Rd is bits [2:0].
  • Rn is bits [5:3].
  • Rm is bits [8:6].
  • Rt is bits [10:8].

These names are used in the Thumb simulator for values of type RegSel, and there is a function regsel that interprets them, modelling the behaviour of each of the three muxes that determine which registers a particular instruction reads and writes. When I revise the course, I will change these names to prevent possible confusion with the ARM documentation.

In the lectures, you talk about processor designs containing lots of multiplexers, but some books talk about designs containing various busses. What's the difference?

There isn't really a difference: a bus in a conventional design is a bundle of wires that can be driven by several sources at different times, usually by setting one source to produce an output and the others to enter a high-impedance state. If there is a difference, it is that we are neutral about the technology used to implement a multiplexer – logic gates, or the kind of three-state logic just described – whereas a bus, especially if it stretches between chips, has a specific implementation that must be followed.

In the lectures, you used only p-type transistors in the upper part of a CMOS gate and the corresponding n-type transistors in the lower part. Can we mix them up in order to make a circuit that implements (a ∨ ¬b) ?

No, the rule for simple gates is that there is a pull-down network consisting of only n-type transistors, and a complementary pull-up network consisting of only p-types, with the gate of each transistor connected to an input of the gate. Consequently, the gate's function is anti-monotonic, and a ∨ ¬b cannot be implemented by a single gate. The best we can do is NAND(¬a, b), which requires a NAND gate and an inverter. In a complex circuit, it's highly likely that ¬a would already be available, so the function would actually require only a single gate to be added.

It's reasonable to ask what would be a 'non-simple' gate, and there the added elements are pass-transistors, pairs of transistors containing an n-type and a p-type that either pass or block a signal. They don't help with implementing a ∨ ¬b, but if there's time I will mention them. There's a nice non-restoring implementation of XOR that uses two pairs of pass transistors and has a lower transistor count than the other implementations we've seen.

Will we need to memorise Thumb instructions for the exam?

No, the exam paper will contain a quick reference guide (published here in advance) to the commonly used instructions – and of course, you won't be expected to remember details of the uncommon ones. As you'll see, the ranges covered by immediate fields are spelled out, but there will be no questions that turn on the precise values. A question could refer to an instruction not on the chart, but only if the question itself describes the instruction.

For example, the paper might ask, "Explain why it is advantageous to allow a larger range of offsets for the load instruction ldr r0, [sp, #off] than for the form ldr r0, [r1, #off]." Or it might ask, "There is an instruction svc that directly causes an interrupt: explain why this instruction provides a better way of entering the operating system than an ordinary procedure call." There certainly won't be a question that says without further context, "Decode the instruction 0x8447." And in asking for a simple piece of assembly language, the paper won't try to trick you with snags like the fact that add r0, r0, #10 has an encoding but add r0, r1, #10 does not. If you write the latter instruction in place of mov r0, r1; add r0, r0, #10 then you are unlikely to lose even a single mark.

Arising from lectures

What's the difference between a subroutine, a procedure and a function?

I tend to use all three terms interchangeably. I think subroutine is the most neutral term, unlikely to be confused with either an informal list of instructions (e.g., the procedure for downloading code to the micro:bit) or a mathematical operation (e.g., the sine function). The term procedure is common in thinking about Pascal-like languages, where indeed each subroutine is introduced by the keyword procedure. The corresponding term in C (and, I suppose, in functional languages) is function.

If a subroutine expects its return address in the lr register, and one subroutine A calls another subroutine B, won't the call to B overwrite the return address of A, so that A no longer knows where to return?

Our first attempts at assembly language programming have been tiny subroutines that call no others and use only the registers r0 to r3. For them (leaf routines), we can assume the return address arrives in lr and remains there until the subroutine returns with bx lr. Calling another subroutine does indeed trash the value of lr, and that means that non-leaf routines must carefully save the lr value when they are entered, so that they can use it as a return address later. As we'll see very shortly, this is neatly done by writing something like

push {r4, r6, lr}

at the top of the subroutine, an instruction which at one fell swoop saves some registers and the lr value on the stack. Then we write a matching instruction

pop {r4, r6, pc}

at the bottom to restore the registers and load the return address into the pc, effectively returning from the subroutine. It's good that the action of saving registers on the stack is separate from the action of calling a subroutine, because it allows us to use the simpler and faster scheme for leaf routines.

Is overflow possible in unsigned subtraction, and how can we test for it?

Overflow happens when the mathematically correct result bin(a) - bin(b) is not representable as bin(c) for any bit-vector c. If bin(a) ≥ bin(b) then the difference bin(a) - bin(b) is non-negative and no bigger than bin(a), so it is representable. It's only when bin(a) < bin(b), so the difference is negative, that the mathematically correct result is not representable. We can detect this case by looking at the carry bit: C = 0 if and only if the correct result is negative.
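
In Thumb code, the test is just a subtraction followed by a conditional branch on the carry flag (a fragment; the label name is mine):

subs r0, r0, r1         @ r0 := a - b, setting the condition codes
bcc underflow           @ taken when C = 0, that is, when a < b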

How does the control unit know to interpret the last nine bits of an adds instruction 0x1840 as the names of three registers?

The leading few bits of the instruction, up to 10 of them, determine which instruction encoding applies – adds ra, rb, rc in this case, determined by the leading seven bits. This determines the interpretation of the rest of the instruction. In this case the remaining nine bits are fed by the control unit to the datapath in three groups of three so as to determine which registers take part in the addition.

0001100 001 000 000     adds r0, r0, r1
adds    r1  r0  r0

There's another instruction encoding where a three-bit constant is fed to the datapath as the second operand instead of a register.

0001110 011 001 000     adds r0, r1, #3
adds    3   r1  r0

The control unit behaves differently in the two cases because the opcode is different. The colourful decoding chart enables us to deduce what instruction encoding applies to any string of 16 bits, but it doesn't specify exactly how the remaining bits are interpreted: in this case, it tells us that nine bits of the instruction are three register names or two registers and a three-bit constant, but not in what order they appear. For that, you need to look at the appropriate page in the Architecture Reference Manual.

When programming, I mostly find the chart useful as a reminder of what encodings exist: for example, the instructions

adds r0, r1, #3

and

adds r0, r0, #27

are legal, but

adds r0, r1, #27

is not, because there is no way to encode it.

What is the file /usr/lib/gcc/arm-none-eabi/5.4.1/armv6-m/libgcc.a that is mentioned when a program is linked?

It's a library archive (extension .a) containing subroutines that may be called from code compiled with gcc. For example, the program in Lab 1 contains a C subroutine (xtoa in lib.c) for converting numbers from binary to decimal or hexadecimal for printing, and that subroutine contains the statement x = x / base. Since the chip has no divide instruction, gcc translates the statement into a call to a subroutine with the obscure name __aeabi_uidiv, and that subroutine is provided in the library archive. Exercise 1.6 asks you to write a similar subroutine for yourself.

If you're prepared to use that integer division subroutine from the gcc library, why not use the library version of printf too, instead of writing your own?

Using the library version of printf pulls in a lot of other stuff – for example, since the standard printf can format floating-point numbers, including it also drags in lots of subroutines that implement floating-point arithmetic in software. That's OK, but (a) I wanted to keep our programs small for simplicity, and (b) although we are not likely to fill up the 256kB of code space on the nRF51822, on other embedded platforms it's wise to keep a close eye on the amount of library code that is included in the program but not really used. Added to that, the version of printf in the standard library can call malloc, and we don't want that.

What does the assignment statement GPIO_OUT = 0x4000 look like in assembly language?

GPIO_OUT is a macro defined in hardware.h as ADDR(0x50000504), itself an abbreviation for the expression,

(* (unsigned volatile *) 0x50000504)

This exploits the weak type system of C, and denotes an unsigned variable at the numeric address 0x50000504, a constant obtained from the nRF51822 datasheet. So the assignment amounts to

(* (unsigned volatile *) 0x50000504) = 0x4000

The volatile keyword amounts to saying to the C compiler, "Please just do the assignment now: don't try to be clever and optimise this in any way, such as combining this assignment with a later one that targets the same address."

To achieve this in assembly language, we need an str instruction for the assignment. But both the address being assigned to and the value being assigned are large constants, so we'll need to put the constants in registers first, using the pc-relative load instructions for which "ldr =" is a shorthand. So suitable code is

ldr r0, =0x4000
ldr r1, =0x50000504
str r0, [r1]

Note: that code works fine, and is reasonably efficient. But there's another way of putting the constant 0x4000 in register r0 that takes the same time on the Cortex-M0 and uses marginally less code space:

movs r0, #0x4
lsls r0, r0, #12

That code exploits the fact that the constant only has one non-zero bit, and that (or something very similar) is the code that gcc actually generates in place of the first ldr =.

Lab exercises

Something in the lab exercises isn't working. What should I do?

First of all, please come to the sessions! The lab exercises for this course are (deliberately and inevitably) a bit less cut-and-dried than those for other courses, and there is more scope to explore directions of your own. Today in the lab we had a team of two plundering my code for the Seven Stars of Death to make the LEDs flash in their own way. We were able to help them with that, and it was far more worthwhile than following my pre-conceived list of tasks.

Second, check for updates on the labs page. Some differences are inevitable between the environment in the Software Lab and what works on my machines, and we can't hope to pin down every last one in advance. And as you'll discover, it sometimes takes a lot of thought to work out a reason for the strange behaviours we often see. There's no operating system to intervene and stop your program with a helpful message; and even the seven stars of death only tell you that something is wrong.

When an update is published, there's a procedure for merging it into your working copy using Mercurial. We will help you with that in difficult cases, such as where you've already made one fix in the lab but you are merging in others.

I want to see the Seven Stars of Death. How can I make them appear?

Ok, I suppose we'd better get this over with. Any unexpected interrupt or exception will lead to the seven stars, and you can provoke a HardFault exception with a bx instruction where the target address is even, not having the Thumb bit set. So the two instructions

movs r0, #0
bx r0

ought to do it. You may prefer to find more creative ways of your own. In fact, you probably will.

The detailed mechanism is as follows: the file startup.c contains code that lays out the interrupt vectors for the program, including the vector associated with the HardFault exception. Unless these vectors are redefined elsewhere, they all point to a routine spin that turns off interrupts, (re-)initialises the GPIO pins, and then uses them to display the seven stars, flashing them forever using delay loops. The code is made simpler by the happy fact that the seven LEDs needed are all in the same row of the 3 x 9 wiring matrix. The code is deliberately mashed together in one function (rather than, say, using a subroutine delay), so that it doesn't depend on anything else. Many undefined behaviours of the chip evoke a HardFault exception, and attempting to clear the T bit in the psr using a bx instruction is one of them. The machine saves some registers on the stack, fetches the HardFault vector from address 12, and starts executing the routine at that address, which immediately calls spin.

I stop the echo program with the debugger, then try typing in the minicom window. Nothing happens; and even if I type cont to gdb, still nothing happens. Why?

The debugger can stop your program, but it can't stop other things from happening. In particular, if the UART (the interface connected to the serial port) receives a second character before your program has collected the first one, it signals an 'overrun error' and will do nothing more until the error is cleared. The echo program is written in the simplest way possible, assuming characters will arrive slowly enough for it to fetch each one before the next arrives, and it doesn't have provision for detecting or clearing such errors. When the program is stopped by the debugger, it hasn't a hope of keeping up with your typing. So once you've typed a second character in minicom while the program is stopped, all is lost until you reset the micro:bit. This is a perfect example to show that debuggers are of limited use in embedded systems development.

Does the lab software include drivers for the magnetometer and accelerometer on the micro:bit?

They are on the to-do list. It would be nice to add driver processes to Phōs, both of them clients of a process that drives the I2C bus. If you would like to implement this, (a) come and do a project with me, and (b) note the following. The I2C implementation on the micro:bit seems quite sensitive to noise, so it will be necessary (unlike, e.g., with the very simple UART driver we're using) to detect error situations and try to correct them. In the case of the I2C bus, if a slave device is holding the SDA line low, this may entail bit-banging the clock line until the slave thinks the transaction is complete. Apparently, switching LEDs on and off on the micro:bit creates enough noise on the I2C lines that there's a big risk of hangups; one solution might be to have the I2C driver ask the display driver process to hold off for the duration of an I2C transaction.

What is the file phos.a that is made in Lab 4?

The file phos.a is a library archive, containing copies of various files of object code such as phos.o and serial.o. The linker treats such libraries specially, linking in only those parts of them that are needed by the program it is linking. Thus, for example, Phōs programs that don't need the timer will be linked without including the driver process for it. Making and using the library simplifies the Makefile while keeping the resulting programs small.

How can I restore the original demo program to my micro:bit?

Download and flash the hex file you can find on the official activities page as 'BBC micro:bit Original Code' under 'You'll need'.

Assembly language

Why write integer constants in hexadecimal for low-level programming?

We often want to work out the binary bits that make up a constant, and hexadecimal notation makes it easy to do so. Each hexadecimal digit corresponds to four bits of the binary representation, and it's not too hard to memorise the bit pattern corresponding to each digit:

0  0000          8  1000
1  0001          9  1001
2  0010          a  1010
3  0011          b  1011
4  0100          c  1100
5  0101          d  1101
6  0110          e  1110
7  0111          f  1111

To convert a larger hexadecimal constant to binary, just concatenate the four-bit groups corresponding to each digit: for example,

0xcafe1812 = 1100 1010 1111 1110 0001 1000 0001 0010.
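
A scrap of C makes the correspondence mechanical (a sketch for a host machine, not part of the lab code):

#include <stdio.h>

/* print_binary -- print a 32-bit word, four bits per hexadecimal digit */
void print_binary(unsigned x) {
    for (int i = 31; i >= 0; i--) {
        putchar('0' + ((x >> i) & 1));
        if (i % 4 == 0 && i > 0) putchar(' ');
    }
    putchar('\n');
}

Calling print_binary(0xcafe1812) prints exactly the line shown above.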

How can we convert between decimal and hexadecimal?

For quantities that fit in 16 bits or less, it's helpful to recall that the powers of 16 are 1, 16, 256, 4096. Then to convert, say, 1326 to hexadecimal, we note that the highest multiple of 256 that fits is 5 * 256 = 1280. We can set out the calculation as follows.

            1326
5 * 256  =  1280
            ----
              46
2 * 16   =    32
            ----
              14

The remainder, 46, can be expressed as 2 * 16 + 14, and we get 1326 = 5 * 256 + 2 * 16 + 14 * 1 = 0x52e. The final 'e' is a hexadecimal digit with a value of 14.

To convert the other way, we just calculate 0x52e = 5 * 256 + 2 * 16 + 14 = 1326.
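
Repeated division by 16 mechanises the same calculation (again a host-side sketch):

#include <stdio.h>

/* to_hex -- convert n by repeated division, as in the worked example */
void to_hex(unsigned n) {
    char buf[9];
    int i = 0;
    do {
        buf[i++] = "0123456789abcdef"[n % 16];   /* remainder gives one digit */
        n = n / 16;
    } while (n > 0);
    while (i > 0) putchar(buf[--i]);    /* digits emerge least significant first */
    putchar('\n');
}

Here to_hex(1326) prints 52e.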

What registers can a subroutine use?

It can freely use registers r0 to r3, destroying any values they had when the subroutine was called. The (first four) arguments of the subroutine arrive in those registers, but need not be kept there, and the result can be moved into r0 just before the subroutine returns. Note, however, that if this subroutine calls any others, they in their turn may destroy the values in all four of these registers.

In addition, a subroutine can use registers r4 to r7, provided it leaves them with the same values on exit that they had on entry. An easy way to ensure that is to push them on the stack with a push instruction at the routine start (together with the register lr), and pop them off again with a pop instruction at the end (together with the pc). Any subroutines that are called will also obey this convention, so important values can be left in these registers across calls.
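
Putting the two conventions together, a typical non-leaf subroutine has this shape (a skeleton; which registers are saved depends on what the body needs):

    .text
    .thumb_func
func:
    push {r4, r5, lr}       @ save safe registers and the return address
    @ ... body: r0-r3 are scratch; r4 and r5 survive any calls we make ...
    pop {r4, r5, pc}        @ restore r4 and r5 and return, all in one instruction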

What if a subroutine has more than 4 arguments?

In that case, arguments beyond the first four are passed on the stack. It's the caller's responsibility to put them there, following a layout that is fixed by the machine's calling convention, and then the subroutine itself can access them using load instructions. I don't think we'll need to write any subroutines with so many arguments (not in assembly language, anyway).

What do I do about labels if I have a big assembly language program with lots of functions and several loops? Can I re-use labels like loop and done in each function?

No, I'm afraid you can't – you can't have two labels with the same name in a single file of code. You can either prefix each label with the name of the enclosing function, as in foo_loop or baz_done, or you can make use of a system of numeric labels that do allow multiple definitions. There can be several labels written 2: in the file, and you refer to them using the notations 2b or 2f, meaning the next occurrence of label 2 going backwards (2b) or forwards (2f) in the file. That saves the effort of inventing names.
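
For example, two delay loops in the same file can both use the label 1 (a made-up fragment):

    .thumb_func
delay1:
1:  subs r0, r0, #1         @ count down to zero
    bne 1b                  @ back to the nearest label 1 above
    bx lr

    .thumb_func
delay2:
1:  subs r0, r0, #2         @ a second loop is free to reuse the label 1
    bne 1b
    bx lr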

Compilers solve the problem in a different way, just inventing their own labels with names like .L392 and never reusing them. That's OK when nobody is expected to read the resulting assembly language.

When I try running my assembly language program with qemu-arm, I get this message: what does it mean?

qemu-arm: ... tb_lock: Assertion `!have_tb_lock' failed.

That's a symptom of the program running haywire, possibly because it has branched to a random location and started executing code there. A probable cause of that is returning from a subroutine with the stack in a bad state, so that the value loaded into the pc is not the correct return address.

What is this .thumb_func directive all about?

Most of the time, the assembler and linker manage to work out for themselves that assembly language code is written using the Thumb encoding, and all works sweetly. But just occasionally (e.g., when writing an interrupt handler in assembly language), it's necessary to mark a function explicitly as being written in Thumb code: in the case of an interrupt handler, so that the interrupt vector can be marked with a 1 bit in the LSB. If this is done properly, then the processor stays in Thumb mode when the interrupt vector is followed. Failing to do this results in an attempt to switch to native ARM mode, and an immediate HardFault exception: the program crashes irretrievably. This seems a bit silly in a device that doesn't support native ARM mode at all, but it is done in the name of software compatibility with bigger devices.

Given that the .thumb_func directive is occasionally needed (at times when it's difficult to determine whether it is actually necessary), and given that it is hard to debug the HardFault that occurs when a needed directive is omitted, it seems simplest just to make a habit of writing .thumb_func in front of every assembly-language function.
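
So a typical function definition in one of our source files looks like this:

    .text
    .thumb_func
foo:                        @ foo is now safely marked as Thumb code
    movs r0, #42
    bx lr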

I wrote nop in my program, expecting to get the instruction encoded as 0xbf80, but the assembler produced 0x46c0 instead. Why?

Later versions of the ARM architecture implement a specific nop instruction with encoding 0xbf80. On earlier versions, the instruction encoded 0x46c0 is used instead. The rainbow chart decodes that as mov r8, r8, a form of move instruction that can access high registers and (crucially) does not write the condition codes. In practice, there's no significant difference between the two instructions: both last one cycle and have no effect.

What is the difference between the data segment and the bss segment?

The C compiler takes static variables (those that are declared outside any function) and assigns them to one of three segments that are differently treated by the linker. Plain variables such as the array a declared by

int a[10];

are assigned to the bss segment. When the program starts, such variables have no special initial values, or rather, their initial values are zero, false, null. Other variables may be given specific initial values: for example, the array b declared by

int b[10] = { 3, 1, 4, 1, 5, 9, 2, 6, 5, 4 };

These variables are allocated in the data segment, and unlike the bss segment, the program output by the linker contains an image of their initial values. The third segment used for static data is the rodata segment, used for 'variables' that are declared as constant, such as the array c declared by

const int c[10] = { 2, 7, 1, 8, 2, 8, 1, 8, 2, 8 };

These variables do not need space in the RAM, and can be kept in the (more spacious) ROM instead. Such constant tables are common in embedded programs: for example, a lunar lander game might contain a table of the sine of each whole number of degrees from 0 to 360, to save time computing them when the angle of a thruster changes.

Ok, fair enough: but how can I declare similar variables in assembly language?

You can use the directives .data or .bss or .section .rodata to select a segment, a .align directive to pad for alignment, then either .word or .space directives to allocate space. Like this:

    .bss
    .align 2
a:
    .space 40

    .data
    .align 2
b:
    .word 3, 1, 4, 1, 5, 9, 2, 6, 5, 4

    .section .rodata
    .align 2
c:
    .word 2, 7, 1, 8, 2, 8, 1, 8, 2, 8

The directives .data and .bss are just abbreviations for .section .data and .section .bss. If you want a scalar variable instead of an array, just reduce the amount of space from 40 to 4, or give only one word of initialisation data:

    .bss
    .align 2
x:
    .space 4

    .data
    .align 2
y:
    .word 314159

Ok, now I see how to declare global variables. How can I use them?

Let's try writing in assembly language the function that we could express in C as follows.

unsigned g = 4;

unsigned foo(unsigned x, unsigned y) {
    unsigned z = g;

    g = x;

    return y + z;
}

As you can see, this uses a global variable to remember the argument x from one invocation of foo to the next, with the remembered value being 4 initially. In the following interaction, the values printed are 20+4 = 24 and 8+10 = 18, and the memory contains 6 for the next invocation.

Gimme a number: 10
Gimme a number: 20
foo(10, 20) = 24
Gimme a number: 6
Gimme a number: 8
foo(6, 8) = 18

Without further ado, let's try to express the same thing in assembly language. First, we allocate space in the data segment for the global variable.

    .data
    .align 2
g:
    .word 4

In the function foo, we need to get access to g, and to do this, we plant the address of g somewhere in the text segment, and use pc-relative addressing to load the address into a register.

    .text
    .thumb_func
foo:
    ldr r3, [pc, #8]
    ldr r2, [r3]
    str r0, [r3]
    add r0, r1, r2
    bx lr
    .align 2
aaa:
    .word g

The instruction ldr r3, [pc, #8] loads the address g of the global variable into r3, by loading it from a convenient place aaa in the text segment. Then, with the address g in a register, we can use a second ldr instruction to load its value, which sits in the data segment in RAM. The point is that there are two load instructions: one to get the address of the global variable, and a second to get its value. The address is a constant, and can be stored in the ROM at a place where we can address it using pc-relative addressing. The value is variable, and is stored in RAM, at an address that we cannot write as part of the code stream. The address in r3 is also used to write the global variable with the first argument of the function from r0 – then we add together the second argument and the fetched value of g to give the result, putting it in r0 before returning.

We could write the instruction ldr r3, [pc, #8] as ldr r3, aaa to get the assembler to compute the offset 8 for us. Or, better still, write ldr r3, =g to get the assembler to find a 'convenient place' for us, either leaving it to put the constant at the end of its output to the text segment, or putting the hint .pool after the bx lr (and deleting the rest of the source shown above from that point on). However we express the source file, the resulting object program is the same: it has a word in RAM, another word in ROM that contains the address of the first word, and a pc-relative load instruction that addresses the second word.

If we want to plant constants using .word directives, or use the .pool directive to give the assembler a hint where to put its own table of literals, where is a safe place to put them?

If we use .word to put constants (such as addresses) into the text segment to access them with pc-relative addressing, then we need to be sure that the machine will not attempt to execute these constants as if they were code. For example, we could put them immediately after the bx lr or pop {pc} that ends a function, because we can be sure the processor will never try to fetch and execute instructions from there. We can put .pool directives in similar places to give the assembler a hint it can place its tables of literals there. If we don't put in these hints, then the assembler will build one big table of literals and put it after the last function in the source file. Usually this is sufficient, because our fragments of assembly language will be small.
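
For example, a function can be followed immediately by its own literal pool (a sketch, reusing the constants from the GPIO example above):

    .thumb_func
set_pins:
    ldr r0, =0x4000         @ both constants come from the pool below
    ldr r1, =0x50000504
    str r0, [r1]
    bx lr
    .pool                   @ safe: execution never falls past the bx lr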

At some stage, you promised a micro:bit program written entirely in assembly language: where is it?

The source for (a coming version of) Lab 2 contains a program blinky.s that blinks the central LED. Say make blinky.hex to obtain a downloadable hex file.

C programming

What's the difference between the expressions m.m_type and m->m_type?

The first of these is used when m is a structure (record) declared with message m, so that storage for an actual structure is allocated by the declaration; it names the m_type field of the structure.

The second is used when m is a pointer to a message declared elsewhere. We can declare such a pointer with message *m, and if msg is a message declared as in the first part of this answer, we can write &msg to get a pointer to it. The notation m->m_type selects the m_type field of the structure to which m points; it is equivalent to (*m).m_type, because *m is the structure to which m points, and we can use the notation from the first part to select its m_type field. The notations are equivalent in meaning and in efficiency, but m->m_type looks a bit neater.
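
In miniature (a sketch, with a cut-down message type):

typedef struct {            /* a cut-down message type for illustration */
    int m_type;
    int m_sender;
} message;

void example(void) {
    message msg;            /* storage for an actual structure */
    message *m = &msg;      /* a pointer to the same structure */

    msg.m_type = 1;         /* selection from the structure itself */
    m->m_type = 2;          /* selection through the pointer ... */
    (*m).m_type = 3;        /* ... exactly equivalent, but uglier */
}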

Your C programs use types like int and unsigned and short. Wouldn't it be more portable to use types like int32_t and uint32_t and int16_t that have a well-defined width on all platforms?

Yes, it would: it's just that I'm trying to keep down the number of obscure conventions that are used in the programs. On the ARM, it's safe to assume that int and unsigned are 32 bits.

Which is faster as a way of writing an infinite loop: while (1) ... or for (;;) ...?

This is one of many things connected with minor efficiency that really don't matter. If I were writing a compiler, I would ensure that both of these idioms resulted in exactly the same code, with the loop body followed by an unconditional branch from the bottom back to the top. If by accident gcc generates slightly less tidy code for one form, then we are unlikely to notice the difference. (Similar remarks apply to the idioms n++ and ++n when written as statements.)

According to the standard books about C, the main program should be called main, and yet in this course it is called init. Why is that?

In normal C, the main program ought to have the heading

int main(int argc, char **argv)

reflecting the fact that main receives two parameters, one being the number of command-line arguments (including the program name), and the other being an array of strings, each corresponding to one of the arguments. The return type is int, because the program may terminate by returning an integer, zero for success and some non-zero value for failure.

Our function init takes no parameters and delivers no result:

void init(void)

What's more, once we have an operating system on the chip, then unlike main, when init returns the program is not over, but concurrent execution of the processes in the program is only just beginning.

These differences might be enough to make a change of name desirable, but there are practical reasons too. One is that GCC (un)helpfully signals a warning if a function named main does not have the heading it expects; another is that it treats main specially in the output, assigning it to its own section in the object file so that a linker script can place it in the program image at a specific place. Both of these behaviours are unhelpful to us.

  • Actually, the warning is suppressed by specifying -ffreestanding, which also suppresses other warnings about the types of functions like sprintf. And the separate section is only enabled with the optimisation flag -Os or -O2, not at the -O level we will generally use, mostly to make it easier to interpret program behaviour under a debugger.

You have not explained the implementation of printf. How does it work?

The implementation of printf in any of our programs is split into two parts: the part that is independent of the output device, and renders the data as a sequence of characters, and the device-dependent part that takes each character and transmits it on the output device. The C code is made slightly complicated by two facts: printf takes a variable number of arguments, and the device-independent part is a higher-order function that takes the device-dependent part as a parameter.

Let's begin with the code that puts the two parts together:

void serial_printf(char *fmt, ...) {
    va_list va;
    va_start(va, fmt);
    do_print(serial_putc, fmt, va);
    va_end(va);
}

This function takes a variable number of arguments, so we can write a call such as

serial_printf("prime(%d) = %d\n", n, p);

where the one fixed argument – the format string – indicates that there are two additional arguments to come, both integers. The code for serial_printf gets a kind of pointer va to the variable part of the argument list, and passes this pointer, together with the device-dependent output function serial_putc and the format string fmt, to the library routine do_print. Variable argument lists are implemented in different ways on different machines. On the Cortex-M0, the compiler will store the arguments from r0--r3 into the stack frame together with the fifth and later arguments (if any), then use a pointer to the relevant portion of the stack frame for va. What happens on a particular machine is all hidden in the functions va_start and va_end.

I'll simplify the code for do_print a bit: the real version also supports the function sprintf that formats data like printf but saves the resulting string in a character array instead of printing it. Here's the function heading:

void do_print(void (*putch)(char), const char *fmt, va_list va) {

The C syntax is a bit obscure here, but it indicates that do_print takes three arguments:

  • A (pointer to) a function putch that accepts a character as argument and returns nothing.
  • A string fmt, represented as a character pointer.
  • The variable argument list va from do_print's caller.

The body of do_print uses these to do its job.

     char nbuf[NMAX];

     for (const char *p = fmt; *p != 0; p++) {
          if (*p == '%' && *(p+1) != '\0') {
               switch (*++p) {
               case 'c':
                    putch(va_arg(va, int));
                    break;
               case 'd':
                    do_string(putch, itoa(va_arg(va, int), nbuf));
                    break;
               case 's':
                    do_string(putch, va_arg(va, char *));
                    break;
               default:
                    putch(*p);
                    break;
               }
          } else {
               putch(*p);
          }
     }
}     

As you can see, do_print iterates over the format string, using a pointer p to point to the character of that string that it is currently working on. I've shown the implementation of three format codes here: %c for printing a character, %d for printing a decimal integer, and %s for printing a string. Note that you can write %% to print a literal percent sign. The formatting loop works its way through the variable argument list by using the function va_arg (more likely, a macro), and outputs each character by calling the argument function putch. A helper function do_string is used to call putch on each character of a string, such as the one returned by the function itoa (the name is traditional) that converts an integer into a string.

We carefully avoid global variables so that printf can be called from multiple processes simultaneously – it is reentrant, to use the jargon. That's the reason for the array nbuf that is passed to itoa. None of this will be on the exam.
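
For completeness, do_string is only a loop – something like this (a sketch consistent with the code above, not necessarily the library's exact text):

/* do_string -- output each character of a string via putch */
static void do_string(void (*putch)(char), const char *s) {
    for (; *s != '\0'; s++)
        putch(*s);
}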

Code in the examples that prints messages on the terminal contains "\r\n" to move to a new line, but other examples of C code that I've seen use just "\n". Why is that?

On an old-fashioned terminal, and on the simulation provided by minicom, the carriage-return character 0x0d (written '\r' in C) is needed to move the cursor or print head back to the left of the page, and the line-feed character 0x0a (written '\n') scrolls up by a line. Normally, the C library or operating system is written so as to insert a '\r' in front of each '\n' output by a C program. But we are working without a library and without an operating system, so we must do this ourselves. (The serial driver in Phōs does insert the '\r' in its version of the serial_putc function.)
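
The fix can live in one place: a wrapper around the device's output routine (a sketch of the idea; the Phōs driver does the equivalent inside serial_putc itself):

void serial_putc(char c);   /* the device-dependent output routine */

/* putc_crlf -- like serial_putc, but return the carriage on each newline */
void putc_crlf(char c) {
    if (c == '\n') serial_putc('\r');
    serial_putc(c);
}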

If I ask the C compiler to output assembly language (and turn off -g), then most of the file is intelligible, but these lines appear close to the start of the file. What do they mean?

.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1

The lines record the assumptions used by the compiler in generating code: things like whether there is a hardware floating point unit, whether unaligned memory access is allowed, etc. These assumptions will be recorded in a special section of the binary file output by the assembler, and the linker then checks that the assumptions are consistent between different files of code.

C provides bit-fields for access to parts of a memory word. Why don't we use them instead of using shifting and masking operations to program peripheral registers?

On the ARM, most peripheral registers must be accessed as a whole 32-bit word, and can't be used with the ldrb, strb, ldrh, strh instructions that access bytes or 16-bit halfwords. Also, it's sometimes the case that peripheral registers must be accessed exactly once in an operation, or some side-effect may be duplicated. The C compiler doesn't guarantee to respect these conventions when it generates code for bit-field access, so we avoid bit-fields and write (with the help of simplifying macros) explicit shifting and masking code instead. The resulting object code is no worse than would be compiled for bit-fields anyway.
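
As an illustration of the style (the macros and the field position here are invented for the example, not copied from hardware.h):

/* Invented field macros, in the spirit of hardware.h */
#define MASK(pos, width)        ((((unsigned) 1 << (width)) - 1) << (pos))
#define FIELD(pos, width, val)  (((unsigned) (val) << (pos)) & MASK(pos, width))

extern unsigned volatile REG;   /* stands for some whole-word peripheral register */

void set_field(void) {
    unsigned w = REG;                           /* read the whole word exactly once */
    w = (w & ~MASK(4, 3)) | FIELD(4, 3, 5);     /* replace the 3-bit field at bit 4 */
    REG = w;                                    /* write the whole word exactly once */
}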

What indentation style do you use for C?

My style is close to the one used in Kernighan and Ritchie, with the notable exception that I put the opening brace of a function on the same line as the function prototype. I don't insist on braces around a single statement controlled by if or for or while, except that the else part of an if must, for balance, have braces if the if part does. Expressions should have parentheses whenever the interpretation is anything but obvious without them.

I don't accept the argument that without braces, accidental mistakes of indentation can mislead about the scope of an if or while – but I use an editor that reliably gives the correct indentation automatically.

Hardware details

What configuration is needed to use the hardware random number generator?

See Chapter 21 of the hardware reference manual for the NRF51822. The header file hardware.h (in the latest revision of Lab 3) defines symbolic constants such as RNG_VALUE for the device register addresses. If you make a first attempt, we will help you to get it working in the lab sessions.
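
In outline, the code is a start-wait-read loop like the sketch below. Only RNG_VALUE is named above; RNG_START and RNG_VALRDY are my guesses at the names hardware.h would give to the task and event registers described in Chapter 21, so check the header before copying this:

/* random_byte -- hypothetical sketch of reading the hardware RNG */
unsigned random_byte(void) {
    RNG_VALRDY = 0;             /* clear the 'value ready' event */
    RNG_START = 1;              /* trigger the START task */
    while (!RNG_VALRDY) { }     /* busy-wait until a byte arrives */
    return RNG_VALUE & 0xff;    /* the register holds one random byte */
}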

Can an interrupt handler itself be interrupted?

The interrupt controller (NVIC) provides a priority scheme, so that the handler for a low-priority interrupt can be interrupted if a request for a higher-priority interrupt comes in: these 'nested' interrupts are the N in NVIC. But we shall not use this scheme, instead giving all interrupts the same priority. In Phōs, interrupt handlers will be very short, and just send a message to an attached driver process, so there is no need for nested interrupts. But whether nesting is allowed or not, it's clear that an interrupt handler cannot be interrupted by a second request for the same interrupt.

What happens when the micro:bit boots?

The Nordic chip doesn't boot, in the sense of a small program being responsible for loading a larger one. Instead, we could talk about it "coming out of reset". When the hardware Reset signal is deactivated (for example, by releasing the reset button), the processor fetches the word at address 0 and loads it into the stack pointer, and fetches the word at address 4 (the reset vector) and loads it into the program counter. That is the extent of the hardware's responsibility.

On the software side, the C compiler, the linker, the linker script NRF51822.ld and the C functions in startup.c work together to determine what happens next. The initial stack pointer is set to the top of RAM, and the reset vector points to the routine __reset in startup.c. The contents of the RAM at this point are meaningless, and may be random. Here's the code for __reset:

/* __reset -- the system starts here */
void __reset(void) {
     unsigned *p, *q;

     /* Make sure all RAM banks are powered on */
     POWER_RAMON |= BIT(0) | BIT(1);

     /* Copy data segment and zero out bss */
     p = __data_start; q = __etext;
     while (p < __data_end) *p++ = *q++;
     p = __bss_start;
     while (p < __bss_end) *p++ = 0;

     /* Initialise the scheduler */
     phos_init();

     /* Let the program initialise itself */
     init();

     /* Start the scheduler */
     phos_start();

     /* If there is no scheduler, spin */
     while (1) pause();
}

After ensuring (in case someone has switched them off) that all parts of the RAM are powered, this initialises the portion of the RAM that is devoted to static variables. The linker has made all the code for the program refer to locations in the bottom of the RAM, but the data which ought to appear there is not yet present. Instead, the linker has (following directions in the file NRF51822.ld) put an image of the data segment in the ROM just after the program code. It has also defined symbols like __data_start, __bss_start and __bss_end that we can use here. The code in __reset copies the image of the data segment word by word into the bottom of RAM, then zeroes out the bss segment. From this point on, the program can assume that static variables have their proper initial values.

There follow three calls to other subroutines. In the programs we write in the first part of the course, two of these subroutines, phos_init and phos_start, do nothing, and init is the main program, responsible for all its activity. As you will see, if init ever returns, the program falls into an infinite loop, forever doing nothing; interrupt handlers remain active during this loop. Later in the course, we will start to include the Phōs operating system in our programs, and then phos_init will set up its data structures, the main program init will just create a number of operating system processes, and phos_start is the scheduler that interleaves execution of the processes, which work together to carry out the program's tasks. The scheduler never returns.

What is the difference between the I/O registers UART_INTEN, UART_INTENSET and UART_INTENCLR?

The UART_INTEN register shows which events in the UART can cause an interrupt: you can write a value to the register to enable or disable individual interrupt sources, and you can read it to find out which sources are actually enabled. The Nordic chips have a scheme – mostly for convenience – that allows individual bits in this register to be set or cleared without having to read the register, modify it, and write back the result. If you write a value to the UART_INTENSET register containing one or more 1 bits, then the corresponding bits in UART_INTEN will be set to 1, and the other bits will remain unchanged. Similarly, writing a pattern of 1 bits to UART_INTENCLR clears the corresponding bits of UART_INTEN and leaves the others unchanged. The general scheme is described in Section 10.1.2 of the nRF51822 Reference Manual.
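
For example, enabling and later disabling a single interrupt source takes one write each, with no read-modify-write sequence (the BIT macro is from hardware.h, but the particular bit position here is an assumption for the sake of the example):

void uart_enable_source(void) {
    UART_INTENSET = BIT(2);     /* set bit 2 of UART_INTEN; others unchanged */
}

void uart_disable_source(void) {
    UART_INTENCLR = BIT(2);     /* clear bit 2 of UART_INTEN; others unchanged */
}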

About Phōs

Why can't one Phōs process start another?

In short, because that would be a complication that would add little utility. The memory allocation in Phōs is deliberately simple, stack space for each process being nibbled away from one end of a single chunk of free space. That works well if a fixed number of processes are started when the program begins, but much less well if processes come and (by implication) go during the life of the system. If processes exit and restart, we will need to re-use the storage space of dead processes to find room for new ones; and because the hardware doesn't allow us to move processes around, we then face the problem of fragmentation of storage – all too messy to contemplate. Given that the operating system is only going to support a fixed collection of processes, we may as well have those processes created as part of initialisation, before the scheduler takes control.

When a process makes a system call such as receive(ANY, &m), how does the operating system access the arguments?

The arguments are in registers r0 and r1 before the svc instruction that enters the operating system. The handler for the SVC interrupt gets the stack pointer for the process after the context has been saved onto the stack. It can then reach into it (in a machine-dependent way) to discover what the values of r0 and r1 were at the time of the call. It also uses the saved pc value to look at the svc instruction, which contains an integer constant that identifies which system call – receive or send or another – was made. Given the address of the message buffer m, the operating system can read or write the memory belonging to the process in order to copy a message from sender to receiver.
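
Concretely, on the Cortex-M0 the hardware pushes r0, r1, r2, r3, r12, lr, pc and psr, with r0 at the lowest address, so the handler can do something like this (a sketch of the machine-dependent part, not the exact Phōs code; system_call is a hypothetical dispatch routine):

extern void system_call(int op, unsigned arg0, unsigned arg1);

/* svc_handler -- given the process stack pointer after the context save */
void svc_handler(unsigned *frame) {
    unsigned arg0 = frame[0];                        /* saved r0, e.g. ANY */
    unsigned arg1 = frame[1];                        /* saved r1, e.g. &m */
    unsigned char *ret = (unsigned char *) frame[6]; /* saved pc */
    int op = ret[-2];     /* low byte of the svc instruction: the call number */
    system_call(op, arg0, arg1);
}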

What happens if a process loops forever, consuming CPU time, without ever calling send or receive?

Scheduling is non-preemptive, so once such a process is running, it will stay running until the next interrupt, and other processes will not run until then. Phōs is not really designed to share the processor fairly among processes that always want to run, because doing that really needs a clock interrupt to be directly connected to the scheduler in order to measure out time slices, and we'd like to make the use of timers optional. The assumption is that CPU time is a plentiful resource, so that fairness in sharing it doesn't matter. In the heart example, it works well to give the process driving the display a higher priority than the one searching for primes, so that after a timer interrupt it gets to run first.

If a Phōs process calls a subroutine (like primes calling serial_printf which subsequently calls serial_putc) and the subroutine calls send to send a message, how does the operating system know which process sent the message?

Phōs always knows which process is currently running. The stack frames for primes and serial_printf and serial_putc are all part of the stack of that process. So when the call to send is made, the operating system knows that it is made on behalf of the PRIMES process. The same subroutine serial_putc could be called at another time by another process, and then it would generate a message from that process – whichever one was current at the time.

You can use receive(ANY, &msg) to receive a message from anywhere. Why not send(ANY, &msg) to send a message to any receiving process?

Practically speaking, because that's useless: we often want to write a server process that can get a request from any client, carry it out, then send a reply. This is supported by receive(ANY, &msg) followed by a reply sent to msg.m_sender. An attempt to send to any process at all is not likely to succeed, because the receiving process is quite likely not to be interested in the message. (Theoretically speaking, the dual of receive-from-any ought to be send-to-all, and that is no more likely to be a useful operation.)

Why is the stack space of a Phōs process filled with the value 3735928559 when the process starts?

This helps with finding out how much stack space the process is actually using: when Phōs prints a dump of the process table, it shows how much of the stack space of the process has been changed from this initial value, and we can assume that this is the total stack space that has been used, unless by some slim chance the last few words of the stack have been set to this magic value by the process itself. As to the choice of value, it appears in the Phōs source code in hexadecimal notation, which makes it seem more significant.
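A sketch of the measurement, with invented names: scan from the quiet end of the stack area, counting words that still hold the magic value.

#define MAGIC 0xdeadbeefu   /* = 3735928559 in decimal */

/* Sketch: base points at the end of the stack area furthest from the
   initial stack pointer (stacks grow downwards), so words that still
   hold MAGIC sit at the start of the scan. */
unsigned stack_used(unsigned *base, unsigned nwords) {
    unsigned i = 0;
    while (i < nwords && base[i] == MAGIC)
        i++;                                  /* still untouched */
    return (nwords - i) * sizeof (unsigned);  /* bytes actually used */
}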

In Phōs, what happens if an interrupt is triggered when the attached process is not waiting to receive a message?

The process table entry for each process has space to record one pending interrupt message, so if an interrupt arrives and the process is not waiting for it, the message can be delivered later. The default interrupt handler, which converts interrupts to messages, also disables the interrupt that invoked it, because the interrupts on the micro:bit continue to be triggered if the cause is not removed, and removing the cause (such as an idle UART) requires hardware-specific actions from the driver process. So a second interrupt cannot happen until the driver process has run to re-enable the interrupt. If this behaviour is inadequate to the situation, it's possible to replace the default handler with another one for a specific device. It's a bit difficult to configure this without editing the Phōs source code, but the infrastructure is there to support it.
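In outline (with invented names), the default handler does just two things: silence the interrupt and turn it into a message.

#define NIRQ 32             /* assumed number of interrupt lines */
#define INTERRUPT 7         /* invented message type */

typedef struct { int m_sender; int m_type; } message;   /* simplified */

extern void disable_irq(int irq);
extern void notify(int pid, message *m);   /* deliver now, or park it as
                                              the one pending message */

static int handler_proc[NIRQ];   /* process attached to each irq */

void default_handler(int irq) {
    disable_irq(irq);                /* the cause is not yet removed */
    message m;
    m.m_type = INTERRUPT;
    notify(handler_proc[irq], &m);   /* driver will re-enable the irq */
}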

What happens if a device driver process does not finish responding to one interrupt message before another interrupt occurs?

As noted above, the default action of disabling the interrupt means that a second interrupt can't actually be requested. But if (for example) the interrupt comes from a timer, and interrupts are being lost causing the clock to run slowly, then the right thing is to move some work from the driver process to the interrupt handler, and modify it so that the interrupt remains enabled all the time.

In the UART driver process, code for input and code for output are intertwined. Wouldn't it be neater to untangle them and have separate driver processes for the two sides of the UART?

In some ways, yes. But the vital deciding factor in favour of a single driver is that the two sides of the UART share a single interrupt request, so interrupts for both sending and receiving turn into messages to a single process. We could use a more elaborate scheme where a little router process passed on interrupt messages to one or other of two driver processes, but we would be adding as much complexity as we could hope to remove. (A minor additional point is that echoing of input characters is easy to do if input and output share a driver process.)
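In sketch form, the single driver simply dispatches on what it receives; the message types, device status tests and helper functions here are all invented.

#define ANY (-1)            /* invented values */
#define INTERRUPT 7
#define PUTC 8
#define UART_IRQ 2

typedef struct { int m_sender; int m_type; char m_char; } message;

extern void receive(int src, message *m);
extern int uart_rx_ready(void);      /* invented device status tests */
extern int uart_tx_done(void);
extern void handle_input(void);      /* invented: fetch and maybe echo */
extern void handle_output(void);     /* invented: start the next char */
extern void buffer_char(char c);     /* invented: queue a char to send */
extern void enable_irq(int irq);

void uart_driver(void) {
    message m;
    while (1) {
        receive(ANY, &m);
        switch (m.m_type) {
        case INTERRUPT:              /* rx-ready, tx-done, or both */
            if (uart_rx_ready()) handle_input();
            if (uart_tx_done()) handle_output();
            enable_irq(UART_IRQ);    /* re-arm the shared interrupt */
            break;
        case PUTC:                   /* request from a client process */
            buffer_char(m.m_char);
            break;
        }
    }
}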

If message-passing makes programming so much easier, why do widely-used embedded operating systems provide semaphores instead?

Programs written using semaphores have the potential to be faster than those written using message passing and server processes, because context switches from one process to another can be made much less frequent. For example, to buffer and print characters on the serial port, our message-based design requires about four context switches per character sent, in addition to the running of the UART interrupt handler. There is a context switch to send a request message to the UART driver process and put the character in the buffer, another to return to the main program, then two further context switches into and out of the driver process when a UART interrupt prompts it to send the character. By way of contrast, a semaphore-based solution might need no context switches at all, if characters are retrieved from the buffer and sent to the UART directly in the interrupt handler. This is because a down operation on an available semaphore only needs to disable interrupts for a moment to test and lock the semaphore, and need not block the invoking process. It's only when a process needs to block on a semaphore that a context switch is needed.
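A sketch of that semaphore-based design, with every name invented: provided the buffer has space, putc costs only a brief moment with interrupts off, and the interrupt handler drains the buffer without any process being scheduled.

#define NBUF 256

struct sem;                           /* a counting semaphore, invented API */
extern void sem_down(struct sem *s);  /* blocks only when the count is zero */
extern void sem_up(struct sem *s);    /* assumed safe to call from a handler */
extern struct sem *space;             /* assumed initialised to count NBUF */

extern void intr_disable(void);
extern void intr_enable(void);
extern void uart_start_tx(void);      /* kick the UART if it is idle */
extern void uart_send(char c);        /* hand one char to the hardware */

static char buf[NBUF];
static int head, tail, count;         /* in at head, out at tail */

void putc_sem(char c) {
    sem_down(space);                  /* waits only if the buffer is full */
    intr_disable();                   /* a moment with interrupts off */
    buf[head] = c;
    head = (head + 1) % NBUF;
    count++;
    uart_start_tx();
    intr_enable();
}

void uart_tx_interrupt(void) {        /* runs in the handler itself */
    if (count > 0) {
        uart_send(buf[tail]);
        tail = (tail + 1) % NBUF;
        count--;
        sem_up(space);                /* may unblock a producer */
    }
}

The price of the speed is visible even in the sketch: the buffer is now shared between a process and an interrupt handler, and every access has to be protected by hand.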

On the other hand, message-based concurrency is much easier to think about, and it makes it easier to avoid problems like priority inversion, where a high-priority process is prevented from making progress because it needs to lock a semaphore currently held by a low-priority process.

Why does Phōs not have (i) pre-emptive scheduling (a form of process scheduling where a process may be suspended when it has run for a certain time, or when a process with higher priority becomes ready; without pre-emptive scheduling, processes are only suspended when they wait for an event), (ii) nested interrupt handlers, (iii) an interruptible kernel?

In a simple, interrupt-driven system that is not overloaded, most of the time every process is blocked waiting for something to happen, and the idle process is 'active', in the weak sense that it is the running process, but it has put the processor to sleep with a wait-for-interrupt instruction (wfe on the ARM). When an interrupt arrives, processes wake up, there is a flurry of activity, and then the system settles down to sleep again, and the time needed for that to happen is bounded. If no response that the system owes as a result of an interrupt must be produced in less than this time bound, then there is no need for pre-emptive scheduling (which does not change the total amount of work to be done, but just affects the order in which tasks are carried out). And if we keep interrupt times short by doing no more than sending a message in each interrupt handler, then there will be little need to allow the running of one interrupt handler to be suspended in order to run another. Likewise, if the time spent in delivering a message is short, it won't matter much if an interrupt is delayed until the kernel has finished.
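The idle process mentioned here can be tiny; as a sketch, using GCC's inline assembly syntax:

void idle_proc(void) {
    while (1)
        asm volatile ("wfe");   /* sleep until something happens */
}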

There are well-developed theories of process scheduling that aim to ensure that complex sets of real-time deadlines are met using limited resources, and in circumstances where they are needed, these theories are really valuable. In simple cases, however, process scheduling is hard to get wrong, in the sense that if every process waiting for an event gets to run after the event has happened and before the system goes to sleep again, then everything will happen when it should and in good time. Phōs is designed with these modest goals in mind.

Why this Phōs operating system, and not another, industry standard real-time operating system such as FreeRTOS?

FreeRTOS is OK, and provides a rag-bag of facilities for multi-tasking, including processes, semaphores of different kinds, message queues, timers, and a lot more. It is quite complicated, and everything seems to have lots of options that interact in subtle ways. For the course, I wanted to find out how far we could get with a much simpler scheme, where there are no nested interrupts, possibly no preemptive scheduling, and the system is structured into a family of processes communicating by messages. As you will discover in Concurrent Programming next year, programming with semaphores is a complicated business; and next year will be soon enough for that. And I can't resist saying that the coding style of FreeRTOS gives me the creeps.

Where does the name Phōs come from?

Phōs (ΦΩΣ) is the Greek word for 'light', and also the first word in Greek of the ancient hymn known as Phos Hilaron (Hail, gladdening Light), particularly associated with John Keble, formerly Fellow of Oriel, in memory of whom Keble College was named by its founder, Edward Pusey (also at one time a Fellow of Oriel). There is also an echo of the motto of the University, Dominus illuminatio mea (The Lord is my light), which consists of the first words of Psalm 27. The macron above the ō in transliteration is intended to show that the letter in Greek is omega rather than omicron, so that the vowel should be pronounced like those in 'photo', not as in 'foss'. No allusion is intended to Phở, the Vietnamese noodle soup, though one can see a Phōs application as a collection of noodly threads swimming in a nutritious broth.

Digital electronics

What are all the numbers shown on an oscilloscope trace?

Here is a trace of a character being sent on a serial line.

[Screenshot: RS232 waveform]
  • In the top left, you can see (STOP) that acquisition of the signal has been stopped, presumably so that I can save a screenshot; at other times, the same label shows WAIT (when the scope is waiting for a trigger event) or TRIG'D (when an acquisition is in progress).
  • Next to it (100us/) we see that the horizontal scale is 100 microseconds per division. The clock period for the waveform can be seen from the trace to be a bit more than 100us, because the transitions fall a bit behind the grid of dotted divisions towards the right of the picture. (A bit period of about 104us would correspond to the common serial rate of 9600 baud.)
  • At the top right, we see that the oscilloscope's trigger is detecting a falling edge on channel 1 and set at a level of 1.72V, midway between the high of 3.3V and the low of 0V.
  • In the left margin, you can see the trigger level marked (T), as well as the 0V level for channel 1.
  • At the bottom left, you can see that channel 1 is DC-coupled, and has a scale of 1.00V per division; the difference between high and low is therefore seen on the trace to be about 3.3V.
  • On the black background just below the top margin, you can see at the left a T mark denoting the trigger point. I've adjusted this to appear towards the left of the display, so that we can see subsequent events, but as the outline at centre top shows, there's plenty more waveform captured in this single-shot mode before and after the trigger that we could see by scrolling.

How do you save a screenshot on the Agilent scope without the Save menu showing?

[This is one of the few FAQs here that really is Frequently Asked: by Mike, to himself].

  1. Press the Save/Recall button.
  2. Set Storage to PNG.
  3. Press External.
  4. Press New File.
  5. Press the Menu On/Off button.
  6. Press the fourth bezel button.
  7. Press Save (the same button again).

It usually works. Always? Not sure. One place in the Rigol firmware that Agilent didn't clean up?

Another glitch: all the files on the USB pen are dated 1 Jan 2006, so Gnome tends to think its thumbnails are up to date even when they aren't.

Why do wired-or busses like I2C have the convention that signal lines are active-low?

Electrically, active-low busses are better in two ways. First, N-type MOSFETs are generally better behaved than P-types, and specifically tend to have a lower on resistance. This makes them better able to pull the signal line low crisply (when it may have significant capacitance) than a P-type could pull it high. In older systems, an NPN transistor in an open-collector output stage (a single transistor connected between the output and ground: when the gate is active, it can sink current and pull the output low, but when it is inactive it does not prevent other outputs connected to the same wire from pulling it low; such outputs are commonly used with a separate pull-up resistor connected between the wire and the positive power rail) had a similar advantage over a PNP transistor, and for the same physical reason – that the majority carriers in these transistors are electrons, which have a better mobility than holes. The second reason for using active-low signalling is that a bus may cross between circuits that are using different power supplies, and ground is then used as a common reference, whereas the power rails may differ a little or a lot in voltage. An active-low signal can have threshold voltages defined with respect to ground, be pulled low by an open-drain output with a short fall time, and draw little or no quiescent current through its pull-up resistor.

Software tools

I keep getting the following message when I try to start minicom. What gives?

minicom: cannot open /dev/ttyACM0: Device or resource busy

I found the second answer to this Stack Overflow question helpful: https://stackoverflow.com/questions/40951728.

Why are the C compiler and other tools called by names like arm-none-eabi-gcc and arm-none-eabi-objdump?

These long names are chosen to distinguish these cross-compiling tools from native tools like gcc, which on a Linux PC is a compiler that translates C into code for an Intel processor. The lengthy names make explicit that the compiler generates ARM code (arm) for a machine running no operating system (none), and uses a standard set of subroutine calling conventions called the Embedded Application Binary Interface (eabi). This explicitness is needed, because a Linux machine might also have installed compilers for the Raspberry Pi (gcc-arm-linux-gnueabihf) and maybe AVR microcontrollers (gcc-avr) and the MIPS (mipsel-linux-gnu-gcc). As you can see, the naming conventions aren't quite consistent between GCC cross-compiler projects.

Trivia

Why use the ARM? Why not the x86 or x86_64 we already have in our laptops?

I wanted us to program a machine where all the code was accessible to our inspection. That's easy to achieve on a microcontroller (a single integrated circuit that contains a microprocessor together with some memory – usually both RAM for dynamic state and ROM for storing a persistent program – and peripheral interfaces), and about as simple as it can be on an ARM-based machine. Yes, we could program x86 machines in assembly language, though we would have to face some unavoidable complexity that has accrued during the 30-or-more-year history of the architecture. But we wouldn't easily get to program the bare machine and understand everything that happens.

Why C? Why not C++? Or why not Python/Java/Scala/Haskell/Some other higher level language?

Quite a lot of embedded programming is done in C++ these days, and in fact the official software support for the micro:bit is in C++. This gives a number of advantages: peripheral devices can be initialised by declaring a static instance of a driver class, as in PinOut led(PIN3); and C++'s operator overloading subsequently lets you write led = 1; and led = 0; to turn the LED on and off, even if the actual code needed is something more complicated, possibly involving a function call. This is all very well for applications programming, but it really gets in the way of understanding what is happening at the machine level.

Also, for C++, as for other high level languages, we would need to know how the high-level constructs are implemented in order to see how a high level program corresponds to a sequence of machine instructions. For C, this is more straightforward, and we can use C as a quick, clean and portable way of writing what we could also write in assembly language. And in any case, nobody's employment prospects were ever hurt by saying that they knew how to program in C.

Why bother with assembly language if we can write all our programs in C?

Several reasons: first, our aim here is to give a complete account, spanning all levels of abstraction, of the functioning of a small computer system. We're going to take somewhat for granted the process of translating high-level programs into machine code (more on that next year), but we do need to understand machine language as the interface between hardware and software. Second, when we come to implement an operating system, we will need to understand clearly what it means to save the state of one process and restore the state of another, and that can only be seen at the level of machine code. And third, even if writing assembly language is something you never need to do (especially for the ARM with Thumb code), it's a really useful skill to be able to read the assembly language output by a compiler to see what is happening. When you've used instrumentation to identify the speed-determining part of your big program and done your best to write it in high-level code that will run fast, it's good to be able to look at the compiler output and see whether it is sensible.

Is it "twos' complement" or "two's complement" or "twos complement"?

It's "twos complement". To see why this is right, consider that the ones complement of a number is that quantity that, when added to the number, results in a string of ones:

 00101011
 11010100
----------
 11111111

The twos complement is obtained by adding 1 to the ones complement: when this is added to the original number, the result in each column is two – that is, a zero with a carry of 1:

 00101011
 11010101
----------
100000000

So the twos complement of a number is that quantity that, added to the number, results in a string of twos. You could make a case for the reading "twos' complement" – the complement belonging to a string of 2s – but "two's complement" is clearly wrong: that would be the complement belonging to two, rather than to a string of twos. You can't form the plural of two in that way.
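The same fact can be checked in C, where arithmetic on a 32-bit unsigned value wraps modulo 2^32, so that -x means 2^32 - x: the ones complement is ~x, and adding 1 gives the twos complement, which is exactly what negation computes.

#include <assert.h>
#include <stdint.h>

int main(void) {
    uint32_t x = 0x2b;                /* the 00101011 of the example */
    assert(x + ~x == 0xffffffffu);    /* ones complement gives all ones */
    assert(~x + 1 == -x);             /* twos complement is negation */
    return 0;
}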

You keep saying the micro:bit has 16kB (kilobytes) of RAM. That means 16,000 bytes: isn't it true that there is actually 16KiB (kibibytes) of RAM, or 16,384 bytes?

Yes, that's right. You can collect your refund on the way out. Please note as you go that it's correct to put a thin space between a number and a following unit, as in 16 KiB, and that a thin space should also be used in place of a comma to separate thousands, as in 16 000 bytes, to avoid confusion with the use of a comma as a decimal marker in some countries. Glad we sorted that out.

What fonts do you use on your website?

On webfont-enabled browsers, running text is in Roboto Sans, and titles in Roboto Slab. Code samples are set in Roboto Mono, and the occasional scary word is set in Unifraktur Cook, just for fun.
