Lecture 21 – Designing a datapath (Digital Systems)

From Spivey's Corner
Note: The handout for this lecture (and likely the next one or two) is also provided as a separate, printable document. The content is the same, but the presentation has been edited for printing.

The plan now is to put together the elements we have studied into a simple datapath that can execute Thumb instructions. We'll do it in stages, adding at each stage just sufficient logic to implement a few more instructions. The design we make will be seriously unrealistic, in that all the work of executing an instruction will be performed inside a single clock cycle: this will lead to longer combinational paths than we would like, and so require a lower clock speed. A more practical design would use pipelining to overlap the execution of each instruction with preparation for the next one. You can study pipelining and the design questions it raises in next year's course on Computer Architecture. For now, we must be content with the observation that if we wanted a pipelined implementation of the Thumb architecture, then effort spent on this single-cycle implementation would not be wasted, because a pipelined design starts with a single-cycle design, drawing lines across the circuit to separate what happens for a particular instruction in this clock cycle from what happens in the next.

The design is shown in these notes by means of a sequence of circuit diagrams, each accompanied by a selection of settings for the control signals that correspond to instructions that the circuit is capable of executing. The final design is also represented by a register-level simulator written in C. Source code for this simulator is available as part of the (utterly optional) Lab five, and an annotated listing is also provided as a PDF document. The simulator contains tables that decode all the instructions it implements, and is capable of loading and executing binary images prepared using the standard Thumb tools.

Stage 1: Instruction fetch

The first stage is to arrange to fetch a stream of instructions from memory, and decode them into control signals that will drive the rest of the datapath. For this, let's install a program counter PC, a register that will, on each clock cycle, feed the address of the current instruction to a memory unit IMem so that it fetches a 16-bit instruction. There's also a simplified adder that increments the PC value by 2 and feeds it back into the PC as the address of the next instruction. Some of the 16 bits of the instruction are fed into a combinational circuit that decodes it, producing a bundle of control signals that are fed to the functional units in the datapath. Since those functional units and their connections are yet to be added, we can't say precisely what the control signals are at this stage; but let's add wiring that makes both the control signals and the remaining bits of the instruction (those that have not been accounted for in the encoding) available to each part of the datapath. We can imagine building the decoder from a ROM or (as we'll see later) several ROMs that decode different parts of the instruction.

Stage 1: Instruction fetch

This design is capable of implementing only straight-line programs with no branching, because there is no way to avoid a sequence of PC values that goes 0, 2, 4, 6, ... Also, this design doesn't reflect the fact that instructions can access the PC alongside the other 15 registers. We'll adjust the design later to correct both these infelicities.
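As a rough illustration of this stage, here is a minimal C sketch of the fetch logic. It is not taken from the simulator: the names imem, Fetch and fetch_step, and the memory size, are invented for the example.

```c
#include <stdint.h>

/* Hypothetical instruction memory: 16-bit halfwords, indexed by pc / 2. */
#define IMEM_HALFWORDS 1024
static uint16_t imem[IMEM_HALFWORDS];

/* The only state in the Stage 1 machine is the program counter. */
typedef struct { uint32_t pc; } Fetch;

/* One clock cycle: read the instruction addressed by pc, then let the
   simplified adder feed pc + 2 back into the PC. */
static uint16_t fetch_step(Fetch *f) {
    uint16_t instr = imem[(f->pc / 2) % IMEM_HALFWORDS];
    f->pc += 2;
    return instr;
}
```

Calling fetch_step repeatedly yields the instructions at addresses 0, 2, 4, ... in order, which is exactly the straight-line behaviour described above.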

Stage 2: ALU operations

Now let's add some datapath components: a simple register file and an ALU. The twin-port register file is capable of reading the values of two registers and writing a result to a third register, all in the same clock cycle. We can imagine for now that the three registers are selected by fixed fields in the instruction, as they are in the add reg instruction:

Adds-rrr format.png

In executing this instruction, the two registers that are read are selected by fields instr<5:3> and instr<8:6> of the instruction. The control unit must ask the ALU to add its two inputs, producing a result that is fed back to the register file. The control unit also tells the register file to write the result back into the register selected by instr<2:0>.

Stage 2: ALU operations

The same datapath could be used to implement other instructions that perform arithmetic on registers: the three-register form of sub, certainly:

Subs-rrr format.png

For this instruction, we need to tell the ALU to subtract rather than add. But we can also implement instructions like ands that specify two registers and overwrite one of their operands:

Ands-rr format.png

This instruction is a bit different, because the ALU must do a bitwise and operation, but also the three registers are selected by different fields of the instruction, with instr<2:0> used to select both one of the inputs and the output of the instruction. Let's leave aside for a while these issues of detail in decoding, and concentrate instead on what features are needed in the datapath itself.
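The three-register case can be sketched in C as follows. This is an illustrative model only, with invented names (bits, alu, exec_rrr); it assumes the fixed fields of the adds and subs formats and ignores the condition codes.

```c
#include <stdint.h>

/* Extract the bit-field instr<hi:lo>. */
static uint32_t bits(uint16_t instr, int hi, int lo) {
    return (instr >> lo) & ((1u << (hi - lo + 1)) - 1);
}

typedef enum { Add, Sub, And } AluOp;

/* A purely combinational ALU with three operations. */
static uint32_t alu(AluOp op, uint32_t a, uint32_t b) {
    switch (op) {
    case Add: return a + b;
    case Sub: return a - b;
    case And: return a & b;
    }
    return 0;
}

/* Execute one three-register instruction: read the registers selected
   by instr<5:3> and instr<8:6>, and write the ALU result to the
   register selected by instr<2:0>. */
static void exec_rrr(uint32_t regs[16], uint16_t instr, AluOp op) {
    uint32_t ra = regs[bits(instr, 5, 3)];
    uint32_t rb = regs[bits(instr, 8, 6)];
    regs[bits(instr, 2, 0)] = alu(op, ra, rb);
}
```

For example, adds r0, r1, r2 is encoded as 0x1888, so exec_rrr(regs, 0x1888, Add) leaves r1 + r2 in r0.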

Stage 3: Immediate operands

In addition to instructions that perform ALU operations with the operands taken from registers, there are also instructions that take the second operand from a field in the instruction. Examples of this are two forms of the add instruction:

Adds-rri format.png
Adds-ri format.png

We can also cover the immediate form of the mov instruction, if we allow an ALU operation that simply copies the second operand and ignores the first.

Movs-ri format.png

To accommodate these, we can introduce a multiplexer on the second input of the ALU, fed both from the second register value rb and from appropriate fields of the instruction. The examples above show that it must provide the option of selecting both instr<8:6> and instr<7:0>, and there are other possibilities we will discover as we proceed.

Stage 3: Immediate operands

Now we have a few control signals, we can start to make a table showing how they should be set to carry out various instructions.

Instruction  cRand2  cAluOp  cRegWrite
adds r       RegB    Add     T
adds i3      Imm3    Add     T
adds i8      Imm8    Add     T
subs r       RegB    Sub     T
subs i3      Imm3    Sub     T
subs i8      Imm8    Sub     T
ands r       RegB    And     T
movs r       RegB    Mov     T
movs i8      Imm8    Mov     T

This table will expand as we get further, adding more rows to provide more instructions, but also more columns to control the extra hardware we will need to implement them. There will always be settings for any added control signals that make the new hardware do nothing, so that we can still get the effect of these early instructions unchanged. We've still to deal with the issue of what registers are selected to take part in the instructions, and we have also yet to provide for the fact these instructions all set the condition codes. And so far, all instructions write their result into a register: that will change too.
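One way to picture the table is as a constant array of records, one per instruction, alongside a function that models the new multiplexer. The sketch below is an assumption about how this might be written, not the simulator's code; the type Control here has only the three columns seen so far, and the function names are invented.

```c
#include <stdint.h>
#include <stdbool.h>

typedef enum { RegB, Imm3, Imm8 } Rand2;
typedef enum { Add, Sub, And, Mov } AluOp;

/* One row of the decoding table at this stage. */
typedef struct {
    const char *name;
    Rand2 cRand2;
    AluOp cAluOp;
    bool cRegWrite;
} Control;

static const Control table[] = {
    { "adds r",  RegB, Add, true }, { "adds i3", Imm3, Add, true },
    { "adds i8", Imm8, Add, true }, { "subs r",  RegB, Sub, true },
    { "subs i3", Imm3, Sub, true }, { "subs i8", Imm8, Sub, true },
    { "ands r",  RegB, And, true }, { "movs r",  RegB, Mov, true },
    { "movs i8", Imm8, Mov, true },
};

/* The mux on the second ALU input: register value or instruction field. */
static uint32_t rand2(Rand2 sel, uint32_t rb, uint16_t instr) {
    switch (sel) {
    case RegB: return rb;
    case Imm3: return (instr >> 6) & 0x7;   /* instr<8:6> */
    case Imm8: return instr & 0xFF;         /* instr<7:0> */
    }
    return 0;
}

static uint32_t alu(AluOp op, uint32_t a, uint32_t b) {
    switch (op) {
    case Add: return a + b;
    case Sub: return a - b;
    case And: return a & b;
    case Mov: return b;     /* copy the second operand, ignore the first */
    }
    return 0;
}
```

With the last row of the table and the instruction movs r0, #42 (encoded as 0x202A), the mux selects 42 and the Mov operation passes it straight through to the result.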

Stage 4: Data memory

Now let's add a second interface to memory, so that we can implement load and store instructions. We'll use what's called a modified Harvard architecture, meaning that the data memory will be treated separately from the instruction memory. They could be separate memories, as on some microcontrollers like the PIC, or we could imagine having two independent caches in front of the same memory, and modelling just the things that happen when there is a cache hit all the time, not the periods of suspended animation when the processor core is waiting for a memory transaction to complete. Either way, this is different from the von Neumann architecture of ARM's Cortex-M0 implementation, where there is one memory interface, and loads and stores are executed in an extra cycle between instruction fetches.

The Thumb instruction set provides instructions like ldr r0, [r1, r2] and str r0, [r1, r2] that form an address by adding the contents of two registers r1 and r2, and either load a memory word and save it in a third register r0, or take a value from r0 and store it.

Ldr-rrr format.png
Str-rrr format.png

We should notice two requirements for the datapath: first, that we need to form addresses by adding, and second, that the str instruction here reads not two but all three of the registers named in the instruction. We can use the existing ALU to do the address calculation, and we can easily enhance the register file with an extra mux so that it outputs the current value of all three registers named in the instruction. If the third register value isn't needed (as in all instructions except a store) then it costs nothing to compute it anyway.

Stage 4: Data memory

In addition to the third register value rc, the diagram shows two further architectural elements. There's the data memory DMem, with two data inputs, one data output and two control inputs, both single bits. The two data inputs are an address, taken from the ALU output, and a value to be stored, taken from rc. The data output is a value memout that has been loaded from memory, and this together with aluout feeds a new mux that determines the result of the instruction. The two control inputs for the memory are cMemRd and cMemWr, telling it whether to conduct a read cycle, a write cycle, or (if the instruction is not a load or store) neither. Writing when we don't want to write is obviously harmful, and reading unnecessarily might also be harmful if it causes a cache miss, or an exception for an invalid address. The result mux can be controlled by the same cMemRd signal, so that the result of the instruction is memout for load instructions and aluout for everything else.

Let's enhance the decoding table to cover these two new instructions. I'll keep just a few of the existing instructions, extending the lines for them to include values for cMemRd = cMemWr = F that maintain the same function as before; the other instructions can be extended in the same way. I've added entries for ldr and str with the reg+reg addressing mode. Note that str is the first instruction that doesn't write its result back to a register. There will be others, so while it was tempting before to suppose cRegWrite = T always, and it's tempting now to suppose cRegWrite = !cMemWr, we will see later that neither of these is true.

Instruction  cRand2  cAluOp  cMemRd  cMemWr  cRegWrite
adds r       RegB    Add     F       F       T
movs i8      Imm8    Mov     F       F       T
ldr r        RegB    Add     T       F       T
str r        RegB    Add     F       T       F
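The memory interface and the result mux can be modelled in a few lines of C. This is only a sketch: the array dmem, its size and the function name mem_stage are invented here, and addresses are simply wrapped to fit the small array rather than checked.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical data memory of 32-bit words, indexed by address / 4. */
#define DMEM_WORDS 256
static uint32_t dmem[DMEM_WORDS];

/* The memory access and result mux for one instruction.  The address
   comes from the ALU output and the value to be stored from rc.  The
   returned value is the instruction's result: memout for a load,
   aluout for everything else. */
static uint32_t mem_stage(bool cMemRd, bool cMemWr,
                          uint32_t aluout, uint32_t rc) {
    uint32_t index = (aluout / 4) % DMEM_WORDS;
    uint32_t memout = 0;
    if (cMemWr) dmem[index] = rc;       /* write cycle */
    if (cMemRd) memout = dmem[index];   /* read cycle */
    return cMemRd ? memout : aluout;    /* result mux driven by cMemRd */
}
```

With cMemRd = cMemWr = F the function leaves the memory untouched and passes aluout through, which is why the earlier instructions keep working unchanged.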

Stage 5: Barrel shifter

As the next step, let's add a barrel shifter to the datapath, so that we can implement shift instructions like the following.

Lsls-rri format.png
Rors-rr format.png

We could make a barrel shifter part of the ALU, so that left and right shifts were added to the list of ALU operations; or failing that, we could put the shifter 'in parallel' with the ALU, feeding its output together with those of the ALU and the data memory into the result mux. That would allow us to implement the shift instructions OK, but it would be less versatile. We can take a glance at big-ARM instructions like

ldr r0, [r1, r2, LSL #2].

This shifts left by 2 bits the value of r2, adds that to the value of r1 to form an address, loads from that address, and puts the result in r0, all in one instruction. This is really useful if r1 contains the base address of an array and r2 contains an index into the array. Sadly, Thumb code doesn't have a way to encode all that in one instruction. We can provide for such operations by adding a barrel shifter in front of the ALU, operating on the value that will become the ALU's second input, as shown in the figure. There is a control signal to set the operation – Lsl, Lsr, Asr or Ror – to be performed by the shifter. There's also a mux that lets us choose the shift amount, either a constant like 0 or 2 (Sh0 or Sh2), or an immediate field of the instruction (ShImm), or a value taken from the first register ra read by the instruction (ShReg).

Stage 5: Barrel shifter

There are two new control signals, cShiftOp and cShiftAmt. Existing instructions will continue to work if we set cShiftOp = Lsl and cShiftAmt = Sh0, representing the constant 0. We can make good use of the shifter in implementing load and store instructions with the reg+imm addressing mode, because it is specified that the offset should be multiplied by 4 in such instructions, and we can get the shifter to do the multiplication.

Ldr-rri format.png
Str-rri format.png

Here are control signals for some existing instructions, plus the two shift instructions and the reg+imm forms of ldr and str.

Instruction  cRand2  cShiftOp  cShiftAmt  cAluOp  cMemRd  cMemWr  cRegWrite
adds r       RegB    Lsl       Sh0        Add     F       F       T
movs i8      Imm8    Lsl       Sh0        Mov     F       F       T
ldr r        RegB    Lsl       Sh0        Add     T       F       T
str r        RegB    Lsl       Sh0        Add     F       T       F
lsls i5      RegB    Lsl       ShImm      Mov     F       F       T
rors r       RegB    Ror       ShReg      Mov     F       F       T
ldr i5       Imm5    Lsl       Sh2        Add     T       F       T
str i5       Imm5    Lsl       Sh2        Add     F       T       F

The disadvantage of putting the barrel shifter 'in series' with the ALU is that it lengthens the combinational paths, one of which now stretches from the register file, through shifter, ALU and data memory, back to the register file. The long path will slow the maximum clock rate that can be supported by the machine.
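The four shifter operations can be sketched in portable C as below. The function name shifter is invented, and the sketch makes two simplifying assumptions that a real implementation would have to revisit: the shift amount is reduced to the range 0 to 31, and the carry output of the shifter is not modelled.

```c
#include <stdint.h>

typedef enum { Lsl, Lsr, Asr, Ror } ShiftOp;

/* Shift or rotate x by n places (n is reduced to 0..31 here). */
static uint32_t shifter(ShiftOp op, uint32_t x, unsigned n) {
    n &= 31;
    switch (op) {
    case Lsl:
        return x << n;
    case Lsr:
        return x >> n;
    case Asr:
        /* Fill the vacated bits with copies of the sign bit. */
        return (x & 0x80000000u) ? (x >> n) | ~(0xFFFFFFFFu >> n)
                                 : x >> n;
    case Ror:
        /* Avoid the undefined shift by 32 when n is 0. */
        return n == 0 ? x : (x >> n) | (x << (32 - n));
    }
    return x;
}
```

Setting the operation to Lsl and the amount to 0 makes the shifter transparent, and an amount of 2 multiplies the offset by 4, as needed for the reg+imm forms of ldr and str.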

Stage 6: PC as a register

Up to now, we have kept the PC separate from the general register file, as indeed it is on some architectures. But on the ARM, the PC can be accessed like other registers, and used in PC-relative addressing. So our next step is to merge the PC into the register file, using a design for a 'turbocharged' register file we prepared earlier.

Stage 6: Turbocharged register file

In addition to the three selectable outputs that can output the value of any register (including the PC), there is now a special-purpose output that always carries the (current) PC value. There is also a separate input that receives the value of PC+2, which will be written to the PC if it is not specifically selected for the writing of a different value: this allows for the implementation of branch instructions at a later point. We'll assume that the design of the register file includes an adjustment so that, when the PC is read as an ordinary register, the specified value PC+4 is output.
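The two special behaviours of this register file, the PC+4 adjustment on reading and the default update with PC+2, can be captured in a small C model. The names RegFile, reg_read and reg_update are invented for this sketch, and it models only one write port.

```c
#include <stdint.h>
#include <stdbool.h>

enum { PC = 15 };

/* r[PC] holds the address of the current instruction. */
typedef struct { uint32_t r[16]; } RegFile;

/* Read a numbered register.  When the PC is read as an ordinary
   register, the value delivered is the current address plus 4. */
static uint32_t reg_read(const RegFile *rf, unsigned n) {
    return n == PC ? rf->r[PC] + 4 : rf->r[n];
}

/* Update at the end of the clock cycle.  The PC receives nextpc unless
   the instruction explicitly writes a result to the PC, in which case
   the explicit write takes precedence. */
static void reg_update(RegFile *rf, bool regwrite, unsigned cRegC,
                       uint32_t result, uint32_t nextpc) {
    rf->r[PC] = nextpc;
    if (regwrite)
        rf->r[cRegC] = result;
}
```

Writing the default nextpc first and the explicit result second is one simple way to express the precedence rule; in hardware it would be a mux on the PC's input.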

Stage 7: Instruction decoding

We've got quite a good table of control signals now, so it's time to fill in more details of the decoding process. Each instruction uses up to three registers, sometimes selected by fields in the instruction, and sometimes fixed as SP or LR or PC. To sort this out, we can add three identical multiplexers, driven by control signals cRegSelA, cRegSelB, cRegSelC, that either select one of a list of fields (Rd, Rm, Rn, Rt) from the instruction or give a fixed value (Rsp, Rlr, Rpc).

Stage 7: Instruction decoding

The other addition in this figure is a unit called alusel. This tidies up a couple of instructions where the ALU operation could be add or subtract, but the decision is not determined by the first few bits of the instruction. One pair of such instructions are the ones that add or subtract a constant from the stack pointer, using instr<7> to decide which.

Add-spi format.png
Sub-spi format.png

The other such pair are the forms add/sub r/i3 shown earlier that use bit instr<10> to decide whether the second operand is a register or a 3-bit immediate field, and use instr<9> to decide between adding and subtracting. The alusel decoder sorts out the details, and the details can be found in the code of the simulator.
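For readers who don't want to open the simulator, here is an independent sketch of what a unit like alusel might compute. The selector names SelAdd, SelSub, SelAnd and SelMov are invented; Sg7 and Sg9 are the settings used in the decoding tables for "decide by instr<7>" and "decide by instr<9>".

```c
#include <stdint.h>

typedef enum { Add, Sub, And, Mov } AluOp;
typedef enum { SelAdd, SelSub, SelAnd, SelMov, Sg7, Sg9 } AluSel;

/* Derive the actual ALU operation from the decoded rule and the
   instruction bits.  For Sg7 and Sg9, a 1 in the relevant bit means
   subtract and a 0 means add. */
static AluOp alusel(AluSel sel, uint16_t instr) {
    switch (sel) {
    case SelAdd: return Add;
    case SelSub: return Sub;
    case SelAnd: return And;
    case SelMov: return Mov;
    case Sg7:    return ((instr >> 7) & 1) ? Sub : Add;
    case Sg9:    return ((instr >> 9) & 1) ? Sub : Add;
    }
    return Add;
}
```

For instance, add sp, #8 (0xB002) and sub sp, #8 (0xB082) share their first five bits and differ only in instr<7>, so one table row with the setting Sg7 serves for both.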

Let's refresh the table of control signals, adding the three register selectors, and also adding at the left a column that shows the leftmost bits of the instruction, always bits instr<15:11> and sometimes further bits. We'll do that for our selection of example instructions first.

Instruction     opcode       cRegSelA  cRegSelB  cRegSelC  cRand2  cShiftOp  cShiftAmt  cAluSel  cMemRd  cMemWr  cRegWrite
adds/subs r/i3  00011        Rn        Rm        Rd        RImm3   Lsl       Sh0        Sg9      F       F       T
movs i8         00100        -         -         Rt        Imm8    Lsl       Sh0        Mov      F       F       T
ldr r           01011        Rn        Rm        Rd        RegB    Lsl       Sh0        Add      T       F       T
str r           01010        Rn        Rm        Rd        RegB    Lsl       Sh0        Add      F       T       F
lsls i5         00000        -         Rn        Rd        RegB    Lsl       ShImm      Mov      F       F       T
ands r          01000:00000  Rd        Rn        Rd        RegB    Lsl       Sh0        And      F       F       T
rors r          01000:00111  Rn        Rd        Rd        RegB    Ror       ShReg      Mov      F       F       T
ldr i5          01101        Rn        -         Rd        Imm5    Lsl       Sh2        Add      T       F       T
str i5          01100        Rn        -         Rd        Imm5    Lsl       Sh2        Add      F       T       F

This is a fair selection of instructions for the machine, especially if we let ands stand as an example of register-to-register ALU operations and rors stand as an example of shifts with the shift amount taken from a register. Most instructions can be identified from their first five bits and set out in a table of 32 possibilities. Among those implemented in the simulator, most of the rest start with 010000 or 010001, and can be identified using two further, smaller tables. All these tables could become ROMs in a hardware implementation. This part of a Thumb implementation is more complicated than is typical for RISC machines because of the variety of instruction formats. For comparison, the MIPS has just three instruction formats, all 32 bits long: one that names three registers, one that has two registers and a 16-bit immediate field, and a third format with a large offset for subroutine calls.

The datapath as it stands also contains the resources to implement a number of other instructions. For example, there are several instructions that implicitly involve the stack pointer, including the two shown above, an instruction that forms an address by adding a constant and the stack pointer, and instructions that load and store from that address.

Add-rspi format.png
Ldr-rspi format.png
Str-rspi format.png

All of these can be implemented using the register selector Rsp:

Instruction  opcode  cRegSelA  cRegSelB  cRegSelC  cRand2  cShiftOp  cShiftAmt  cAluSel  cMemRd  cMemWr  cRegWrite
add/sub sp   10110   Rsp       -         Rsp       Imm7    Lsl       Sh2        Sg7      F       F       T
add rsp      10101   Rsp       -         Rt        Imm8    Lsl       Sh2        Add      F       F       T
ldr sp       10011   Rsp       -         Rt        Imm8    Lsl       Sh2        Add      T       F       T
str sp       10010   Rsp       -         Rt        Imm8    Lsl       Sh2        Add      F       T       F

We can also implement unconditional branches that add a signed 11-bit constant to the PC.

Instruction  opcode  cRegSelA  cRegSelB  cRegSelC  cRand2  cShiftOp  cShiftAmt  cAluSel  cMemRd  cMemWr  cRegWrite
b            11100   Rpc       -         Rpc       SImm11  Lsl       Sh1        Add      F       F       T

As you can see, the displacement is multiplied by 2 before adding it to the PC. This implementation depends on two features of the register file: that the PC reads as PC+4 when accessed as a numbered register, and that writing the PC explicitly takes precedence over the usual update with nextpc = pc+2.
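The arithmetic for the branch target is short enough to write out. The following C sketch uses invented function names and assumes the displacement occupies instr<10:0>, as the 11-bit field in the table suggests.

```c
#include <stdint.h>

/* Sign-extend the low `width` bits of x to 32 bits (two's complement). */
static uint32_t sign_extend(uint32_t x, int width) {
    uint32_t sign = 1u << (width - 1);
    x &= (sign << 1) - 1;
    return (x ^ sign) - sign;
}

/* Target of an unconditional branch located at address pc: the PC
   reads as pc + 4, and the signed 11-bit displacement is shifted left
   by one place before being added. */
static uint32_t branch_target(uint32_t pc, uint16_t instr) {
    uint32_t offset = sign_extend(instr & 0x7FF, 11) << 1;
    return (pc + 4) + offset;
}
```

As a check, the familiar 'branch to self' instruction 0xE7FE has displacement -2; doubled and added to pc + 4, that gives back pc itself.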

Stage 8: Subroutine calls

In order to implement the branch-and-link instructions bl and blx, we need one extra feature of the register file, and one small enhancement to the datapath. The register file has one further control input cLink that, when active, causes the nextpc value to be written to the link register LR, in addition to the normal updating of registers. This will permit us to implement an instruction that simultaneously sets the link register to a return address while loading the entry point of a subroutine into the PC.

Because the bx r and blx r instructions differ in only one bit, we need an extra multiplexer to derive this control signal, with three settings – 0 for most instructions, 1 for the bl2 instruction (see below), and a copy of instr<7> for these instructions.

Bx-r format.png
Blx-r format.png

The three values of the controlling cWLink control signal are denoted N, Y and C in the tables below.

Stage 8: Subroutine calls

Existing instructions are extended with cWLink = N, and the following rule covers the two branch-to-register instructions. The RHn register selector denotes the four-bit field instr<6:3>; high registers are allowed here, as in the familiar instruction bx lr.

Instruction  opcode     cRegSelA  cRegSelB  cRegSelC  cRand2  cShiftOp  cShiftAmt  cAluSel  cMemRd  cMemWr  cRegWrite  cWLink
bx/blx r     01000:111  -         RHn       Rpc       RegB    Lsl       Sh0        Mov      F       F       T          C
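The extra multiplexer amounts to a three-way choice. A minimal sketch in C, with an invented function name clink and the settings N, Y and C as used in the table:

```c
#include <stdint.h>
#include <stdbool.h>

typedef enum { N, Y, C } WLink;

/* Derive the cLink input of the register file from the decoded rule:
   never link, always link, or copy instr<7>. */
static bool clink(WLink rule, uint16_t instr) {
    switch (rule) {
    case N: return false;
    case Y: return true;
    case C: return (instr >> 7) & 1;
    }
    return false;
}
```

With the setting C, the instruction bx lr (0x4770) leaves the link register alone, while blx r3 (0x4798) updates it.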

As hinted in an earlier problem sheet, the 32-bit bl instruction can, in simple cases, be executed in two halves that we shall call bl1 and bl2.

Bl1 format.png
Bl2 format.png

The first half starts with bits 11110, and the second half starts with 11111, provided we assume bits J1 and J2 are both 1, as traditionally they were: only very long branches will make them anything else. We will implement the bl1 instruction by adding the offset, scaled appropriately, to the PC and putting the result in LR; then we implement bl2 by adding the second half of the offset to the LR, and putting the result in the PC, simultaneously setting LR to the return address.

Instruction  opcode  cRegSelA  cRegSelB  cRegSelC  cRand2  cShiftOp  cShiftAmt  cAluSel  cMemRd  cMemWr  cRegWrite  cWLink
bl1          11110   Rpc       -         Rlr       SImm11  Lsl       Sh12       Add      F       F       T          N
bl2          11111   Rlr       -         Rpc       Imm11   Lsl       Sh1        Add      F       F       T          Y
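To see that the two halves combine correctly, we can trace them in a small C model. The type State and the function names are invented for the sketch; it follows the rules in the table, with the PC reading as pc + 4 in bl1 and the link register receiving nextpc in bl2.

```c
#include <stdint.h>

typedef struct { uint32_t pc, lr; } State;

/* Sign-extend the low `width` bits of x to 32 bits. */
static uint32_t sign_extend(uint32_t x, int width) {
    uint32_t sign = 1u << (width - 1);
    x &= (sign << 1) - 1;
    return (x ^ sign) - sign;
}

/* First half: LR = (PC reading as pc + 4) + (SImm11 << 12). */
static void bl1(State *s, uint16_t instr) {
    s->lr = (s->pc + 4) + (sign_extend(instr & 0x7FF, 11) << 12);
    s->pc += 2;
}

/* Second half: PC = LR + (Imm11 << 1), and at the same time
   LR = nextpc, the address of the instruction after the bl. */
static void bl2(State *s, uint16_t instr) {
    uint32_t nextpc = s->pc + 2;
    uint32_t target = s->lr + ((instr & 0x7FFu) << 1);
    s->lr = nextpc;
    s->pc = target;
}
```

Starting with pc = 0x100, the pair 0xF000, 0xF810 ends with pc = 0x124 and lr = 0x104: the target is 0x20 bytes beyond pc + 4, and the return address is the instruction following the two halves.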

Stage 9: Conditional execution

There are several features of the Thumb instruction set that we're not going to implement, but one remains that is essential to writing working programs, and that is the mechanism for conditional branches: arithmetic instructions set the status bits NZCV, and there is a form of branch instruction that contains one of 14 conditions defined as logical combinations of the status bits.

Bcond format.png

Our approach to implementing conditional branches will be to use the ALU to compute the target address of the branch from the PC value and the displacement, but to make the writing of the result into the PC conditional on the test being passed. Three new datapath elements and two new control signals will be needed. There is a small, 4-bit register to hold the NZCV flags, and a control signal that determines whether the flags are updated by each instruction. A separate, combinational circuit takes the register contents and the four-bit condition field shown in the instruction format, and computes a signal enable that indicates whether the condition is satisfied. (As usual, this circuit functions in every instruction, whether it is a conditional branch or not, and produces a nonsense output except when a conditional branch is in progress.) The decision whether to write the result of an instruction back to a register becomes a dynamic one: in place of the signal cRegWrite appearing in the decoding table, there is a signal cWReg that takes values Y, N and C, with C denoting that the register-write signal regwrite is taken from enable.

Stage 9: Conditional execution

We can add the new signals to the table for existing instructions: cWFlags indicates whether the instruction should write the flags or not: T for adds and lsls and all the other arithmetic and logical instructions, F for loads and stores and branches. The values T and F for cRegWrite are replaced by values Y and N for cWReg, with C used only in the following rule for conditional branch instructions.

Instruction  opcode       cRegSelA  cRegSelB  cRegSelC  cRand2  cShiftOp  cShiftAmt  cAluSel  cMemRd  cMemWr  cWFlags  cWReg  cWLink
b<c>         11010/11011  Rpc       -         Rpc       SImm8   Lsl       Sh1        Add      F       F       F        C      N
cmp ri       00101        Rt        -         -         Imm8    Lsl       Sh0        Sub      F       F       T        N      N
subs ri      00111        Rt        -         Rt        Imm8    Lsl       Sh0        Sub      F       F       T        Y      N

For comparison, the rules for subs ri and cmp ri are also given here: the only difference between them is that after the subtraction is performed, the subs instruction writes a register as well as updating the flags, and the cmp instruction just updates the flags. It's important to note that conditional branches read but don't destroy the flags: that makes it possible to have one compare instruction followed by several branches conditional on its result:

    cmp r1, #0
    beq zero
    bgt positive
negative:
    ...
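The flag register and the condition circuit can be sketched in C as follows. This is an illustration under stated assumptions, not the simulator's code: the names Flags, flags_sub and cond_enable are invented, the flags are computed for a subtraction only, and the condition numbering is the standard ARM one (0 for eq, 1 for ne, and so on up to 13 for le). Values 14 and 15 of the condition field do not encode conditional branches, so the output for them is not meaningful.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct { bool n, z, c, v; } Flags;

enum { EQ = 0, NE = 1, LT = 11, GT = 12 };

/* Flags produced by the subtraction a - b, as for cmp or subs. */
static Flags flags_sub(uint32_t a, uint32_t b) {
    uint32_t r = a - b;
    Flags f;
    f.n = (r >> 31) & 1;                    /* result negative */
    f.z = (r == 0);                         /* result zero */
    f.c = (a >= b);                         /* no borrow */
    f.v = (((a ^ b) & (a ^ r)) >> 31) & 1;  /* signed overflow */
    return f;
}

/* The combinational condition circuit: conditions come in pairs, with
   the low bit of the condition field inverting the test. */
static bool cond_enable(unsigned cond, Flags f) {
    bool r;
    switch (cond >> 1) {
    case 0:  r = f.z; break;                  /* eq / ne */
    case 1:  r = f.c; break;                  /* hs / lo */
    case 2:  r = f.n; break;                  /* mi / pl */
    case 3:  r = f.v; break;                  /* vs / vc */
    case 4:  r = f.c && !f.z; break;          /* hi / ls */
    case 5:  r = (f.n == f.v); break;         /* ge / lt */
    case 6:  r = !f.z && (f.n == f.v); break; /* gt / le */
    default: r = true; break;                 /* not a conditional branch */
    }
    return (cond & 1) ? !r : r;
}
```

Because cond_enable only reads the flags, the same Flags value can be tested by beq and then by bgt, just as in the fragment above.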

Context

None of the detail here really matters, except as an illustration of the challenges faced by a datapath designer. The hardware, once designed, is fixed, and must be designed so that, with appropriate settings of the control signals, every machine language instruction can be implemented. For a single-cycle design like this one, where each instruction takes exactly one clock cycle, the instruction decoder essentially expands each instruction into a long string of control bits, reversing a kind of compression that makes some useful operations expressible in the limited number of instruction bits, and others not expressible at all.

Summary of control signals

We can distinguish between decoded signals, which are determined by the opcode as looked up in the ROM, and are therefore the same for all instances of an instruction; derived signals, which also depend on other bits of the instruction, and are the same for every execution of a particular instruction; and dynamic signals, which will differ from one execution of an instruction to the next.

Decoded Derived Dynamic Description
pc Program counter value
instr The 16-bit instruction
cRegSelA, cRegSelB, cRegSelC Three rules for selecting registers
cRegA, cRegB, cRegC The three register numbers
ra, rb, rc The contents of the three registers
nextpc Address of the next instruction
cRand2 Rule for selecting shifter input
shiftin Input to the shifter
cShiftOp Shift operation
cShiftAmt Rule for determining shift amount
shiftamt Amount of shift
aluin2, shcarry Outputs from the shifter
cAluSel Rule for determining ALU operations
cAluOp The ALU operation
aluout, newflags Outputs from the ALU
cMemRd, cMemWr Whether to read or write the memory
memout Result of memory read
result Result for write-back
cWFlags Whether to update the flags
cCond Condition to test
enable Whether the condition is satisfied
cWReg Rule for writing result register
regwrite Whether result will be written
cWLink Rule for updating link register
cLink Whether link register will be updated

Questions

What's the difference between decoded, derived and dynamic signals?

Decoded signals are those that appear in the decoding tables and are determined by the instruction's opcode; derived signals also depend on other parts of the instruction halfword, and dynamic signals depend on other parts of the machine state.

  • The decoded signals will be the same in every instance of an instruction – that is to say, two instructions that share the same opcode will have the same decoded signals. Note, however, that two instructions like mov r and mov i8 may share the same mnemonic in assembly language, but have different opcodes, and so be treated as different instructions by the hardware, with different decoded signals. In the simulator, these decoded signals form the members of the Control structure ctrl contained in one of the decoding tables, and have names like ctrl->cShiftAmt.
  • The derived signals will be the same whenever a particular instruction is executed, because they are determined by the bits of the instruction taken all together; for example, the instruction add r3, r1, r2 always writes its result to register r3, and so has cRegC = 3. In the simulator, these derived signals are outside the Control structure, but also have names like cRegC that start with a lower-case c followed by an upper-case letter.
  • The dynamic signals differ from one execution of the instruction to another, so that at one time that add instruction could write 7 to r3, and another time it could write 8, and the value of the result signal would be different in the two cases.

What do the notations Rd, Rn, etc., mean when referring to ARM instructions?

What they mean in the ARM documentation is not entirely clear, because the bit-fields that are labelled with these notations vary from one instruction to another. Perhaps they refer to the roles played by the registers in the instruction, with Rd being the destination, Rn the first operand, Rm the second, and so on.

I have used these names (perhaps misguidedly) to refer to fixed fields of the instruction half-word:

  • Rd is bits [2:0].
  • Rn is bits [5:3].
  • Rm is bits [8:6].
  • Rt is bits [10:8].

These names are used in the Thumb simulator for values of type RegSel, and there is a function regsel that interprets them, modelling the behaviour of each of the three muxes that determine which registers a particular instruction reads and writes. When I revise the course, I will change these names to prevent possible confusion with ARM documentation.
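To make the field conventions concrete, here is a sketch of what one of the register-selection muxes might compute. It is written independently of the simulator, so the function is called regsel_sketch to avoid any suggestion that it reproduces the real regsel; the numbering of the enumeration is likewise an assumption.

```c
#include <stdint.h>

typedef enum { Rd, Rn, Rm, Rt, RHn, Rsp, Rlr, Rpc } RegSel;

/* Turn a selection rule into a register number, using either a field
   of the instruction or a fixed register. */
static unsigned regsel_sketch(RegSel sel, uint16_t instr) {
    switch (sel) {
    case Rd:  return instr & 0x7;          /* bits [2:0] */
    case Rn:  return (instr >> 3) & 0x7;   /* bits [5:3] */
    case Rm:  return (instr >> 6) & 0x7;   /* bits [8:6] */
    case Rt:  return (instr >> 8) & 0x7;   /* bits [10:8] */
    case RHn: return (instr >> 3) & 0xF;   /* bits [6:3] */
    case Rsp: return 13;
    case Rlr: return 14;
    case Rpc: return 15;
    }
    return 0;
}
```

For the instruction add r3, r1, r2 (encoded as 0x188B), the selectors Rd, Rn and Rm yield 3, 1 and 2 respectively; for bx lr (0x4770), RHn yields 14.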

In the lectures, you talk about processor designs containing lots of multiplexers, but some books talk about designs containing various busses. What's the difference?

There isn't really a difference: a bus in a conventional design is a bundle of wires that can be driven by several sources at different times, usually by setting one source to produce an output and the others to enter a high-impedance state. If there is a difference, it is that we are neutral about the technology used to implement a multiplexer – logic gates, or the kind of three-state logic just described – whereas a bus, especially if it stretches between chips, has a specific implementation that must be followed.

Glossary

Thumb: An alternative instruction encoding for the ARM in which each instruction is encoded in 16 rather than 32 bits. The advantage is compact code, the disadvantage that only a selection of instructions can be encoded, and only the first 8 registers are easily accessible. In Cortex-M microcontrollers, the Thumb encoding is the only one provided.

Program counter: A register that contains the address of the next instruction to be executed. Because of pipelining, on ARM Cortex-M machines, reading the program counter yields a value that is 4 bytes greater than the address of the current instruction.

ROM (Read-Only Memory): A form of storage whose contents are non-volatile (are not lost when the power is off) but cannot be changed under program control. Modern ROM is usually EEPROM – Electrically Erasable Programmable Read Only Memory, and can be changed electrically, and even under control of a program running on the microcontroller, but using special peripheral registers and not the normal store instructions. Flash memory is a modern, super-compact implementation of EEPROM, but for our purposes it does exactly the same job. We will modify the contents of the micro:bit's flash memory by downloading programs, but we will probably not be writing programs that change the contents of the flash memory.

Condition codes: Four bits, N, Z, V and C, in the processor status word that indicate the result of a comparison or other arithmetic operation. Briefly, N indicates whether the result of the operation was negative, Z indicates whether it was zero, C is the value of the carry-out bit from the ALU, and V indicates whether the operation overflowed, yielding a result that was different in sign from what could be predicted from the inputs to the operation. A comparison is treated like a subtraction as far as setting the condition codes is concerned. After the condition codes have been set, a subsequent conditional branch instruction can test them, and make a branch decision based on a boolean combination of their values. All ten arithmetic comparisons (equal, not-equal, and less-than, less-than-or-equal, greater-than, and greater-than-or-equal for both signed and unsigned representations) can be represented in this way. When a process is interrupted, the condition codes must be saved and restored as part of the processor state, in case the interrupt came between a comparison and a subsequent conditional branch.

Addressing mode: In instructions that access memory, one of several rules for computing the address of the location to be accessed. For example, one addressing mode might obtain the address by adding the contents of two registers, and another might add a register and a small constant. CISC machines are characterised by more varied and more complex addressing modes than RISC machines.

PC-relative addressing: An addressing mode that involves adding a small constant to the value of the program counter in order to form an address. On the ARM, PC-relative addressing is used to access tables of large constants that are located at the end of the code for each procedure, giving access to the values of these constants without having to embed large constants in instructions. A large constant that does not fit in the immediate field of an instruction can be loaded into a register using a PC-relative load instruction. The assembler generates such instructions, and automatically lays out a table of literal values, when a programmer uses the syntax ldr rn, =const.

Stack pointer: A register sp that holds the address of the most recent occupied word of the subroutine stack. On ARM, as on most recent processors, the subroutine stack grows downwards, so that the sp holds the lowest address of any occupied word on the stack.

Link register: On ARM processors, a register (r14) in which the program counter value is saved by the instructions bl and blx that call a subroutine. The subroutine can return by branching to this address with the instruction bx lr, or can save the value on the stack (with push {..., lr}) and later return by restoring the same value back into the program counter (with pop {..., pc}).

Assembly language: A symbolic representation of the machine code for a program.

Three-state logic: A convention where multiple outputs can share a single wire; those outputs that are not currently driving the wire high or low must be put into a third, high impedance state so as not to interfere with others.