# Thumb simulator (Digital Systems)

This literate Haskell script is a register-level simulator for a partial implementation of the Thumb instruction set.

- A diagram of the datapath with control signals named.

The main things missing are multi-word load and store
instructions (including `push`

and `pop`

),
loads and stores for bytes and half-words, and the whole
exception mechanism. The implementation executes each instruction in
a single cycle, with no pipelining, and is capable of loading and
executing binary machine code prepared with the standard assembler.

module Thumb where import Data.Word import Data.Int import Data.Bits(testBit, shiftL, shiftR, rotateR, complement, (.&.), (.|.), xor, Bits) import qualified Data.Map.Strict as Map import qualified Data.ByteString.Lazy as ByteString import qualified System.Environment

## Components

We begin with a selection of architectural components, each simulated by a single function from state and inputs to new state and outputs. We will exploit laziness to connect these components together, and have the calculations run in an order consistent with the dependencies between components, without having to specify an explicit order of evaluation.

### Registers

The state of a register is a single value.

newtype Register a = Register a

instance Show a => Show (Register a) where show (Register x) = show x

The register has a write-enable input `we`

, as well as an input for
the next value to be stored, and an output for the value currently
stored.

register :: Register a -> Bool -> a -> (a, Register a) register (Register x) we y = (x, Register (if we then y else x))

In point of fact, the part of the Thumb architecture we shall model contains just one register that is not part of the addressable register file, namely the one containing the NZCV flags.

### Register file

The register file used by the machine is special in a number of ways.
It is capable of reading three arbitrary registers on each cycle, plus
the program counter (this is needed for the store instruction
`str r1, [r2, r3]`

; and capable of writing one arbitrary register as well as
the link register `lr = r14`

and the program counter `pc = r15`

, both
with the value `nextpc`

. Additionally, reading the `pc`

as an ordinary
register yields `pc+4`

and writing it discards the least significant bit,
in agreement with the conventions of the architecture.

In fact, will never write both `lr`

and `pc`

from `nextpc`

, because in a
branch-and-link instruction, the `pc`

is always written with the
target address, and as the code below spells out, an explicit write
takes precedence.

The state of the register file is a list of 16 register values, but in debugging output we show only the first four.

newtype RegFile = RegFile [Word32]

instance Show RegFile where show (RegFile regs) = show (take 4 regs)

The function `regfile`

takes five control signals: the numbers of
three registers to read, and two Booleans, one, `cRegWrite`

,
indicating whether to write an arbitrary register, and another, `cLink`

,
indicating whether to write `lr`

with the `nextpc`

value. In addition
the the current state, there are two data inputs: the data to be
written to an arbitrary register if `cRegWrite`

is true, and the
`nextpc`

value to be written to the `pc`

and optionally the `lr`

register.

regfile :: (Int, Int, Int, Bool, Bool) -> RegFile -> Word32 -> Word32 -> ((Word32, Word32, Word32, Word32), RegFile) regfile (cRegA, cRegB, cRegC, cRegWrite, cLink) (RegFile regs) wdata nextpc = ((read cRegA, read cRegB, read cRegC, pc), RegFile regs') where read r = if r == 15 then pc + 4 else regs !! r pc = regs !! 15 regs' = map update [0..15] update r = if cRegWrite && r == cRegC then (if r == 15 then wdata .&. complement 1 else wdata) else if r == 15 || (r == 14 && cLink) then nextpc else regs !! r

It would be nice to replace the list of 16 words with an array that supports constant-time indexing, but it isn't clear whether that would give any real speedup.

### Memories

For simplicity, we adopt a 'modified Harvard architecture', where the single memory of the machine is presented via two interfaces that can be thought of as independent caches. This simulation does not include cache misses, so in effect we have a two-port memory, capable of fetching an instruction word and either reading or writing a data word in each cycle. Real implementations of the instruction set are not like this, having either a single cache or no cache at all, and inserting an extra cycle for load and store operations.

The memory state is represented by a finite mapping from addresses to contents, so as to get reasonably efficient incremental updates in an applicative way.

newtype Memory = Memory (Map.Map Word32 Word32)

The instruction memory is read one 16-bit halfword at a time, so we divide the address by 4 and use bit 1 to select one half or the other of the 32-bit word with that index. Bit 0 of the PC is ignored.

imem :: Memory -> Word32 -> Word32 imem (Memory mem) addr = (if bit 1 addr then shiftR v 16 else v) .&. 0xffff where v = Map.findWithDefault 0 (shiftR addr 2) mem

The data memory supports both reading and writing, and for the sake of
realism, we read the memory (risking a cache miss or segfault in real
life) only if the instruction requires it. Thus there are two control
signals `cMemRd`

and `cMemWr`

, and two data inputs, the address and
the value to be written. The only output is the value read.
Because the memory, supports only word-sized loads and stores, the
bottom two bits of the address is ignored.

dmem (cMemRd, cMemWr) (Memory mem) addr wdata = (rdata, Memory mem') where rdata = if cMemRd then Map.findWithDefault 0 (shiftR addr 2) mem else 0 mem' = if cMemWr then Map.insert (shiftR addr 2) wdata mem else mem

### Shifter

The barrel shifter supports logical and arithmetic shifts and also rotations. We'll use an enumerated type for the operations to avoid specifying a particular encoding.

data ShiftOp = Lsl | Lsr | Asr | Ror

The shifter takes an operation, a value to be shifted, and a shift
amount (encoded as an `Int`

here). It returns the shifted value, and
also the last bit shifted out, useful because shift instructions put
this bit in the C flag. That bit will be zero if the shift amount is
zero.

shifter :: ShiftOp -> Word32 -> Int -> (Word32, Bool) shifter op x n = case op of Lsl -> (shiftL x n, bit (32-n) x) Lsr -> (shiftR x n, bit (n-1) x) Asr -> (word (shiftR (int32 x) n), bit (n-1) x) Ror -> (rotateR x n, bit (n-1) x)

### Arithmetic Logic Unit

The ALU supports many operations

data AluOp = Add | Sub | And | Eor | Adc | Sbc | Neg | Orr | Mul | Mov | Mvn | Bic | Adr | Sg7 | Sg9

For convenience, I've added two extra pseudo-operations `Sg7`

and
`Sg9`

, here. Both are equivalent to either `Add`

or `Sub`

, depending
on either bit 7 or bit 9 of the instruction word `instr`

.
Here's the part of the control unit that interprets them.

alusel :: AluOp -> Word32 -> AluOp alusel sAluOp instr = case sAluOp of Sg7 -> if bit 7 instr then Sub else Add Sg9 -> if bit 9 instr then Sub else Add _ -> sAluOp

The ALU has a control input for the operation, and four data inputs.
Two inputs are the two arguments to the operation, but also supplied
are the C flag, used by the `Adc`

and `Sbc`

operations, and the carry
bit `shcbit`

computed by the shifter: for some operations, this
becomes the carry bit for the whole operation. The outputs are the
result of the operation, together with the four flag bits.
The N and Z flags always have the same meaning, and the C and V flags
have meanings dependent on the operation.

type Flags = (Bool, Bool, Bool, Bool)

alu :: AluOp -> Word32 -> Word32 -> Bool -> Bool -> (Word32, Flags) alu cAluOp in1 in2 cin shcbit = (r, (n, z, v, c)) where (r, c, v) = case cAluOp of Add -> adder in1 in2 False Sub -> adder in1 (complement in2) True And -> (in1 .&. in2, False, False) Eor -> (in1 `xor` in2, False, False) Adc -> adder in1 in2 cin Sbc -> adder in1 (complement in2) cin Neg -> adder 0 (complement in2) True Orr -> (in1 .|. in2, False, False) Mul -> (in1 * in2, False, False) Mov -> (in2, False, shcbit) Mvn -> (complement in2, False, shcbit) Bic -> (in1 .&. complement in2, False, False) Adr -> ((in1 + in2) .&. complement 0x3, False, False) n = signbit r z = (r == 0)

Many of the operations (`Add`

, `Sub`

, `Adc`

, `Sbc`

, `Neg`

) are
implemented using an adder that itself provides the `V`

and `C`

bits.
Others (`And`

, `Eor`

, `Orr`

, `Bic`

) are implemented by means of
bitwise Boolean operations, and give V and C bits that are zero. The
two forms of move (`Mov`

, `Mvn`

) ignore the argument `in1`

and copy `in2`

to
the output, possibly after negating it bitwise; they provide a carry
bit from the shifter.

The last operation, `Adr`

, is implicit in the `add pc`

and `ldr pc`

instructions that use `pc`

-relative addressing. These are defined to
round down the `pc`

value (which is aligned to a 2 byte boundary) to
make it a multiple of 4. No doubt this operation could be implemented
with the same adder, but it is written directly here. The status bits
don't matter, because neither instruction sets the flags.

There is an infelicity in this implementation, in that any instruction that writes some of the flag bits writes all of them, and shifts write the C flag even if the shift amount is zero.

Addition would be easily implemented as a 32-bit adder with carry-in and carry-out, and the usual analysis of signs to determine the V bit. Like most high level languages, Haskell makes it difficult to get hold of the carry-out bit of an addition. We could do the addition in 64 bits instead of 32, and throw away all but 33 bits of the result. Alternatively we can reconstruct the carry-out from the sign bits of the inputs and result. Note that this calculation is different from the one that computes the overflow bit.

adder :: Word32 -> Word32 -> Bool -> (Word32, Bool, Bool) adder a b cin = (r, cout, vout) where r = a + b + ord cin cout = (ord (signbit a) + ord (signbit b) - ord (signbit r) > 0) vout = (signbit a == signbit b && signbit r /= signbit a)

### Condition codes

The Thumb instruction set provides 14 different conditions for branching on the flags, and we need a straightforward combinational circuit to establish their meanings. The numbers from 0 to 13 appear in a field of conditional branch instructions, so we need to be explicit about the encoding here.

condition :: Int -> Flags -> Bool condition cCond (n, z, v, c) = case cCond of 0 -> z -- eq 1 -> not z -- ne 2 -> c -- cs 3 -> not c -- cc 4 -> n -- mi 5 -> not n -- pl 6 -> v -- vs 7 -> not v -- vc 8 -> c && not z -- hi 9 -> not c || z -- ls 10 -> n == v -- ge 11 -> n /= v -- lt 12 -> not z && n == v -- gt 13 -> z || n /= v -- le

## Instruction decoding

The first step in executing a fetched instruction is to decode it, producing a bundle of control signals.

decode :: Word32 -> (RegSel, RegSel, RegSel, Rand2Sel, ShiftOp, ShiftSel, AluOp, Bool, Bool, Bool, Perhaps, Perhaps, String)

data Perhaps = Y | N | C

Decoding an instruction gives a list of 12 control signals and a name that is used for debugging. If

decode instr = (sRegA, sRegB, sRegC, cRand2, cShiftOp, cShiftAmt, sAluOp, cFlags, cMemRd, cMemWr, sLink, sRegWrite, mnem)

then

`sRegA`

,`sRegB`

and`sRegC`

are*register selectors*that determine how to select the three registers that are read or written by the instruction.`cRand2`

determines where the second ALU operand comes from.`cShiftOp`

and`cShiftAmt`

determine how that operand is treated by the barrel shifter.`sAluOp`

determines what ALU operation is performed.`cFlags`

determines whether the flags are updated.`cMemRd`

and`cMemWr`

determine whether a memory read or write happens.`sLink`

determines whether the address of the following instruction is written into`lr`

.`sRegWrite`

determines whether the result of the instruction is written to a register.`mnem`

represents the mnemonic for the instruction.

The `sLink`

and `sRegWrite`

fields have type `Perhaps`

, with possible
values `Y`

(meaning yes), `N`

(meaning no), and `C`

, meaning that the
answer will be determined by some other condition. In the case of
`sLink`

, this is another bit in the instruction that is not taken into
account by the decoder, but makes the difference between `bx`

and
`blx`

. In the case of `sRegWrite`

, conditional branches work by
computing the target address regardless of whether the branch is taken
or not, but writing it into the `pc`

only if the condition is
satisfied.

As the definition of `decode`

reveals, the majority of instructions
can be decoded (using `decode1`

) by looking at the five bits [15:11],
unless those bits are 01000, in which case we consult other decoding
tables, one (`decode2`

) for ALU operations starting 010000, and
another (`decode3`

) for high register operations together with
`bx/blx`

, starting with 010001. We don't implement byte and halfword
loads and stores, not `push`

, `pop`

and adjacent operations, so this
is enough.

decode instr = if op /= 8 then decode1 op else if not (bit 10 instr) then decode2 (field (9,6) instr) else decode3 (field (9,8) instr) where op = field (15,11) instr

The functions `decode1`

, `decode2`

and `decode3`

are defined by tables
that could become ROMs in a hardware implementation. As a prelude to
spelling out the details, let's define the enumerated types that are
used in different columns. The type `RegSel`

describes the different
instruction bits that can be used to name a register. Some
instructions contain register names explicitly, and other refer
implicitly to the `sp`

, `lr`

or `pc`

. This part of the design is
rendered more complicated by the large variety of different
instruction format that appear Thumb code.

data RegSel = Rd | Rn | Rm | Rt | RHn | RHd | R13 | R14 | R15

regsel s instr = case s of Rd -> int (field (2,0) instr) Rn -> int (field (5,3) instr) Rm -> int (field (8,6) instr) Rt -> int (field (10,8) instr) RHn -> int (field (6,3) instr) RHd -> int (8 * field(7,7) instr + field(2,0) instr) R13 -> 13; R14 -> 14; R15 -> 15

Similarly, different instructions have different ways of specifying
the amount that their second operand should be shifted. Some use an
implicit constant – 0, 1, 2, or 12 – while others have a five-bit
immediate field, and others take the shift amount from the first
register `ra`

read by the instruction.

data ShiftSel = S0 | S1 | S2 | S12 | ShI | ShR

shiftsel cShiftAmt ra instr = case cShiftAmt of S0 -> 0; S1 -> 1; S2 -> 2; S12 -> 12 ShI -> int (field (10,6) instr) ShR -> int (ra .&. 0x1f)

We must deal with a bit more complexity in determining the second
operand fed to the ALU. Sometimes this is the second register `rb`

read by the instruction, but it can also be drawn from signed or
unsigned immediate fields in the instruction of various sizes and
locations.

data Rand2Sel = RegB | RIm3 | Imm5 | Imm7 | Imm8 | SI8 | Im11 | SI11

rand2sel cRand2 rb instr = case cRand2 of RegB -> rb RIm3 -> if bit 10 instr then field (8,6) instr else rb Imm5 -> field (10,6) instr Imm7 -> field (6,0) instr Imm8 -> field (7,0) instr SI8 -> signext 8 (field (7,0) instr) Im11 -> field (10,0) instr SI11 -> signext 11 (field (10,0) instr)

With these conventions in place, we are ready to list the control signals for each instruction. First, for most instructions we examine bits [15:11].

decode1 op = case op of -- sRegA cShiftOp cFlags sLink -- | sRegB cRand2 | cShiftAmt | cMemRd | sRegWrite -- | | sRegC | | | sAluOp | | cMemWr | mnem -- | | | | | | | | | | | | | 0 -> (Rd, Rn, Rd, RegB, Lsl, ShI, Mov, t, f, f, N, Y, "lsls") 1 -> (Rd, Rn, Rd, RegB, Lsr, ShI, Mov, t, f, f, N, Y, "lsrs") 2 -> (Rd, Rn, Rd, RegB, Asr, ShI, Mov, t, f, f, N, Y, "asrs") 3 -> (Rn, Rm, Rd, RIm3, Lsl, S0, Sg9, t, f, f, N, Y, "adds/subs") 4 -> (Rt, Rd, Rt, Imm8, Lsl, S0, Mov, t, f, f, N, Y, "movs i8") 5 -> (Rt, Rd, Rt, Imm8, Lsl, S0, Sub, t, f, f, N, N, "cmp i8") 6 -> (Rt, Rd, Rt, Imm8, Lsl, S0, Add, t, f, f, N, Y, "adds i8") 7 -> (Rt, Rd, Rt, Imm8, Lsl, S0, Sub, t, f, f, N, Y, "subs i8") -- 8: see below 9 -> (R15, Rd, Rt, Imm8, Lsl, S2, Adr, f, t, f, N, Y, "ldr pc") 10 -> (Rn, Rm, Rd, RegB, Lsl, S0, Add, f, f, t, N, N, "str r") 11 -> (Rn, Rm, Rd, RegB, Lsl, S0, Add, f, t, f, N, Y, "ldr r") 12 -> (Rn, Rd, Rd, Imm5, Lsl, S2, Add, f, f, t, N, N, "str i5") 13 -> (Rn, Rd, Rd, Imm5, Lsl, S2, Add, f, t, f, N, Y, "ldr i5") -- 13, 14-17: only full word loads and stores 18 -> (R13, Rd, Rt, Imm8, Lsl, S2, Add, f, f, t, N, N, "str sp") 19 -> (R13, Rd, Rt, Imm8, Lsl, S2, Add, f, t, f, N, Y, "ldr sp") 20 -> (R15, Rd, Rt, Imm8, Lsl, S2, Adr, f, f, f, N, Y, "add pc") 21 -> (R13, Rd, Rt, Imm8, Lsl, S2, Add, f, f, f, N, Y, "add sp") 22 -> (R13, Rd, R13, Imm7, Lsl, S2, Sg7, f, f, f, N, Y, "add/sub sp") -- 23, 24: push, pop, ldm, stm and miscellania not implemented 26 -> (R15, Rd, R15, SI8, Lsl, S1, Add, f, f, f, N, C, "bcond") 27 -> (R15, Rd, R15, SI8, Lsl, S1, Add, f, f, f, N, C, "bcond") 28 -> (R15, Rd, R15, SI11, Lsl, S1, Add, f, f, f, N, Y, "b") 30 -> (R15, Rd, R14, SI11, Lsl, S12, Add, f, f, f, N, Y, "bl1") 31 -> (R14, Rd, R15, Im11, Lsl, S1, Add, f, f, f, Y, Y, "bl2")

Note that a `cmp`

instruction is identical with a `subs`

instruction,
except that it doesn't write the result back into a register. As
indicated earlier, a conditional branch instruction `bcond`

uses the
ALU to compute the branch target, and writes it into the `pc`

only if
the condition is true. The long `bl`

instruction is treated as two
halves, `bl1`

and `bl2`

.

Instructions where bits [15:10] are 010000 are ALU operations between registers. We need to look at bits [9:6] to discover what operation is needed.

decode2 op = case op of -- sRegA cShiftOp cFlags sLink -- | sRegB cRand2 | cShiftAmt | cMemRd | sRegWrite -- | | sRegC | | | sAluOp | | cMemWr | mnem -- | | | | | | | | | | | | | 0 -> (Rd, Rn, Rd, RegB, Lsl, S0, And, t, f, f, N, Y, "ands") 1 -> (Rd, Rn, Rd, RegB, Lsl, S0, Eor, t, f, f, N, Y, "eors") 2 -> (Rn, Rd, Rd, RegB, Lsl, ShR, Mov, t, f, f, N, Y, "lsls") 3 -> (Rn, Rd, Rd, RegB, Lsr, ShR, Mov, t, f, f, N, Y, "lsrs") 4 -> (Rn, Rd, Rd, RegB, Asr, ShR, Mov, t, f, f, N, Y, "asrs") 5 -> (Rd, Rn, Rd, RegB, Lsl, S0, Adc, t, f, f, N, Y, "adcs") 6 -> (Rd, Rn, Rd, RegB, Lsl, S0, Sbc, t, f, f, N, Y, "sbcs") 7 -> (Rn, Rd, Rd, RegB, Ror, ShR, Mov, t, f, f, N, Y, "rors") 8 -> (Rd, Rn, Rd, RegB, Lsl, S0, And, t, f, f, N, N, "tst") 9 -> (Rd, Rn, Rd, RegB, Lsl, S0, Neg, t, f, f, N, Y, "negs") 10 -> (Rd, Rn, Rd, RegB, Lsl, S0, Sub, t, f, f, N, N, "cmp") 11 -> (Rd, Rn, Rd, RegB, Lsl, S0, Add, t, f, f, N, N, "cmn") 12 -> (Rd, Rn, Rd, RegB, Lsl, S0, Orr, t, f, f, N, Y, "orrs") 13 -> (Rd, Rn, Rd, RegB, Lsl, S0, Mul, t, f, f, N, Y, "muls") 14 -> (Rd, Rn, Rd, RegB, Lsl, S0, Bic, t, f, f, N, Y, "bics") 15 -> (Rd, Rn, Rd, RegB, Lsl, S0, Mvn, t, f, f, N, Y, "mvns")

Instructions where bits [15:10] are 010001 can use the high registers, and are decoded further using bits [9:8].

decode3 op = case op of -- sRegA cShiftOp cFlags sLink -- | sRegB cRand2 | cShiftAmt | cMemRd | sRegWrite -- | | sRegC | | | sAluOp | | cMemWr | mnem -- | | | | | | | | | | | | | 0 -> (RHd, RHn, RHd, RegB, Lsl, S0, Add, f, f, f, N, Y, "add hi") 1 -> (RHd, RHn, RHd, RegB, Lsl, S0, Sub, t, f, f, N, N, "cmp hi") 2 -> (Rd, RHn, RHd, RegB, Lsl, S0, Mov, f, f, f, N, Y, "mov hi") 3 -> (Rd, RHn, R15, RegB, Lsl, S0, Mov, f, f, f, C, Y, "bx/blx")

In these tables, we use `t`

for `True`

and `f`

for `False`

.

t = True f = False

## Instruction execution

The state of the entire simulation consists of the register file, containing 16 addressable registers, the Processor Status Register containing the flags, and the memory.

newtype State = State (RegFile, Register Flags, Memory)

instance Show State where show (State (regs, _, _)) = show (regs, getreg regs 15)

All the *dramatis personae* have now been introduced, and the play
can begin. The heart of the action is a function `exec`

that runs the
simulation for a single clock cycle. In essence, it is a function
from states to states, but to allow us to peek inside the simulation,
we make it also return the name of the instruction that was executed,
and a list of debugging values, each labelled with a string.
Different aspects of the simulation can be inspected by editing this
list, but for now it contains the two operands entering the shifter
and ALU, and the result that is computed and potentially written to a
register.

exec :: State -> (State, String, [(String, Word32)]) exec (State (regs, psr, mem)) = (State (regs', psr', mem'), mnem, ddt) where ddt = [("ra", ra), ("rand2", rand2), ("result", result)]

The rest of this section continues the `where`

clause begun here,
listing all the elements of the processor, and following roughly the
order of data flow. First we see the instruction being fetched and
decoded, using the current `pc`

value.

-- Fetch and decode instr = imem mem pc (sRegA, sRegB, sRegC, cRand2, cShiftOp, cShiftAmt, sAluOp, cFlags, cMemRd, cMemWr, sLink, sRegWrite, mnem) = decode instr

Various control signals are derived by looking at other fields of the instruction, under the direction of signals coming out of the ROM.

-- Derived control signals cRegA = regsel sRegA instr cRegB = regsel sRegB instr cRegC = regsel sRegC instr cAluOp = alusel sAluOp instr cCond = int (field (11,8) instr) cLink = case sLink of N -> False; Y -> True; C -> bit 7 instr

A note on naming: control signals with names like `cRegA`

, starting
with `c`

for control, directly affect the datapath – in this case,
naming the first of the three registers that are read. That signal
can come from one of several places: it could be a fixed register, or
could be derived from some field of the instruction. There is a fixed
list of possible rules for this, and a multiplexer `regsel`

to
determine which rule is used. That multiplexer is controlled in turn
by a signal `sRegA`

that starts with `s`

for 'select', which is
directly stored in the control ROM.

Next, the three registers are read, and depending on signals to be
computed later, an arbitrary register can be written, and the `lr`

and
`pc`

can be written also with the address of the next instruction.

-- Register file ((ra, rb, rc, pc), regs') = regfile (cRegA, cRegB, cRegC, cRegWrite, cLink) regs result nextpc nextpc = pc+2

The input to the barrel shifter is either the second register `rb`

that was read, or an immediate field from the instruction. We use the
shifter to multiply such fields by 2 or 4 when they from part of an
address. The shift amount, if not implicit, is also taken either from an immediate field, or from the register `ra`

.

-- Shifter shiftin = rand2sel cRand2 rb instr shiftamt = shiftsel cShiftAmt ra instr (rand2, shcbit) = shifter cShiftOp shiftin shiftamt

Register value `ra`

is the first input to the ALU, and the output from
the shifter forms the second input.

-- ALU and flags (aluOut, flags') = alu cAluOp ra rand2 (cbit flags) shcbit (flags, psr') = register psr cFlags flags'

For load and store instructions, the ALU has computed the address, and
the third resister `rc`

provides the data to be stored. A load
instruction takes its result from the memory; otherwise it is the ALU
result.

-- Memory (memdata, mem') = dmem (cMemRd, cMemWr) mem aluOut rc result = if cMemRd then memdata else aluOut

Conditional branches are taken depending on the condition field
`cCond`

and the flag values before the instruction. The value of the
condition helps to determine whether the result is written back to the
register file.

-- Conditions and write-back cBranch = condition cCond flags cRegWrite = case sRegWrite of N -> False; Y -> True; C -> cBranch

This completes the outline of the datapath. It's worth noting that
some of the equations mention quantities (such as `pc`

and `result`

)
that are computed only by later equations. Thanks to the lazy
evaluation of Haskell, this works perfectly, provided that no cycle of
dependency is created.

## Useful bits and pieces

The function `cbit`

selects the Carry bit from a tuple of flags.

cbit :: Flags -> Bool cbit (n, z, v, c) = c

The sign bit of a 32-bit word is bit 31.

signbit :: Word32 -> Bool signbit x = bit 31 x

The function `bit`

extracts a single bit of a word as a Boolean.
Just to be definite, we count bits outside the range [0..31] as zero:
among other things, that affects the way the C flag is set by a shift
instruction with a shift amount of zero.

bit :: Int -> Word32 -> Bool bit n x = if 0 <= n && n < 32 then testBit x n else False

We can extract a field of a word with an appropriate combination of shifts and masks.

field :: (Int, Int) -> Word32 -> Word32 field (j,i) x = shiftR (x .&. mask) i where mask = complement (complement 1 `shiftL` j)

To sign extend a value from `n`

bits to 32, we can use the well-known
trick of shifting left and then (arithmetically) right again.

signext :: Int -> Word32 -> Word32 signext n x = word (shiftR (shiftL (int32 x) (32-n)) (32-n))

The state of the simulation is wrapped up as an abstract data type, but sometimes it's convenient to peek inside and extract the value of a particular register.

peekreg :: State -> Int -> Word32 peekreg (State (regfile, _, _)) n = getreg regfile n

getreg :: RegFile -> Int -> Word32 getreg (RegFile regs) n = regs !! n

Haskell's strong type system means that conversions between integer types must be made explicitly. Here is a handy set of conversion functions that improve the readability of mixed expressions.

ord :: Bool -> Word32 ord b = fromIntegral (fromEnum b)

int :: Integral a => a -> Int int x = fromIntegral x

int32 :: Integral a => a -> Int32 int32 x = fromIntegral x

word :: Integral a => a -> Word32 word x = fromIntegral x

## Main program

The simulator has one more trick up its sleeve: it can read object
files prepared using the usual Gnu tools, obviating the need for any
kind of built-in assembler. Actually, the trick is not so clever,
because it's possible to persuade the tools to output a flat binary
file, and then all that is needed is to read in that file.
Inevitably, the `IO`

monad rears its ugly head.

readBinary :: String -> IO Memory readBinary file = do contents <- ByteString.readFile file return (Memory (loop Map.empty 0 contents)) where loop mem i dat = if ByteString.null dat then mem else case ByteString.unpack (ByteString.take 4 dat) of [x, y, z, w] -> let v = shiftL (word w) 24 + shiftL (word z) 16 + shiftL (word y) 8 + word x in loop (Map.insert i v mem) (i+1) (ByteString.drop 4 dat) _ -> error "partial word"

The simulator is invoked as follows, where `a.bin`

is a binary file
prepared earlier, and 123, 456 and 789 become the initial values of
registers `r0`

, `r1`

, `r2`

.

thumb a.bin 123 456 789

Up to 13 register values can be specified on the command line in
decimal or hexadecimal, and to these we add initial values for the
stack pointer, the link register, and the program counter. The
program starts at address 0, and the `lr`

value of -1 is special in
that if this value ever appears in the `pc`

, then the simulation
stops. This happens when the main program returns.

init_regs :: [String] -> RegFile init_regs args = RegFile (map read (take 13 (args ++ repeat "0")) ++ [0x400, 0xffffffff, 0])

init_flags = Register (f, f, f, f)

The simulation itself is a loop that calls `exec`

repeatedly, showing
the state before and after each instruction, and printing the name and
debugging information for each instruction executed. The loop
terminates when the magic value we supplied as the initial value of
`lr`

makes it into the `pc`

.

run st = let pc = peekreg st 15 in do putStrLn (show st); if pc == 0xffffffff then putStrLn "Brexit!" else let (st', mnem, ddt) = exec st in do putStrLn (mnem ++ " " ++ show ddt); run st'

And now the main program just reads the arguments, loads the binary file, and starts the simulation.

main = do args <- System.Environment.getArgs mem <- readBinary (head args) run (State (init_regs (tail args), init_flags, mem))