Thumb simulator (Digital Systems)

Copyright © 2024 J. M. Spivey

This literate Haskell script is a register-level simulator for a partial implementation of the Thumb instruction set.

  • A diagram of the datapath with control signals named.

The main things missing are multi-word load and store instructions (including push and pop), loads and stores for bytes and half-words, and the whole exception mechanism. The implementation executes each instruction in a single cycle, with no pipelining, and is capable of loading and executing binary machine code prepared with the standard assembler.

module Thumb where
import Data.Word
import Data.Int
import Data.Bits(testBit, shiftL, shiftR, rotateR,
    complement, (.&.), (.|.), xor, Bits)
import qualified Data.Map.Strict as Map
import qualified Data.ByteString.Lazy as ByteString
import qualified System.Environment

Components

We begin with a selection of architectural components, each simulated by a single function from state and inputs to new state and outputs. We will exploit laziness to connect these components together, and have the calculations run in an order consistent with the dependencies between components, without having to specify an explicit order of evaluation.

Registers

The state of a register is a single value.

newtype Register a = Register a
instance Show a => Show (Register a) where show (Register x) = show x

The register has a write-enable input we, as well as an input for the next value to be stored, and an output for the value currently stored.

register :: Register a -> Bool -> a -> (a, Register a)
register (Register x) we y = (x, Register (if we then y else x))
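
As a quick illustration of the write-enable behaviour, here is a small sketch that repeats the two definitions above and steps a register through four clock cycles. The helper stepAll and the value demoRegister are hypothetical names introduced only for this example; they are not part of the simulator.

```haskell
-- Repeats the definitions from the text so that the sketch stands alone.
newtype Register a = Register a

register :: Register a -> Bool -> a -> (a, Register a)
register (Register x) we y = (x, Register (if we then y else x))

-- Hypothetical helper: feed a list of (write-enable, input) pairs to a
-- register, one pair per clock cycle, collecting the output of each cycle.
stepAll :: Register a -> [(Bool, a)] -> [a]
stepAll _ [] = []
stepAll r ((we, y) : rest) = out : stepAll r' rest
  where (out, r') = register r we y

-- The output in each cycle is the value stored before that cycle's write,
-- so this list is [0, 5, 5, 9].
demoRegister :: [Int]
demoRegister =
  stepAll (Register 0) [(True, 5), (False, 7), (True, 9), (False, 0)]
```

The 7 offered in the second cycle never appears, because the write-enable input is false in that cycle.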

In point of fact, the part of the Thumb architecture we shall model contains just one register that is not part of the addressable register file, namely the one containing the NZCV flags.

Register file

The register file used by the machine is special in a number of ways. It is capable of reading three arbitrary registers on each cycle, plus the program counter (this is needed for the store instruction str r1, [r2, r3]); and it is capable of writing one arbitrary register as well as the link register lr = r14 and the program counter pc = r15, both with the value nextpc. Additionally, reading the pc as an ordinary register yields pc+4 and writing it discards the least significant bit, in agreement with the conventions of the architecture.

In fact, we will never write both lr and pc from nextpc, because in a branch-and-link instruction, the pc is always written with the target address, and as the code below spells out, an explicit write takes precedence.

The state of the register file is a list of 16 register values, but in debugging output we show only the first four.

newtype RegFile = RegFile [Word32]
instance Show RegFile where show (RegFile regs) = show (take 4 regs)

The function regfile takes five control signals: the numbers of three registers to read, and two Booleans, one, cRegWrite, indicating whether to write an arbitrary register, and another, cLink, indicating whether to write lr with the nextpc value. In addition to the current state, there are two data inputs: the data to be written to an arbitrary register if cRegWrite is true, and the nextpc value to be written to the pc and optionally the lr register.

regfile :: (Int, Int, Int, Bool, Bool) -> RegFile -> Word32 -> Word32
    -> ((Word32, Word32, Word32, Word32), RegFile)
regfile
    (cRegA, cRegB, cRegC, cRegWrite, cLink)
    (RegFile regs) wdata nextpc =
  ((read cRegA, read cRegB, read cRegC, pc), RegFile regs')
  where
    read r = if r == 15 then pc + 4 else regs !! r
    pc = regs !! 15
    regs' = map update [0..15]
    update r =
      if cRegWrite && r == cRegC then
        (if r == 15 then wdata .&. complement 1 else wdata)
      else if r == 15 || (r == 14 && cLink) then nextpc
      else regs !! r
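
To make the pc conventions concrete, the following sketch repeats regfile (with the local function read renamed to rd, purely to avoid shadowing the Prelude) and tries it on a made-up starting state. The names start, readPc and afterBranch, and all the register values, are assumptions chosen for the example.

```haskell
import Data.Bits (complement, (.&.))
import Data.Word (Word32)

newtype RegFile = RegFile [Word32]

regfile :: (Int, Int, Int, Bool, Bool) -> RegFile -> Word32 -> Word32
    -> ((Word32, Word32, Word32, Word32), RegFile)
regfile (cRegA, cRegB, cRegC, cRegWrite, cLink) (RegFile regs) wdata nextpc =
  ((rd cRegA, rd cRegB, rd cRegC, pc), RegFile regs')
  where
    rd r = if r == 15 then pc + 4 else regs !! r
    pc = regs !! 15
    regs' = map update [0..15]
    update r =
      if cRegWrite && r == cRegC then
        (if r == 15 then wdata .&. complement 1 else wdata)
      else if r == 15 || (r == 14 && cLink) then nextpc
      else regs !! r

-- Hypothetical starting state: r0..r14 hold 0 and the pc holds 0x100.
start :: RegFile
start = RegFile (replicate 15 0 ++ [0x100])

-- Reading r15 as an ordinary register yields pc+4 = 0x104, while the
-- fourth output is the raw pc value 0x100.
readPc :: (Word32, Word32)
readPc = (a, pc)
  where ((a, _, _, pc), _) = regfile (15, 0, 0, False, False) start 0 0x102

-- An explicit write of 0x201 to r15 drops bit 0 and takes precedence over
-- nextpc, while cLink still puts nextpc into r14: the pair (r14, r15)
-- becomes (0x102, 0x200).
afterBranch :: (Word32, Word32)
afterBranch = (regs !! 14, regs !! 15)
  where (_, RegFile regs) = regfile (0, 0, 15, True, True) start 0x201 0x102
```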

It would be nice to replace the list of 16 words with an array that supports constant-time indexing, but it isn't clear whether that would give any real speedup.

Memories

For simplicity, we adopt a 'modified Harvard architecture', where the single memory of the machine is presented via two interfaces that can be thought of as independent caches. This simulation does not include cache misses, so in effect we have a two-port memory, capable of fetching an instruction word and either reading or writing a data word in each cycle. Real implementations of the instruction set are not like this, having either a single cache or no cache at all, and inserting an extra cycle for load and store operations.

The memory state is represented by a finite mapping from addresses to contents, so as to get reasonably efficient incremental updates in an applicative way.

newtype Memory = Memory (Map.Map Word32 Word32)

The instruction memory is read one 16-bit halfword at a time, so we divide the address by 4 and use bit 1 to select one half or the other of the 32-bit word with that index. Bit 0 of the PC is ignored.

imem :: Memory -> Word32 -> Word32
imem (Memory mem) addr =
  (if bit 1 addr then shiftR v 16 else v) .&. 0xffff
  where v = Map.findWithDefault 0 (shiftR addr 2) mem
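
The halfword selection can be tried in isolation. The sketch below leaves out the map lookup and works on a single word; halfword and wordIndex are hypothetical helpers, and the word 0x46c01840 is just an example value.

```haskell
import Data.Bits (shiftR, testBit, (.&.))
import Data.Word (Word32)

-- Hypothetical helper: pick the halfword addressed by addr out of the
-- 32-bit word v that contains it, as imem does after its lookup.
halfword :: Word32 -> Word32 -> Word32
halfword v addr = (if testBit addr 1 then shiftR v 16 else v) .&. 0xffff

-- Key used in the memory map: the byte address divided by 4.
wordIndex :: Word32 -> Word32
wordIndex addr = shiftR addr 2
```

If the word 0x46c01840 is stored at index 1, then halfword 0x46c01840 4 is 0x1840 and halfword 0x46c01840 6 is 0x46c0. Address 7 gives the same answer as address 6, because bit 0 is ignored.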

The data memory supports both reading and writing, and for the sake of realism, we read the memory (risking a cache miss or segfault in real life) only if the instruction requires it. Thus there are two control signals cMemRd and cMemWr, and two data inputs, the address and the value to be written. The only output is the value read. Because the memory supports only word-sized loads and stores, the bottom two bits of the address are ignored.

dmem (cMemRd, cMemWr) (Memory mem) addr wdata = (rdata, Memory mem')
  where
    rdata = if cMemRd then Map.findWithDefault 0 (shiftR addr 2) mem else 0
    mem' = if cMemWr then Map.insert (shiftR addr 2) wdata mem else mem

Shifter

The barrel shifter supports logical and arithmetic shifts and also rotations. We'll use an enumerated type for the operations to avoid specifying a particular encoding.

data ShiftOp = Lsl | Lsr | Asr | Ror

The shifter takes an operation, a value to be shifted, and a shift amount (encoded as an Int here). It returns the shifted value, and also the last bit shifted out, useful because shift instructions put this bit in the C flag. That bit will be zero if the shift amount is zero.

shifter :: ShiftOp -> Word32 -> Int -> (Word32, Bool)
shifter op x n = 
  case op of
    Lsl -> (shiftL x n, bit (32-n) x)
    Lsr -> (shiftR x n, bit (n-1) x)
    Asr -> (word (shiftR (int32 x) n), bit (n-1) x)
    Ror -> (rotateR x n, bit (n-1) x)
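
A few worked cases may help to show which bit ends up as the carry. This sketch repeats shifter and bit, writing the conversions with fromIntegral so that it stands alone; the list examples is a hypothetical name and the operands are arbitrary.

```haskell
import Data.Bits (rotateR, shiftL, shiftR, testBit)
import Data.Int (Int32)
import Data.Word (Word32)

data ShiftOp = Lsl | Lsr | Asr | Ror

-- Bits outside the range [0..31] count as zero, as in the text.
bit :: Int -> Word32 -> Bool
bit n x = if 0 <= n && n < 32 then testBit x n else False

shifter :: ShiftOp -> Word32 -> Int -> (Word32, Bool)
shifter op x n =
  case op of
    Lsl -> (shiftL x n, bit (32-n) x)
    Lsr -> (shiftR x n, bit (n-1) x)
    Asr -> (fromIntegral (shiftR (fromIntegral x :: Int32) n), bit (n-1) x)
    Ror -> (rotateR x n, bit (n-1) x)

examples :: [(Word32, Bool)]
examples =
  [ shifter Lsl 0x80000001 1   -- (0x00000002, True): bit 31 falls off the top
  , shifter Lsr 0x00000003 1   -- (0x00000001, True): bit 0 falls off the bottom
  , shifter Asr 0x80000000 4   -- (0xf8000000, False): the sign bit is replicated
  , shifter Ror 0x00000001 1   -- (0x80000000, True): bit 0 wraps round to bit 31
  , shifter Lsl 0x80000001 0   -- (0x80000001, False): a zero shift gives no carry
  ]
```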

Arithmetic Logic Unit

The ALU supports many operations:

data AluOp = Add | Sub | And | Eor | Adc | Sbc | Neg | Orr
  | Mul | Mov | Mvn | Bic | Adr | Sg7 | Sg9

For convenience, I've added two extra pseudo-operations, Sg7 and Sg9. Each is equivalent to either Add or Sub, depending on bit 7 or bit 9 respectively of the instruction word instr. Here's the part of the control unit that interprets them.

alusel :: AluOp -> Word32 -> AluOp
alusel sAluOp instr =
  case sAluOp of
    Sg7 -> if bit 7 instr then Sub else Add
    Sg9 -> if bit 9 instr then Sub else Add
    _ -> sAluOp

The ALU has a control input for the operation, and four data inputs. Two inputs are the two arguments to the operation, but also supplied are the C flag, used by the Adc and Sbc operations, and the carry bit shcbit computed by the shifter: for some operations, this becomes the carry bit for the whole operation. The outputs are the result of the operation, together with the four flag bits. The N and Z flags always have the same meaning, and the C and V flags have meanings dependent on the operation.

type Flags = (Bool, Bool, Bool, Bool)
alu :: AluOp -> Word32 -> Word32 -> Bool -> Bool -> (Word32, Flags)
alu cAluOp in1 in2 cin shcbit =
  (r, (n, z, v, c)) 
  where
    (r, c, v) =
      case cAluOp of
        Add -> adder in1 in2 False
        Sub -> adder in1 (complement in2) True
        And -> (in1 .&. in2, False, False)
        Eor -> (in1 `xor` in2, False, False)
        Adc -> adder in1 in2 cin
        Sbc -> adder in1 (complement in2) cin
        Neg -> adder 0 (complement in2) True
        Orr -> (in1 .|. in2, False, False)
        Mul -> (in1 * in2, False, False)
        Mov -> (in2, shcbit, False)             -- carry comes from the shifter
        Mvn -> (complement in2, shcbit, False)  -- likewise
        Bic -> (in1 .&. complement in2, False, False)
        Adr -> ((in1 + in2) .&. complement 0x3, False, False)
    n = signbit r
    z = (r == 0)

Many of the operations (Add, Sub, Adc, Sbc, Neg) are implemented using an adder that itself provides the V and C bits. Others (And, Eor, Orr, Bic) are implemented by means of bitwise Boolean operations, and give V and C bits that are zero. The two forms of move (Mov, Mvn) ignore the argument in1 and copy in2 to the output, possibly after negating it bitwise; they provide a carry bit from the shifter.

The last operation, Adr, is implicit in the add pc and ldr pc instructions that use pc-relative addressing. These are defined to round down the pc value (which is aligned to a 2 byte boundary) to make it a multiple of 4. No doubt this operation could be implemented with the same adder, but it is written directly here. The status bits don't matter, because neither instruction sets the flags.

There is an infelicity in this implementation, in that any instruction that writes some of the flag bits writes all of them, and shifts write the C flag even if the shift amount is zero.

In hardware, addition would be easily implemented as a 32-bit adder with carry-in and carry-out, and the usual analysis of signs to determine the V bit. Like most high level languages, Haskell makes it difficult to get hold of the carry-out bit of an addition. We could do the addition in 64 bits instead of 32, and throw away all but 33 bits of the result. Alternatively we can reconstruct the carry-out from the sign bits of the inputs and result. Note that this calculation is different from the one that computes the overflow bit.

adder :: Word32 -> Word32 -> Bool -> (Word32, Bool, Bool)
adder a b cin = (r, cout, vout) where
  r = a + b + ord cin
  -- Count the sign bits as Ints: in Word32 arithmetic, 0 + 0 - 1 would wrap round
  cout = (sgn a + sgn b - sgn r > 0)
  sgn x = fromEnum (signbit x)
  vout = (signbit a == signbit b && signbit r /= signbit a)
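
For comparison, the 64-bit alternative mentioned above might look like the following sketch. The name adder64 is hypothetical; it is meant to return the same triple as adder, reading the carry-out directly from the wide sum instead of reconstructing it from sign bits.

```haskell
import Data.Bits (testBit)
import Data.Word (Word32, Word64)

-- Sketch of the 64-bit alternative: widen both operands, add them together
-- with the carry-in, and read the carry-out from bit 32 of the wide sum.
adder64 :: Word32 -> Word32 -> Bool -> (Word32, Bool, Bool)
adder64 a b cin = (r, cout, vout) where
  wide = fromIntegral a + fromIntegral b + (if cin then 1 else 0) :: Word64
  r = fromIntegral wide       -- the low 32 bits of the sum
  cout = testBit wide 32      -- the 33rd bit is the carry-out
  vout = sign a == sign b && sign r /= sign a
  sign x = testBit x 31
```

For instance, adder64 0xffffffff 1 False gives (0, True, False), a carry without overflow, whereas adder64 0x7fffffff 1 False gives (0x80000000, False, True), an overflow without carry. The second case is the one where the operands have sign bits 0 and 0 and the result has sign bit 1.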

Condition codes

The Thumb instruction set provides 14 different conditions for branching on the flags, and we need a straightforward combinational circuit to establish their meanings. The numbers from 0 to 13 appear in a field of conditional branch instructions, so we need to be explicit about the encoding here.

condition :: Int -> Flags -> Bool
condition cCond (n, z, v, c) =
  case cCond of
    0  -> z               -- eq
    1  -> not z           -- ne
    2  -> c               -- cs
    3  -> not c           -- cc
    4  -> n               -- mi
    5  -> not n           -- pl
    6  -> v               -- vs
    7  -> not v           -- vc
    8  -> c && not z      -- hi
    9  -> not c || z      -- ls
    10 -> n == v          -- ge
    11 -> n /= v          -- lt
    12 -> not z && n == v -- gt
    13 -> z || n /= v     -- le
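
The difference between the signed and unsigned conditions shows up when the flags come from a comparison. The sketch below is self-contained and only illustrative: cmpFlags is a hypothetical helper that computes the flags a cmp instruction would set (using a 64-bit sum for the carry), and cond repeats four rows of the table above.

```haskell
import Data.Bits (complement, testBit)
import Data.Word (Word32, Word64)

type Flags = (Bool, Bool, Bool, Bool)   -- (n, z, v, c), as in the text

-- Hypothetical helper: the flags that cmp a, b would set, computed by
-- adding a to the complement of b with a carry-in of 1.
cmpFlags :: Word32 -> Word32 -> Flags
cmpFlags a b = (n, z, v, c)
  where
    nb = complement b
    wide = fromIntegral a + fromIntegral nb + 1 :: Word64
    r = fromIntegral wide :: Word32
    n = testBit r 31
    z = r == 0
    c = testBit wide 32
    v = testBit a 31 == testBit nb 31 && testBit r 31 /= testBit a 31

-- Just the four conditions used here, with the same encoding as condition.
cond :: Int -> Flags -> Bool
cond 8  (_, z, _, c) = c && not z       -- hi: unsigned greater than
cond 9  (_, z, _, c) = not c || z       -- ls: unsigned less than or equal
cond 11 (n, _, v, _) = n /= v           -- lt: signed less than
cond 12 (n, z, v, _) = not z && n == v  -- gt: signed greater than
cond _  _            = error "condition not covered by this sketch"
```

Comparing 0xffffffff with 1 makes both hi and lt true: as an unsigned number the first operand is large, but as a signed number it is -1.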

Instruction decoding

The first step in executing a fetched instruction is to decode it, producing a bundle of control signals.

decode :: Word32 ->
  (RegSel, RegSel, RegSel, Rand2Sel, ShiftOp, ShiftSel, AluOp,
    Bool, Bool, Bool, Perhaps, Perhaps, String)
data Perhaps = Y | N | C

Decoding an instruction gives a list of 12 control signals and a name that is used for debugging. If

decode instr =
  (sRegA, sRegB, sRegC, cRand2, cShiftOp, cShiftAmt, sAluOp,
    cFlags, cMemRd, cMemWr, sLink, sRegWrite, mnem)

then

  • sRegA, sRegB and sRegC are register selectors that determine how to select the three registers that are read or written by the instruction.
  • cRand2 determines where the second ALU operand comes from.
  • cShiftOp and cShiftAmt determine how that operand is treated by the barrel shifter.
  • sAluOp determines what ALU operation is performed.
  • cFlags determines whether the flags are updated.
  • cMemRd and cMemWr determine whether a memory read or write happens.
  • sLink determines whether the address of the following instruction is written into lr.
  • sRegWrite determines whether the result of the instruction is written to a register.
  • mnem represents the mnemonic for the instruction.

The sLink and sRegWrite fields have type Perhaps, with possible values Y (meaning yes), N (meaning no), and C, meaning that the answer will be determined by some other condition. In the case of sLink, this is another bit in the instruction that is not taken into account by the decoder, but makes the difference between bx and blx. In the case of sRegWrite, conditional branches work by computing the target address regardless of whether the branch is taken or not, but writing it into the pc only if the condition is satisfied.

As the definition of decode reveals, the majority of instructions can be decoded (using decode1) by looking at the five bits [15:11], unless those bits are 01000, in which case we consult other decoding tables, one (decode2) for ALU operations starting 010000, and another (decode3) for high register operations together with bx/blx, starting with 010001. We don't implement byte and halfword loads and stores, nor push, pop and adjacent operations, so this is enough.

decode instr =
  if op /= 8 then decode1 op
  else if not (bit 10 instr) then decode2 (field (9,6) instr)
  else decode3 (field (9,8) instr)
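
To see the dispatch at work on concrete instruction words, here is a self-contained sketch. The function whichTable is a hypothetical stand-in for decode that reports only which table would be consulted and with what index; field is repeated from later in the script.

```haskell
import Data.Bits (complement, shiftL, shiftR, testBit, (.&.))
import Data.Word (Word32)

field :: (Int, Int) -> Word32 -> Word32
field (j,i) x = shiftR (x .&. mask) i
  where mask = complement (complement 1 `shiftL` j)

-- Hypothetical helper: name the table that decode would consult and give
-- the index it would look up there.
whichTable :: Word32 -> (String, Word32)
whichTable instr
  | op /= 8                = ("decode1", op)
  | not (testBit instr 10) = ("decode2", field (9,6) instr)
  | otherwise              = ("decode3", field (9,8) instr)
  where op = field (15,11) instr
```

With this, whichTable 0x1840 (adds r0, r0, r1) is ("decode1", 3), whichTable 0x4348 (muls r0, r1) is ("decode2", 13), and whichTable 0x4770 (bx lr) is ("decode3", 3).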
  where op = field (15,11) instr

The functions decode1, decode2 and decode3 are defined by tables that could become ROMs in a hardware implementation. As a prelude to spelling out the details, let's define the enumerated types that are used in different columns. The type RegSel describes the different instruction bits that can be used to name a register. Some instructions contain register names explicitly, and others refer implicitly to the sp, lr or pc. This part of the design is rendered more complicated by the large variety of different instruction formats that appear in Thumb code.

data RegSel = Rd | Rn | Rm | Rt | RHn | RHd | R13 | R14 | R15
regsel s instr =
  case s of
    Rd  -> int (field (2,0) instr)
    Rn  -> int (field (5,3) instr)
    Rm  -> int (field (8,6) instr)
    Rt  -> int (field (10,8) instr)
    RHn -> int (field (6,3) instr)
    RHd -> int (8 * field(7,7) instr + field(2,0) instr)
    R13 -> 13; R14 -> 14; R15 -> 15

Similarly, different instructions have different ways of specifying the amount that their second operand should be shifted. Some use an implicit constant – 0, 1, 2, or 12 – while others have a five-bit immediate field, and others take the shift amount from the first register ra read by the instruction.

data ShiftSel = S0 | S1 | S2 | S12 | ShI | ShR
shiftsel cShiftAmt ra instr =
  case cShiftAmt of
    S0 -> 0; S1 -> 1; S2 -> 2; S12 -> 12
    ShI ->  int (field (10,6) instr)
    ShR -> int (ra .&. 0x1f)

We must deal with a bit more complexity in determining the second operand fed to the ALU. Sometimes this is the second register rb read by the instruction, but it can also be drawn from signed or unsigned immediate fields in the instruction of various sizes and locations.

data Rand2Sel = RegB | RIm3 | Imm5 | Imm7 | Imm8 | SI8 | Im11 | SI11
rand2sel cRand2 rb instr =
  case cRand2 of
    RegB -> rb
    RIm3 -> if bit 10 instr then field (8,6) instr else rb
    Imm5 -> field (10,6) instr
    Imm7 -> field (6,0) instr
    Imm8 -> field (7,0) instr
    SI8  -> signext 8 (field (7,0) instr)
    Im11 -> field (10,0) instr
    SI11 -> signext 11 (field (10,0) instr)

With these conventions in place, we are ready to list the control signals for each instruction. First, for most instructions we examine bits [15:11].

decode1 op =
  case op of
    --    sRegA                 cShiftOp      cFlags   sLink
    --     |   sRegB      cRand2 |   cShiftAmt | cMemRd | sRegWrite
    --     |    |   sRegC  |     |    | sAluOp |  | cMemWr |  mnem     
    --     |    |    |     |     |    |    |   |  |  |  |  |   |
    0  -> (Rd,  Rn,  Rd,  RegB, Lsl, ShI, Mov, t, f, f, N, Y, "lsls")
    1  -> (Rd,  Rn,  Rd,  RegB, Lsr, ShI, Mov, t, f, f, N, Y, "lsrs")
    2  -> (Rd,  Rn,  Rd,  RegB, Asr, ShI, Mov, t, f, f, N, Y, "asrs")
    3  -> (Rn,  Rm,  Rd,  RIm3, Lsl, S0,  Sg9, t, f, f, N, Y, "adds/subs")
    4  -> (Rt,  Rd,  Rt,  Imm8, Lsl, S0,  Mov, t, f, f, N, Y, "movs i8")
    5  -> (Rt,  Rd,  Rt,  Imm8, Lsl, S0,  Sub, t, f, f, N, N, "cmp i8")
    6  -> (Rt,  Rd,  Rt,  Imm8, Lsl, S0,  Add, t, f, f, N, Y, "adds i8")
    7  -> (Rt,  Rd,  Rt,  Imm8, Lsl, S0,  Sub, t, f, f, N, Y, "subs i8")
    -- 8: see below
    9  -> (R15, Rd,  Rt,  Imm8, Lsl, S2,  Adr, f, t, f, N, Y, "ldr pc")
    10 -> (Rn,  Rm,  Rd,  RegB, Lsl, S0,  Add, f, f, t, N, N, "str r")
    11 -> (Rn,  Rm,  Rd,  RegB, Lsl, S0,  Add, f, t, f, N, Y, "ldr r")
    12 -> (Rn,  Rd,  Rd,  Imm5, Lsl, S2,  Add, f, f, t, N, N, "str i5")
    13 -> (Rn,  Rd,  Rd,  Imm5, Lsl, S2,  Add, f, t, f, N, Y, "ldr i5")
    -- 14-17: byte and halfword loads and stores not implemented
    18 -> (R13, Rd,  Rt,  Imm8, Lsl, S2,  Add, f, f, t, N, N, "str sp")
    19 -> (R13, Rd,  Rt,  Imm8, Lsl, S2,  Add, f, t, f, N, Y, "ldr sp")
    20 -> (R15, Rd,  Rt,  Imm8, Lsl, S2,  Adr, f, f, f, N, Y, "add pc")
    21 -> (R13, Rd,  Rt,  Imm8, Lsl, S2,  Add, f, f, f, N, Y, "add sp")
    22 -> (R13, Rd,  R13, Imm7, Lsl, S2,  Sg7, f, f, f, N, Y, "add/sub sp")
    -- 23-25: push, pop, ldm, stm and miscellanea not implemented
    26 -> (R15, Rd,  R15, SI8,  Lsl, S1,  Add, f, f, f, N, C, "bcond")
    27 -> (R15, Rd,  R15, SI8,  Lsl, S1,  Add, f, f, f, N, C, "bcond")
    28 -> (R15, Rd,  R15, SI11, Lsl, S1,  Add, f, f, f, N, Y, "b")
    30 -> (R15, Rd,  R14, SI11, Lsl, S12, Add, f, f, f, N, Y, "bl1")
    31 -> (R14, Rd,  R15, Im11, Lsl, S1,  Add, f, f, f, Y, Y, "bl2")

Note that a cmp instruction is identical with a subs instruction, except that it doesn't write the result back into a register. As indicated earlier, a conditional branch instruction bcond uses the ALU to compute the branch target, and writes it into the pc only if the condition is true. The long bl instruction is treated as two halves, bl1 and bl2.

Instructions where bits [15:10] are 010000 are ALU operations between registers. We need to look at bits [9:6] to discover what operation is needed.

decode2 op =
  case op of
    --    sRegA                 cShiftOp      cFlags   sLink
    --     |   sRegB      cRand2 |   cShiftAmt | cMemRd | sRegWrite
    --     |    |   sRegC  |     |    | sAluOp |  | cMemWr |  mnem     
    --     |    |    |     |     |    |    |   |  |  |  |  |   |
    0  -> (Rd,  Rn,  Rd,  RegB, Lsl, S0,  And, t, f, f, N, Y, "ands")
    1  -> (Rd,  Rn,  Rd,  RegB, Lsl, S0,  Eor, t, f, f, N, Y, "eors")
    2  -> (Rn,  Rd,  Rd,  RegB, Lsl, ShR, Mov, t, f, f, N, Y, "lsls")
    3  -> (Rn,  Rd,  Rd,  RegB, Lsr, ShR, Mov, t, f, f, N, Y, "lsrs")
    4  -> (Rn,  Rd,  Rd,  RegB, Asr, ShR, Mov, t, f, f, N, Y, "asrs")
    5  -> (Rd,  Rn,  Rd,  RegB, Lsl, S0,  Adc, t, f, f, N, Y, "adcs")
    6  -> (Rd,  Rn,  Rd,  RegB, Lsl, S0,  Sbc, t, f, f, N, Y, "sbcs")
    7  -> (Rn,  Rd,  Rd,  RegB, Ror, ShR, Mov, t, f, f, N, Y, "rors")
    8  -> (Rd,  Rn,  Rd,  RegB, Lsl, S0,  And, t, f, f, N, N, "tst")
    9  -> (Rd,  Rn,  Rd,  RegB, Lsl, S0,  Neg, t, f, f, N, Y, "negs")
    10 -> (Rd,  Rn,  Rd,  RegB, Lsl, S0,  Sub, t, f, f, N, N, "cmp")
    11 -> (Rd,  Rn,  Rd,  RegB, Lsl, S0,  Add, t, f, f, N, N, "cmn")
    12 -> (Rd,  Rn,  Rd,  RegB, Lsl, S0,  Orr, t, f, f, N, Y, "orrs")
    13 -> (Rd,  Rn,  Rd,  RegB, Lsl, S0,  Mul, t, f, f, N, Y, "muls")
    14 -> (Rd,  Rn,  Rd,  RegB, Lsl, S0,  Bic, t, f, f, N, Y, "bics")
    15 -> (Rd,  Rn,  Rd,  RegB, Lsl, S0,  Mvn, t, f, f, N, Y, "mvns")

Instructions where bits [15:10] are 010001 can use the high registers, and are decoded further using bits [9:8].

decode3 op =
  case op of
    --    sRegA                 cShiftOp      cFlags   sLink
    --     |   sRegB      cRand2 |   cShiftAmt | cMemRd | sRegWrite
    --     |    |   sRegC  |     |    | sAluOp |  | cMemWr |  mnem     
    --     |    |    |     |     |    |    |   |  |  |  |  |   |
    0  -> (RHd, RHn, RHd, RegB, Lsl, S0,  Add, f, f, f, N, Y, "add hi")
    1  -> (RHd, RHn, RHd, RegB, Lsl, S0,  Sub, t, f, f, N, N, "cmp hi")
    2  -> (Rd,  RHn, RHd, RegB, Lsl, S0,  Mov, f, f, f, N, Y, "mov hi")
    3  -> (Rd,  RHn, R15, RegB, Lsl, S0,  Mov, f, f, f, C, Y, "bx/blx")

In these tables, we use t for True and f for False.

t = True
f = False

Instruction execution

The state of the entire simulation consists of the register file, containing 16 addressable registers, the Processor Status Register containing the flags, and the memory.

newtype State = State (RegFile, Register Flags, Memory)
instance Show State where
  show (State (regs, _, _)) = show (regs, getreg regs 15)

All the dramatis personae have now been introduced, and the play can begin. The heart of the action is a function exec that runs the simulation for a single clock cycle. In essence, it is a function from states to states, but to allow us to peek inside the simulation, we make it also return the name of the instruction that was executed, and a list of debugging values, each labelled with a string. Different aspects of the simulation can be inspected by editing this list, but for now it contains the two operands entering the shifter and ALU, and the result that is computed and potentially written to a register.

exec :: State -> (State, String, [(String, Word32)])
exec (State (regs, psr, mem)) = (State (regs', psr', mem'), mnem, ddt)
  where
    ddt = [("ra", ra), ("rand2", rand2), ("result", result)]

The rest of this section continues the where clause begun here, listing all the elements of the processor, and following roughly the order of data flow. First we see the instruction being fetched and decoded, using the current pc value.

    -- Fetch and decode
    instr = imem mem pc
    (sRegA, sRegB, sRegC, cRand2, cShiftOp, cShiftAmt, sAluOp,
      cFlags, cMemRd, cMemWr, sLink, sRegWrite, mnem) = decode instr

Various control signals are derived by looking at other fields of the instruction, under the direction of signals coming out of the ROM.

    -- Derived control signals
    cRegA = regsel sRegA instr
    cRegB = regsel sRegB instr
    cRegC = regsel sRegC instr
    cAluOp = alusel sAluOp instr
    cCond = int (field (11,8) instr)
    cLink = case sLink of N -> False; Y -> True; C -> bit 7 instr

A note on naming: control signals with names like cRegA, starting with c for control, directly affect the datapath – in this case, naming the first of the three registers that are read. That signal can come from one of several places: it could be a fixed register, or could be derived from some field of the instruction. There is a fixed list of possible rules for this, and a multiplexer regsel to determine which rule is used. That multiplexer is controlled in turn by a signal sRegA that starts with s for 'select', which is directly stored in the control ROM.

Next, the three registers are read, and depending on signals to be computed later, an arbitrary register can be written, and the lr and pc can be written also with the address of the next instruction.

    -- Register file
    ((ra, rb, rc, pc), regs') =
      regfile (cRegA, cRegB, cRegC, cRegWrite, cLink) regs result nextpc
    nextpc = pc+2

The input to the barrel shifter is either the second register rb that was read, or an immediate field from the instruction. We use the shifter to multiply such fields by 2 or 4 when they form part of an address. The shift amount, if not implicit, is also taken either from an immediate field, or from the register ra.

    -- Shifter
    shiftin = rand2sel cRand2 rb instr
    shiftamt = shiftsel cShiftAmt ra instr
    (rand2, shcbit) = shifter cShiftOp shiftin shiftamt

Register value ra is the first input to the ALU, and the output from the shifter forms the second input.

    -- ALU and flags
    (aluOut, flags') = alu cAluOp ra rand2 (cbit flags) shcbit
    (flags, psr') = register psr cFlags flags'

For load and store instructions, the ALU has computed the address, and the third register rc provides the data to be stored. A load instruction takes its result from the memory; otherwise it is the ALU result.

    -- Memory
    (memdata, mem') = dmem (cMemRd, cMemWr) mem aluOut rc
    result = if cMemRd then memdata else aluOut

Conditional branches are taken depending on the condition field cCond and the flag values before the instruction. The value of the condition helps to determine whether the result is written back to the register file.

    -- Conditions and write-back
    cBranch = condition cCond flags
    cRegWrite = case sRegWrite of N -> False; Y -> True; C -> cBranch

This completes the outline of the datapath. It's worth noting that some of the equations mention quantities (such as pc and result) that are computed only by later equations. Thanks to the lazy evaluation of Haskell, this works perfectly, provided that no cycle of dependency is created.
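
The same effect can be seen in miniature. In the sketch below, which uses hypothetical names throughout, a toy component has an output that depends only on its state, so its input may safely be computed from that output, much as result is computed from values read out of the register file and then fed back into it.

```haskell
-- A toy component in the style of register: the output is the current
-- state, and the new state is the old state plus the input.
component :: Int -> Int -> (Int, Int)
component state input = (state, state + input)

-- One clock cycle. The input inp is defined in terms of the output o of
-- the very component it feeds, and is listed before the component itself.
cycleOnce :: Int -> (Int, Int)
cycleOnce s = (o, s')
  where
    inp = o * 2
    (o, s') = component s inp
```

Here cycleOnce 5 is (5, 15). If the output of component depended on its input as well, the definitions would form a genuine cycle and the evaluation would fail to terminate.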

Useful bits and pieces

The function cbit selects the Carry bit from a tuple of flags.

cbit :: Flags -> Bool
cbit (n, z, v, c) = c

The sign bit of a 32-bit word is bit 31.

signbit :: Word32 -> Bool
signbit x = bit 31 x

The function bit extracts a single bit of a word as a Boolean. Just to be definite, we count bits outside the range [0..31] as zero: among other things, that affects the way the C flag is set by a shift instruction with a shift amount of zero.

bit :: Int -> Word32 -> Bool
bit n x = if 0 <= n && n < 32 then testBit x n else False

We can extract a field of a word with an appropriate combination of shifts and masks.

field :: (Int, Int) -> Word32 -> Word32
field (j,i) x =
  shiftR (x .&. mask) i
  where mask = complement (complement 1 `shiftL` j)

To sign extend a value from n bits to 32, we can use the well-known trick of shifting left and then (arithmetically) right again.

signext :: Int -> Word32 -> Word32
signext n x =
  word (shiftR (shiftL (int32 x) (32-n)) (32-n))
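
As a worked example, the following sketch applies field and signext to the instruction word 0xe7fe, which encodes an unconditional branch to itself. The definitions are repeated with explicit fromIntegral conversions so that the block stands alone; branchOffset is a hypothetical name.

```haskell
import Data.Bits (complement, shiftL, shiftR, (.&.))
import Data.Int (Int32)
import Data.Word (Word32)

field :: (Int, Int) -> Word32 -> Word32
field (j,i) x = shiftR (x .&. mask) i
  where mask = complement (complement 1 `shiftL` j)

signext :: Int -> Word32 -> Word32
signext n x =
  fromIntegral (shiftR (shiftL (fromIntegral x :: Int32) (32-n)) (32-n))

-- The 11-bit field 0x7fe has its top bit set, so it sign extends to
-- 0xfffffffe (that is, -2); doubling it gives an offset of -4.
branchOffset :: Word32
branchOffset = shiftL (signext 11 (field (10,0) 0xe7fe)) 1
```

Adding the offset 0xfffffffc to the value pc+4 that is read from r15 gives back the address of the branch instruction itself. Values with a clear top bit are unchanged: signext 8 0x7f is 0x7f, whereas signext 8 0x80 is 0xffffff80.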

The state of the simulation is wrapped up as an abstract data type, but sometimes it's convenient to peek inside and extract the value of a particular register.

peekreg :: State -> Int -> Word32
peekreg (State (regfile, _, _)) n = getreg regfile n
getreg :: RegFile -> Int -> Word32
getreg (RegFile regs) n = regs !! n

Haskell's strong type system means that conversions between integer types must be made explicitly. Here is a handy set of conversion functions that improve the readability of mixed expressions.

ord :: Bool -> Word32
ord b = fromIntegral (fromEnum b)
int :: Integral a => a -> Int
int x = fromIntegral x
int32 :: Integral a => a -> Int32
int32 x = fromIntegral x
word :: Integral a => a -> Word32
word x = fromIntegral x

Main program

The simulator has one more trick up its sleeve: it can read object files prepared using the usual GNU tools, obviating the need for any kind of built-in assembler. Actually, the trick is not so clever, because it's possible to persuade the tools to output a flat binary file, and then all that is needed is to read in that file. Inevitably, the IO monad rears its ugly head.

readBinary :: String -> IO Memory
readBinary file =
  do
    contents <- ByteString.readFile file
    return (Memory (loop Map.empty 0 contents))
  where
    loop mem i dat =
      if ByteString.null dat then mem else
      case ByteString.unpack (ByteString.take 4 dat) of
        [x, y, z, w] ->
          let v = shiftL (word w) 24 + shiftL (word z) 16
                     + shiftL (word y) 8 + word x in
          loop (Map.insert i v mem) (i+1) (ByteString.drop 4 dat)
        _ -> error "partial word"
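
The byte order used by the loop can be checked without any file handling. The sketch below is a pure analogue working on a list of bytes; wordsLE is a hypothetical helper, not part of the simulator.

```haskell
import Data.Bits (shiftL)
import Data.Word (Word8, Word32)

-- Hypothetical pure helper: group a list of bytes into little-endian
-- words, mirroring what the loop in readBinary does with a ByteString.
wordsLE :: [Word8] -> [Word32]
wordsLE [] = []
wordsLE (x : y : z : w : rest) =
  (shiftL (fromIntegral w) 24 + shiftL (fromIntegral z) 16
     + shiftL (fromIntegral y) 8 + fromIntegral x) : wordsLE rest
wordsLE _ = error "partial word"
```

For example, wordsLE [0x40, 0x18, 0x70, 0x47] is [0x47701840]: the halfword at address 0 is 0x1840 and the one at address 2 is 0x4770, which is the layout that imem expects.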

The simulator is invoked as follows, where a.bin is a binary file prepared earlier, and 123, 456 and 789 become the initial values of registers r0, r1, r2.

thumb a.bin 123 456 789

Up to 13 register values can be specified on the command line in decimal or hexadecimal, and to these we add initial values for the stack pointer, the link register, and the program counter. The program starts at address 0, and the lr value of -1 is special in that if this value ever appears in the pc, then the simulation stops. This happens when the main program returns.

init_regs :: [String] -> RegFile
init_regs args =
  RegFile (map read (take 13 (args ++ repeat "0"))
      ++ [0x400, 0xffffffff, 0])
init_flags = Register (f, f, f, f)

The simulation itself is a loop that calls exec repeatedly, showing the state before and after each instruction, and printing the name and debugging information for each instruction executed. The loop terminates when the magic value we supplied as the initial value of lr makes it into the pc.

run st =
 let pc = peekreg st 15 in
 do putStrLn (show st);
     if pc == 0xffffffff then putStrLn "Brexit!" else
       let (st', mnem, ddt) = exec st in
       do putStrLn (mnem ++ " " ++ show ddt); run st'

And now the main program just reads the arguments, loads the binary file, and starts the simulation.

main =
  do
    args <- System.Environment.getArgs
    mem <- readBinary (head args)
    run (State (init_regs (tail args), init_flags, mem))