Thunder tutorial

From Spivey's Corner
Jump to: navigation, search

Thunder is based on a register machine with a RISC-like instruction set where arithmetic instructions specify two source registers and a destination, or a source register, an immediate operand and a destination. Loads and stores are separate from arithmetic, and support a single addressing mode register + constant.

Note: the code contained in this tutorial will work on x86 and ARM. For AMD64, some adjustments are needed to cope with 64 bit pointers and 32 bit offests, and the nature of those adjustments is still evolving. It's best to stick with x86 for now.

Another page has a comprehensive list of Thunder instructions.

Touring fact.c

The code shown here is part of the example program thunder/fact.c included with OBC. Build instructions are given below.

First, we need to get a piece of C gobbledegook out of the way:

typedef int (*funcp)(int);

This defines funcp to be the type of pointers to functions that take an int parameter and return an int result.[1]

Then we can define compile to be a function with no parameters that returns this type: it does so by using Thunder to create a factorial function and return it.

funcp compile2(void) {
     code_addr entry;

We'll use code_addr for the type of addresses in the generated code. In this case entry is the entry point for the factorial function.

     vmlabel lab1 = vm_newlab(), lab2 = vm_newlab();
     vmreg r0 = ireg[0], r1 = ireg[1];

Now we create two label objects for use in the code. We will define these labels at appropriate points below, and can use them as targets for branch instructions either before or after defining them. Everything is fixed up behind the scenes so that the branches lead to the right place.

We also name two registers for use in the code. There are two classes of registers in Thunder: integer and floating point, and the integer registers are divided into the callee-save registers ireg[0..nvreg) and the caller-save registers ireg[nvreg..nireg). Thus, nvreg is the number of callee-save registers, guaranteed to be preserved across each function call, and nireg is the total number of integer registers, including any caller-save registers that are not preserved across function calls. The floating point registers freg[0..nfreg) are all caller-save.

This scheme is convenient, because it's safe to use callee-save registers as if they were caller-save, but not vice versa. Thunder guarantees that all callee-save registers that are used in a function body are saved on entry to the function and restored on exit, either by saving them all regardless, or by monitoring register use in the function body and saving registers as needed.

All existing implementations of Thunder provide at least 2 callee-save registers, at least 7 integer registers altogether, and at least 6 floating-point registers. Smart applications (like the OBC JIT) have their own scheme for managing registers, and can use additional registers if they exist. Simpler applications may use a fixed number of registers for fixed purposes, and can safely ignore any registers they don't need. It's wise to include an assertion like assert(nvreg >= 2 && nireg >= 4) in the main program to document the assumptions made by a program, and protect against surprises if future implementations provide fewer registers than expected.

The program continues.

     entry = vm_begin("fact", 1);
     vm_gen(GETARG, r0, 0);
     vm_gen(MOV, r1, 1);

Here we begin compiling a Thunder procedure: its name is fact, and it has one (integer) argument. Thunder supports up to three incoming arguments, all of integer or pointer type, and three outgoing arguments for function calls that appear in the body of a function. We save the entry address of the function as the value of entry. The name fact doesn't greatly matter, except that there is a debugging feature that lets you save the code for the procedure in a file named fact.dump.

The second line is the first instruction in the body of the procedure, generated by calling the interface function vm_gen2ri, indicating an instruction with two operands, the first a register and the second an integer. The instruction GETARG r0, 0 fetches the first argument (numbered 0) and puts in register r0. This argument could, according to the host architecture, be passed on the stack or in a register – we don't know – but the GETARG instruction is implemented in the appropriate way. Incoming arguments are guaranteed to remain available until the first instruction in the function body that is not a GETARG instruction, but not after that. Most applications fetch the arguments immediately on entry and save them in registers or memory.

The third line adds another instruction, setting register r1 to 1.

Now begins the main loop of the factorial function, which repeatedly decrements r0 towards zero, and keeps a running product in r1. What you see here is a sequence of instructions, each adding one line to the assembly-language program we are putting together.

     vm_label(lab1);                   // Label lab1
     vm_gen(BEQ, r0, 0, lab2);     // Branch to lab2 is r0 = 0
     vm_gen(MUL, r1, r1, r0);      // Set r1 = r1 * r0
     vm_gen(SUB, r0, r0, 1);       // Set r0 = r0 = 1
     vm_gen(JUMP, lab1);             // Jump back to lab1
     vm_label(lab2);                   // Label lab2

The first and last lines define the two labels we created earlier; these labels are used in the BEQ and JUMP instructions that make the loop. I've added comments above to describe the effect of each instruction.

Each instruction has its own pattern of operands, so the interface routine vm_gen uses 'advanced' C features to select one of a family of functions with different numbers of arguments. In the case of the BEQ instructuon, a function vm_gen3rij is called behind the scenes: it handles instructions that specify a register, an integer, and a label as operands. You should never have to call these routines directly, but you may see them mentioned in error messages.

Notice that we don't concern ourselves here with the nature of the host machine: it may or may not have an instruction that compares a register with a constant and branches if they are equal, and if it doesn't, then it's the job of the Thunder implementation on that machine to make the same effect using the instructions that do exist.

After the loop, we need to return the value in register r1. We do this by moving it to a special register ret, and then issuing a RET instruction.

     vm_gen(MOV, ret, r1);

The register ret may or may not be the same as one of the other caller-save registers provided for our use, so moving a value to ret may destroy the value of a random register. For this reason, the move to ret should come just before the function returns.

The compile function ends as follows:

     return (funcp) entry;

First, the call to vm_end() completes and seals the function we have just put together. There may be interaction with the operating system to make sure the memory it occupies is executable, and to flush the memory caches so that the code appears ready for execution. Second, we take the entry point for the function and cast it to the type funcp so that it becomes a function pointer. This is the point at which magic happens, and what was data becomes executable code. It's appropriate that the cast itself doesn't do anything, but just changes the C compiler's idea of the type.

In the main program, we can write:

int main(int argc, char *argv[]) {
     funcp fact = compile();
     printf("The factorial of 10 is %d\n", fact(n));
     return 0;

As you can see, the object that is returned by compile can be called just as if it were a function written in C.

A recursive function

The program fact.c contains a second definition of the factorial function, this time using recursion. This illustrates the fact that recursion is possible for generated functions, but at a more mundane level also shows the code needed for a function call. The function compile2 begins as follows.

funcp compile2(void) {
     code_addr entry;

     vmlabel lab1 = vm_newlab();
     vmreg r0 = ireg[0], r1 = ireg[1];

     assert(nvreg >= 2);

     entry = vm_begin("fact2", 1);
     vm_gen(GETARG, r0, 0);

As before, we declare a couple of labels and a couple of registers; this time we assume the registers to be callee-save, hence the assertion.

Now we find the body of the factorial function. First we deal with the case n = 0 by returning the constant 1.

     vm_gen(BNEQ, r0, 0, lab1);
     vm_gen(MOV, ret, 1);

If the base case does not apply, then we must make a recursive call. To implement a function call, we first compute into registers any arguments that are not constants; in this case, we comptute n-1 in r1.

     vm_gen(SUB, r1, r0, 1);

Now we are ready for the call. First comes a PREP instruction, specifying the number of outgoing arguments; then a sequence of ARG instructions specifying the arguments from right to left. In this case, we write vm_gen1r(ARG, r1) to pass a value from a register, but there is also vm_gen1i(const) to pass a constant. Finally, there is a CALL instruction to specify the procedure to call; this time, we are calling a fixed function, namely the one we are in the process of defining, but there is also an instruction to call a function whose address is in a register.

     vm_gen(PREP, 1);
     vm_gen(ARG, r1);
     vm_gen(CALL, (int) entry);

A number of safety precautions are needed in order to be sure that the function call will work on all architectures. First, only ARG instructions appear between the PREP and the CALL; that means computing any non-constant arguments in advance. Second, we must be finished with GETARG instructions before the PREP, because some architectures put incoming and outgoing arguments in the same registers.

The result of the call appears in the special register ret, which may be the same as one of the other caller-save registers. We should save it elsewhere before using any of these registers again, and before making another call. In our case, we can safely read the result of the recursive call and write our own result in the same multiply instruction, then immediately return.

     vm_gen(MUL, ret, r0, ret);

The compiling function ends in the same way as before.

     return (funcp) entry;

Both these examples have compiling functions that are straight-line sequences of procedure calls, so they always compile the same sequence of instructions. But the possibility is immediately present of using more elaborate compiling functions, including ones that translate a sequence of instructions on some other code, such as Keiko, or comiling functions that are defined by recursion over an abstract syntax tree.

(This example doesn't access data in memory, so doesn't use the load and store instructions that Thunder provides. There's another example in the same file that uses these things, and also floating point.)

Other features

  • Local storage by using vm_begin_locals(name, nargs, locsize).

Operand sizes

Machines commonly restrict the size of values that may appear as immediate fields of instructions. This isn't a problem on x86, where 32-bit immediate values are supported, but it is a problem on the ARM, where immediates larger than 256 are not always allowed, according to a complicated scheme that makes 256 and 260 representable but 257 not. Thunder imposes no such restrictions, and you can freely write

vm_gen(ADD, r0, r0, 257);

If the architecture requires it, then Thunder will insert code that loads the constant into a register before the ADD operation. It's also possible to load from arbitrary addresses, not just from ones that will fit in the offset field of an instruction. So you can write

vm_gen(LDW, r0, (int) &var);

if var is any (statically-allocated) variable. On x86, this generates a mov instruction containing a 32-bit address; on ARM, we get an instruction that loads the address into a spare register then an indirect load through that register; luckily, there is an ARM convention that register IP is reserved for this kind of manoeuvre. It works out either way, without us having to worry about it at the Thunder level.


By setting appropriate debugging flags, it's possible to see the Thunder instructions together with the machine instructions for the host that they generate. Here is a trace of the iterative factorial function being assembled on the x86. Thunder instructions have upper-case mnemonics, and the corresponding x86 instructions are in lower case and indented a bit more. Flouting both the Intel and the AT&T conventions, these x86 instructions are shown in a style where the destination is on the left, memory operands are written offset(reg) and immediate operands are written #imm. This is debugging output, not input, so you will never have to write in this form.

You will see that the routine begins with four registers being pushed on the stack so as to preserve the values they had in the caller; they are popped again as part of the translation of the ret instruction at the end. Registers V0 and V1 correspond to BX and SI on the x86, notated as rBX and rSI here to avoid the confusion about EBX vs BX or BL in Intel assembler.

The getarg operation fetches the argument from the stack by addressing relative to rSP and puts it in a register. Each of the other instructions has a straightforward translation into x86 machine code, most as a single instruction. In the case of the MUL instruction, this is possible because one of the input registers is also used for the output, but the Thunder translator would have inserted a move instruction if needed. The conditional branch BEQ is translated into a test instruction followed by a branch on condition codes, and the SUB instruction is able to use the special form dec rBX that assembles to a single byte.

---   push rBP [55]
---   push rBX [53]
---   push rSI [56]
---   push rDI [57]
--- GETARG V0, 0
---   mov rBX, 20(rSP) [8b 5c 24 14]
--- MOV V1, 1
---   mov rSI, #1 [be 01 00 00 00]
--- L1:
--- BEQ V0, 0, L2
---   test rBX, rBX [85 db]
---   je L2 [0f 84 00 00 00 00]
--- MUL V1, V1, V0
---   imul rSI, rBX [0f af f3]
--- SUB V0, V0, 1
---   dec rBX [4b]
--- JUMP L1
---   jmp L1 [e9 ef ff ff ff]
--- L2:
--- MOV I0, V1
---   mov rAX, rSI [8b c6]
--- RET
---   pop rDI [5f]
---   pop rSI [5e]
---   pop rBX [5b]
---   pop rBP [5d]
---   ret [c3]

Looking more closely, you will see the numeric machine code for each instruction shown in square brackets. This information is mainly of interest to those who write the translator, piecing together the output bit by painful bit, as in the end the binary output is all that matters. There are a couple of points worth noting, however.

In the je instruction, there is a two-byte opcode 0f 84, followed by a four-byte offset that is shown as zero, because at the time the instruction is assembled, the location of label L2 is still unknown. The proper offset will be patched in later when L2 does become known. In contrast, as a backward jump, the jmp instruction shows the offset to reach L1 as -17, or ef ff ff ff in little-endian order, and that's correct because there are precisely 17 bytes of object code between L1 and L2.

In difficult cases, where the irregularity of the x86 architecture gets in the way, it is sometimes necessary for the object code to do a rain dance, where some registers are pushed on the stack before the operation, used for their magical properties, then popped again afterwards. This particularly affects (a) shift instructions with a variable shift amount, where the x86 insists on having the shift amount in the rCX register (and calling it CL for the purpose); and (b) single-byte stores, where the source register must be one of rAX, rCX, rDX or rBX (aka AL, CL, DL, BL) and not rSI, rDI or rBP. It's all taken care of in the translation, and it's best not to worry about it. The story changes in AMD64 mode, where we are given special spells to make rain without the need for a dance.

Build instructions

There's a Makefile.

  1. You can avoid the typedef by writing int (*compile(void))(int) { ... } and return (int (*)(int)) entry instead of what follows. But only Ken Thompson will ever go out with you if you do, and he's dead now.
Personal tools