Thunder tutorial

Copyright © 2024 J. M. Spivey
Jump to navigation Jump to search

Thunder[1] is based on a register machine with a RISC-like instruction set where arithmetic instructions specify two source registers and a destination, or a source register, an immediate operand and a destination. Loads and stores are separate from arithmetic, and support a few simple addressing modes.

Note: the code contained in this tutorial will work on x86, ARM and amd64. For amd64, there are complications due to the fact that the virtual machine can access only the bottom 2GB of the address space, so it's best to stick with x86 for now.

Another page has a comprehensive list of Thunder instructions.

Touring fact.c

The code shown here is part of the example program thunder/fact.c included with Thunder. Build instructions are given below.

First, we need to get a piece of C gobbledegook out of the way:

typedef int (*funcp)(int);

This defines funcp to be the type of pointers to functions that take an int parameter and return an int result.[2]

Then we can define compile to be a function with no parameters that returns this type: it does so by using Thunder to create a factorial function and return it.

funcp compile(void) {
     void *entry;

We'll use void * for the type of addresses in the generated code. In this case entry is the entry point for the factorial function.

     vmlabel lab1 = vm_newlab(), lab2 = vm_newlab();
     vmreg r0 = vm_ireg[0], r1 = vm_ireg[1];

Now we create two label objects for use in the code. We will define these labels at appropriate points below, and can use them as targets for branch instructions either before or after defining them. Everything is fixed up behind the scenes so that the branches lead to the right place.

We also name two registers for use in the code. There are two classes of registers in Thunder: integer and floating point, and the integer registers are divided into the callee-save registers vm_ireg[0..vm_nvreg) and the caller-save registers vm_ireg[vm_nvreg..vm_nireg). Thus, vm_nvreg is the number of callee-save registers, guaranteed to be preserved across each function call, and vm_nireg is the total number of integer registers, including any caller-save registers that are not preserved across function calls. The floating point registers vm_freg[0..vm_nfreg) are all caller-save.

This scheme is convenient, because it's safe to use callee-save registers as if they were caller-save, but not vice versa. Thunder guarantees that all callee-save registers that are used in a function body are saved on entry to the function and restored on exit, either by saving them all regardless, or by monitoring register use in the function body and saving registers as needed.

All existing implementations of Thunder provide at least 2 callee-save registers, at least 7 integer registers altogether, and at least 6 floating-point registers. Smart applications (like the OBC JIT) have their own scheme for managing registers, and can use additional registers if they exist. Simpler applications may use a fixed number of registers for fixed purposes, and can safely ignore any registers they don't need. It's wise to include an assertion like assert(vm_nvreg >= 2 && vm_nireg >= 4) in the main program to document the assumptions made by a program, and protect against surprises if future implementations provide fewer registers than expected.

The program continues.

     entry = vm_begin("fact", 1);
     vm_gen(GETARG, r0, 0);
     vm_gen(MOV, r1, 1);

Here we begin compiling a Thunder procedure: its name is fact, and it has one (integer) argument. Thunder supports up to three incoming arguments, all of integer or pointer type, and three outgoing arguments for function calls that appear in the body of a function. We save the entry address of the function as the value of entry. The name fact doesn't greatly matter, except that there is a debugging feature that lets you save the code for the procedure in a file named fact.dump.

The second line is the first instruction in the body of the procedure, generated by calling the interface function vm_gen that takes a variable number of arguments. The instruction GETARG r0, 0 fetches the first argument (numbered 0) and puts in register r0. This argument could, according to the host architecture, be passed on the stack or in a register – we don't know – but the GETARG instruction is implemented in the appropriate way. Incoming arguments are guaranteed to remain available until the first instruction in the function body that is not a GETARG instruction, but not after that. Most applications fetch the arguments immediately on entry and save them in registers or memory. The third line adds another instruction, setting register r1 to 1.[3]

Now begins the main loop of the factorial function, which repeatedly decrements r0 towards zero, and keeps a running product in r1. What you see here is a sequence of instructions, each adding one line to the assembly-language program we are putting together.

     vm_label(lab1);                   // Label lab1
     vm_gen(BEQ, r0, 0, lab2);         // Branch to lab2 if r0 = 0
     vm_gen(MUL, r1, r1, r0);          // Set r1 = r1 * r0
     vm_gen(SUB, r0, r0, 1);           // Set r0 = r0 = 1
     vm_gen(JUMP, lab1);               // Jump back to lab1
     vm_label(lab2);                   // Label lab2

The first and last lines define the two labels we created earlier; these labels are used in the BEQ and JUMP instructions that make the loop. I've added comments above to describe the effect of each instruction.

Notice that we don't concern ourselves here with the nature of the host machine: it may or may not have an instruction that compares a register with a constant and branches if they are equal, and if it doesn't, then it's the job of the Thunder implementation on that machine to make the same effect using the instructions that do exist, perhaps a compare instruction followed by a branch on condition codes.

After the loop, we need to return the value in register r1. We do this by moving it to a special register vm_ret, and making this the last action in the function.

     vm_gen(MOV, vm_ret, r1);

The register vm_ret may or may not be the same as one of the other caller-save registers provided for our use, so moving a value to vm_ret may destroy the value of a random register. For this reason, the move to vm_ret should come just before the function returns.

The compile function ends as follows:

     vm_end();
     return entry;
}

First, the call to vm_end() completes and seals the function we have just put together. There may be interaction with the operating system to make sure the memory it occupies is executable, and to flush the memory caches so that the code appears ready for execution. The funtion compile returns the entry point of the newly compiled function. In the main program, we can write:

int main(int argc, char *argv[]) {
     funcp fact = compile();
     printf("The factorial of 10 is %d\n", fact(10));
     return 0;
}

As you can see, the object that is returned by compile can be called just as if it were a function written in C.

A recursive function

The program fact.c contains a second definition of the factorial function, this time using recursion. This illustrates the fact that recursion is possible for generated functions, but at a more mundane level also shows the code needed for a function call. The function compile2 begins as follows.

funcp compile2(void) {
     void *entry;
     vmlabel lab1 = vm_newlab(), lab2 = vm_newlab();
     vmreg r0 = vm_ireg[0], r1 = vm_ireg[1];
     assert(vm_nvreg >= 2);

     entry = vm_begin("fact2", 1);
     vm_gen(GETARG, r0, 0);

As before, we declare a couple of labels and a couple of registers; this time we assume the registers to be callee-save, hence the assertion. Now we find the body of the factorial function. First we deal with the case n = 0 by returning the constant 1.

     vm_gen(BNE, r0, 0, lab1);
     vm_gen(MOV, vm_ret, 1);
     vm_gen(JUMP, lab2);

If the base case does not apply, then we must make a recursive call. To implement a function call, we first compute into registers any arguments that are not constants; in this case, we comptute n-1 in r1.

     vm_label(lab1);
     vm_gen(SUB, r1, r0, 1);

Now we are ready for the call. First comes a PREP instruction, specifying the number of outgoing arguments; then a sequence of ARG instructions specifying the arguments from right to left. In this case, we write vm_gen(ARG, r1) to pass a value from a register, but there is also a form of the ARG instruction that passes an integer constant. Finally, there is a CALL instruction to specify the procedure to call; this time, we are calling a fixed function, namely the one we are in the process of defining, but there is also an instruction to call a function whose address is in a register.

     vm_gen(PREP, 1);
     vm_gen(ARG, r1);
     vm_gen(CALL, entry);

A number of safety precautions are needed in order to be sure that the function call will work on all architectures. First, only ARG instructions appear between the PREP and the CALL; that means computing any non-constant arguments in advance. Second, we must be finished with GETARG instructions for the incoming arguments of a function before the PREP, because some architectures put incoming and outgoing arguments in the same registers.

The result of the call appears in the special register vm_ret, which may be the same as one of the other caller-save registers. We should save it elsewhere before using any of these registers again, and before making another call. In our case, we can safely read the result of the recursive call and write our own result in the same multiply instruction, then immediately return.

     vm_gen(MUL, ret, r0, ret);

The compiling function ends in a similar way as before.

     vm_label(lab2);
     vm_end();
     return (funcp) entry;
}

Both these examples have compiling functions that are straight-line sequences of procedure calls, so they always compile the same sequence of instructions. But the possibility is immediately present of using more elaborate compiling functions, including ones that translate a sequence of instructions in some other code, such as Keiko, or compiling functions that are defined by recursion over an abstract syntax tree.

(This example doesn't access data in memory, so doesn't use the load and store instructions that Thunder provides. There's another example in the same file that uses these things, and also floating point.)

Other features

  • Local storage by using vm_begin_locals(name, nargs, locsize), then addressing the local storage at positive offsets from the special regiser vm_base. This is not used by the Keiko JIT, so may not be implemented on some targets.

Operand sizes

Machines commonly restrict the size of values that may appear as immediate fields of instructions. This isn't a problem on x86, where 32-bit immediate values are supported, but it is a problem on the ARM, where immediates larger than 256 are not always allowed, according to a complicated scheme that makes 256 and 260 representable but 257 not. Thunder imposes no such restrictions, and you can freely write

vm_gen(ADD, r0, r0, 257);

If the architecture requires it, then Thunder will insert code that loads the constant into a (hidden) register before the ADD operation. It's also possible to load from arbitrary addresses, not just from ones that will fit in the offset field of an instruction. So you can write

vm_gen(LDW, r0, (int) &var);

if var is any (statically-allocated) variable. On x86, this generates a mov instruction containing a 32-bit address; on ARM, we get an instruction that loads the address into a spare register then an indirect load through that register; luckily, there is an ARM convention that register IP is reserved for this kind of manoeuvre. It works out either way, without us having to worry about it at the Thunder level. On amd64, things are a bit more complicated, and we have to make sure that var has an address that fits in 32 bits.

Debugging

By setting appropriate debugging flags, it's possible to see the Thunder instructions together with the machine instructions for the host that they generate. Here is a trace of the iterative factorial function being assembled on the x86. Thunder instructions have upper-case mnemonics, and the corresponding x86 instructions are in lower case and indented a bit more. Flouting both the Intel and the AT&T conventions, these x86 instructions are shown in a style where the destination is on the left, memory operands are written offset(reg) and immediate operands are written #imm. This is debugging output, not input, so you will never have to write in this form.

You will see that the routine begins with four registers being pushed on the stack so as to preserve the values they had in the caller; they are popped again as part of the translation of the RET instruction at the end. Registers V0 and V1 correspond to BX and SI on the x86, notated as rBX and rSI here to avoid the confusion about EBX vs BX or BL in Intel assembler.

The GETARG operation fetches the argument from the stack by addressing relative to rSP and puts it in a register. Each of the other instructions has a straightforward translation into x86 machine code, most as a single instruction. In the case of the MUL instruction, this is possible because one of the input registers is also used for the output, but the Thunder translator would have inserted a move instruction if needed. The conditional branch BEQ is translated into a test instruction followed by a branch on condition codes, and the SUB instruction is able to use the special form dec rBX that assembles to a single byte.

---   push rBP [55]
---   push rBX [53]
---   push rSI [56]
---   push rDI [57]
--- GETARG V0, 0
---   mov rBX, 20(rSP) [8b 5c 24 14]
--- MOV V1, 1
---   mov rSI, #1 [be 01 00 00 00]
--- L1:
--- BEQ V0, 0, L2
---   test rBX, rBX [85 db]
---   je L2 [0f 84 00 00 00 00]
--- MUL V1, V1, V0
---   imul rSI, rBX [0f af f3]
--- SUB V0, V0, 1
---   dec rBX [4b]
--- JUMP L1
---   jmp L1 [e9 ef ff ff ff]
--- L2:
--- MOV I0, V1
---   mov rAX, rSI [8b c6]
---   pop rDI [5f]
---   pop rSI [5e]
---   pop rBX [5b]
---   pop rBP [5d]
---   ret [c3]

Looking more closely, you will see the numeric machine code for each instruction shown in square brackets. This information is mainly of interest to those who write the translator, piecing together the output bit by painful bit, as in the end the binary output is all that matters. There are a couple of points worth noting, however.

In the je instruction, there is a two-byte opcode 0f 84, followed by a four-byte offset that is shown as zero, because at the time the instruction is assembled, the location of label L2 is still unknown. The proper offset will be patched in later when L2 does become known. In contrast, as a backward jump, the jmp instruction shows the offset to reach L1 as -17, or ef ff ff ff in little-endian order, and that's correct because there are precisely 17 bytes of object code between L1 and L2.

In difficult cases, where the irregularity of the x86 architecture gets in the way, it is sometimes necessary for the object code to do a rain dance, where some registers are pushed on the stack before the operation, used for their magical properties, then popped again afterwards. This particularly affects (a) shift instructions with a variable shift amount, where the x86 insists on having the shift amount in the rCX register (and calling it CL for the purpose); and (b) single-byte stores, where the source register must be one of rAX, rCX, rDX or rBX (aka AL, CL, DL, BL) and not rSI, rDI or rBP. It's all taken care of in the translation, and it's best not to worry about it. The story changes in amd64 mode, where the gods of AMD or Intel have revealed special spells to make rain without the need for a dance.

Build instructions

To build Thunder and the example program for the host:

$ git clone https://github.com/Spivoxity/thunder.git
$ cd thunder
$ autoreconf
$ ./configure --disable-m64x32 --enable-debug
$ make fact 
$ ./fact

The --disable-m64x32 flag causes the program to be built as a 32-bit application, avoiding all the complexities of mixing 32-bit and 64-bit addressing. It will uses gcc -m32, and requires approriate development tools and libaries to be installed.

To build for the ARM and run on an emulator:

$ sudo apt-get install gcc-arm-linux-gnueabihf qemu-user
$ cd thunder/armtest
$ make fact
$ ./run ./fact



  1. Why is it called Thunder? Because it's intended as a successor to GNU Lightning! https://www.youtube.com/watch?v=bFjL3DZ9V0I
  2. You can avoid the typedef by writing int (*compile(void))(int) { ... } and return (int (*)(int)) entry instead of what follows. But only Ken Thompson will ever go out with you if you do, and he's dead now.
  3. The function vm_gen is actually a portmanteau for a family of functions with names like vm_gen2ri, meaning "generate an instruction with 2 operands, one a Register and the other an Integer", so that the call vm_get(MOV, r1, 1) actually means vm_gen2ri(MOV, r1, 1). A fancy overloading mechanism identifies which member of the family is needed for each call, making the code more readable and less fiddly to type. I mention this fact only because, like all overloading mechanisms I've seen, it reacts to even trivial mistakes by spitting out incomprehensible error messages. The mechanism in this case involves varargs macros and the _Generic feature introduced in C11.