JIT interface for OBC

From Spivey's Corner
Jump to: navigation, search

An experimental JIT translator in the Oberon system uses a portable interface to generate code. The interface is based on a simple RISC architecture, a subset of that supported by GNU Lightning. There are currently two implementations of the interface:

  • One directly generates code for the x86.
  • The other is a wrapper around GNU Lightning itself, and is potentially portable to a wider range of architectures. You'll need a recent version of Lightning from the GIT repository to build it.

Note that the architecture described below contains just the things that are needed to translate programs in my dialect of Oberon: for example, it supports unsigned characters and signed shorts, because that's what I needed. There's no reason why a better variety of data types, operations, or indeed addressing modes cannot be supported, but each addition requires the binary details for each target arcitecture to be unearthed and programmed up.

Registers

There are twelve genuine registers and two fictional ones:

  • Three callee-save integer registers xV0, xV1, xV2. These are saved and restored as part of the canned prologue and epilogue sequences in any function, so there's no need to generate explicit code to save and restore them.
  • Three caller-save integer registers xR0, xR1, xR2.
  • Six (caller-save) floating point registers xF0, xF1, ..., xF5.
  • A function that returns a result should place it in the fictitious (integer) register xRET. Since this may overlap with one of the other integer registers, the function should do this as the last action before returning. There's no support for functions that return a floating-pooint result.
  • The fictitious register xZERO can be used as an index register in load and store instructions; it always contains zero.

Each integer register holds 32 bits; a floating point register is capable of holding either a float or a double.

Though the RISC-ish interface describes instructions that operate between three registers, the target machine may have instructions that specify only a source and a destination register, so it's advantageous where possible to make the destination register of the RISC-ish instruction the same as the first operand; then the assembler won't have to insert a move instruction behind the scenes.

On the x86, each floating point register is bigger than a double, and both floats and doubles occupy one register. On other architectures where a double needs a register pair, the interface can be implemented by allocating only the even registers to the virtual registers, and bringing the adjacent odd register into use when one of the virtual registers is to hold a double.

Instructions

Each instruction takes up to three operands, and (with the usual excpetion of store instructions), they are specified in a MIPS-like order with destination register followed by source operands, either two registers or a register and an (integer) immediate.

Arithmetic operations

  • Binary integer instructions can name three registers or two registers and an immediate. The two right shift instructions RSH and RSHU are 'arithmetic' and 'logical' shifts respectively.
ADD AND MUL OR SUB XOR LSH RSH RSHU / rrr, rri
  • Unary integer operations operate between two registers.
NEG NOT / rr
  • Binary floating point instructions operate between three registers.
ADDD ADDF DIVD DIVF MULD MULF SUBD SUBF / rrr
  • Unary floating point instruction.
NEGD NEGF / rr
  • Two nullary operations load a floating-point zero into a register.
ZEROF ZEROD / r

There are no integer divide instructions, mostly because (just in my dialect?) Oberon requires mathematically correct results, and it's easiest to provide these using a subroutine.

On the x86, the float and double operations are identical, but on other architectures they will be different.

Branches

  • Integer conditional branches use two registers or a register and an immediate.
BEQ BGEQ BGT BLEQ BLT BNEQ / rrb, rib
  • Unsigned conditional branches are similar.
BGEQU BLTU / rrb, rib
  • Floating and double conditional branches use two registers.
BEQD BEQF BGEQD BGEQF BGTD BGTF BLEQD BLEQF BLTD BLTF BNEQD BNEQF / rrb
  • Unconditional branch.
JUMP / b

Comparisons

These comparison operations take two integer or two floating-point operands and deliver a result (1 or 0) in an integer register. The second operand of an integer comparison may be an immediate.

  • Integer comparions.
EQ GEQ GT LEQ LT NEQ / rrr, rri
  • Floating and double comparisons.
EQD EQF GEQD GEQF GTD GTF LEQD LEQF LTD LTF NEQD NEQF / rrr

Loads and stores

  • The move instruction can load an integer register from another register or an immediate.
MOV / rr, ri
  • Load instructions all use indexed addressing, but absolute and indirect addressing can be simulated by replacing the index register with rZERO or the offset with 0. LDC zero extends a one byte unsigned character into a 32-bit register; LDS sign-extends from a two-byte integer into a 32-bit register.
LD LDC LDD LDF LDS / rri
  • Store instructions also use indexed addressing.
ST STC STD STF STS / rri

Type conversions

All operate between two registers of the appropriate kinds.

  • Conversion between float and double.
CONVDF CONVFD / rr
  • Conversion from integer to float or double.
CONVND CONVNF / rr
  • Conversion from integer to short or char. CONVNS discards the upper 16 bits and sign extends the lower 16 bits into the whole word. CONVNC replaces all but the low-order 8 bits with zeroes.
CONVNC CONVNS / rr

There is no instruction to convert from float or double to integer; you have to call a subroutine for that.

Subroutine linkage

  • GETARG fetches incoming function arguments into (integer) registers.
  • RET returns from the current function.
  • PREP, ARG and CALL and RETVAL are used for function calls.

There's no support for functions that take floating-point arguments.

Code generation interface

Most instructions are generated by calling one of these interface routines. For example, the call gen3rri(ADD, xV0, xV1, 3) generates an ADD instruction that adds the contents of register xV1 and the constant 3, putting the result in register xV0.

void vm_gen0(operation op);
void vm_gen1r(operation op, vmreg a);
void vm_gen1i(operation op, int a);
void vm_gen2rr(operation op, vmreg a, vmreg b);
void vm_gen2ri(operation op, vmreg a, int b);
void vm_gen3rrr(operation op, vmreg a, vmreg b, vmreg c);
void vm_gen3rri(operation op, vmreg a, vmreg b, int c);

For conditional and unconditional branches, there are two ways of providing the target address: either pass it as an argument to one of the following interface routines (and ignore the result); or give NULL as the argument, save the value returned, and use it as an argument to a later call of vm_patch.

typedef unsigned char *code_addr;
code_addr vm_gen1b(operation op, code_addr lab);
code_addr vm_gen3rrb(operation op, vmreg a, vmreg b, code_addr lab);
code_addr vm_gen3rib(operation op, vmreg a, int b, code_addr lab);

The function vm_label returns the current code address; for backwards branches, this value can be used as an argument to one of the interface routines just listed. For any kind of branch, forwards or backwards, it's possible to fill in the branch target by calling vm_patch(loc, lab), where loc is the address returned by vm_gen1b etc., and lab is an address returned by vm_label.

code_addr vm_label(void);
void vm_patch(code_addr loc, code_addr lab);

To begin compiling a function, call vm_prolog(n) where n is the number of (integer) arguments. Subsequent calls to vm_arg return the 'offsets' of these arguments, and these can be used in GETARG instructions to load the arguments into registers.

void vm_prolog(int n);
int vm_arg(void);

The vm_flush routine arranges for the code between start and finish to be made executable, by fiddling with permission bits for the memory pages involved, flushing the processor's cache, or both.

void vm_flush(code_addr start, code_addr finish);

There's a routine that can be used to allocate space for a jump table.

code_addr *vm_jumptable(int n);

The JIT interface needs to allocate storage to hold the generated code. It does so by calling the following routine that must be provided by the client, and in simple applications can be defined in terms of malloc.

void *vm_alloc(int size);

Example

Here's a procedure that generates a simple factorial function containing a loop:

typedef int (*funcp)(int);

funcp compile(void) {
     code_addr entry, lab1, lab2, loc1, loc2;
     int arg;

     entry = vm_label();
     vm_prolog(1);
     arg = vm_arg();
     vm_gen2ri(GETARG, xV0, arg);
     vm_gen2ri(MOV, xV1, 1);

     lab1 = vm_label();
     loc1 = vm_gen3rib(BEQ, xV0, 0, NULL);
     vm_gen3rrr(MUL, xV1, xV1, xV0);
     vm_gen3rri(SUB, xV0, xV0, 1);
     vm_gen1b(JUMP, lab1);

     lab2 = vm_label();
     vm_gen2rr(MOV, xRET, xV1);
     vm_gen0(RET);
     loc2 = vm_label();

     vm_patch(loc1, lab2);
     vm_flush(entry, loc2);

     return (funcp) entry;
}

Let's go through that line by line in true 'computers for dummies' style.

     entry = vm_label();
     vm_prolog(1);
     arg = vm_arg();

The variable entry holds the location of the function's entry point. The function has one argument (vm_prolog(1)), and arg is set to the offset of that argument. (Whether the argument will be passed in memory or in a register is hidden from us.)

     vm_gen2ri(GETARG, xV0, arg);
     vm_gen2ri(MOV, xV1, 1);

These are the first two instructions we generate: the first one moves the argument from wherever we receive it into register xV0; the second sets register xV1 to 1.

     lab1 = vm_label();
     loc1 = vm_gen3rib(BEQ, xV0, 0, NULL);

Now we compile "lab1: if xv0 = 0 goto lab2", where L2 is a label somewhere below. We can save the result of vm_label() as the address lab1 easily enough, but we don't know the address lab2 yet, so we use NULL instead for the target of the branch, and save the location loc1 for later backpatching. (Note that lab1 and loc1 are different, because lab1 labels the start of the branch instruction, and loc1 is the place within it where the target address must be patched.)

     vm_gen3rrr(MUL, xV1, xV1, xV0);
     vm_gen3rri(SUB, xV0, xV0, 1);
     vm_gen1b(JUMP, lab1);

If xV0 is non-zero, we multiply xV1 by its value, decrease the value by 1, then jump back to try the test again. Note that the backward jump can have its target specified directly. Of course, it would be better to avoid the second jump in the loop by using a layout where the test came at the end, but I have avoided that for simplicity of the example.

     lab2 = vm_label();
     vm_gen2rr(MOV, xRET, xV1);
     vm_gen0(RET);
     loc2 = vm_label();

The expected label lab2 is placed here, where we want to return the value in xV1 as the result. Just before the RET instruction, we move the value into the special register xRET. Location loc2 is the end of the code for the function.

     vm_patch(loc1, lab2);
     vm_flush(entry, loc2);

To finish things off, first we patch the label lab2 into the branch instruction whose address loc1 we saved earlier. Then we request that the range of addresses from entry to loc2 be made executable.

     return (funcp) entry;

The compiling function returns the label entry as its result, after casting it to the type funcp of pointers to functions from int to int.

A simple main program illustrates the creation and calling of the dynamic function:

int main(int argc, char *argv[]) {
     funcp fp = compile();
     int n = atoi(argv[1]);
     printf("The factorial of %d is %d\n", n, (*fp)(n));
     return 0;
}

The one missing piece is the callback function vm_alloc that the JIT library uses to allocate memory.

void *vm_alloc(int size) {
     return malloc(size);
}

Another example

Now let's compute factorials with a recursive function instead:

funcp compile2(void) {
     code_addr entry, lab1, loc1, loc2;
     int arg;

     entry = vm_label();
     vm_prolog(1);
     arg = vm_arg();
     vm_gen2ri(GETARG, xV0, arg);

     loc1 = vm_gen3rib(BNEQ, xV0, 0, NULL);
     vm_gen2ri(MOV, xRET, 1);
     vm_gen0(RET);
     
     lab1 = vm_label();
     vm_gen3rri(SUB, xR0, xV0, 1);
     vm_gen1i(PREP, 1);
     vm_gen1r(ARG, xR0);
     vm_gen1i(CALL, (int) entry);
     vm_gen1r(RETVAL, xR1);
     vm_gen3rrr(MUL, xRET, xV0, xR1);
     vm_gen0(RET);

     loc2 = vm_label();

     vm_patch(loc1, lab1);
     vm_flush(entry, loc2);

     return (funcp) entry;
}

Again, here's a blow-by-blow account:

     entry = vm_label();
     vm_prolog(1);
     arg = vm_arg();
     vm_gen2ri(GETARG, xV0, arg);

We put the incoming argument in register xV0.

     loc1 = vm_gen3rib(BNEQ, xV0, 0, NULL);
     vm_gen2ri(MOV, xRET, 1);
     vm_gen0(RET);

If the argument is zero, then we return 1 immediately.

     lab1 = vm_label();
     vm_gen3rri(SUB, xR0, xV0, 1);
     vm_gen1i(PREP, 1);
     vm_gen1r(ARG, xR0);
     vm_gen1i(CALL, (int) entry);
     vm_gen1r(RETVAL, xR1);

Otherwise, we want to compute the expression n * fact(n-1). First, we compute in xR0 the argument n-1 for the recursive call. Then comes the call itself: the PREP insruction specifies the number of arguments, and it is followed by the appropriate number of ARG instructions, giving the arguments from right to left. The call is completed by the CALL instuction, in which the address of the routine being called may be given as a register or an immediate. The RETVAL instructions fetches the result of the recursive call into a specified register; the result may have been returned in memory, or in a register that may overlap with a named register, so the RETVAL instruction should come before any other instruction that uses a register.

     vm_gen3rrr(MUL, xRET, xV0, xR1);
     vm_gen0(RET);

The value of n in xV0 has been preserved by the call, so to finish we multiply it by the result of the recursive call, putting the answer directly into the result register xRET.

     loc2 = vm_label();
     vm_patch(loc1, lab1);
     vm_flush(entry, loc2);
     return (funcp) entry;

We finish by fixing up the label for the conditional branch as before and making the code executable.