Design overview for OBC

From Spivey's Corner
Jump to: navigation, search

[Some of the details given here still need updating]

The Keiko Abstract Machine

The Oxford Oberon--2 compiler translates source programs into code for a stack-based abstract machine, and comes with an assembler/linker and a runtime system that together implement the abstract machine using bytecode. The runtime system also contains implementations of the primitive routines in the Oberon library, and a garbage collector that manages the Oberon heap.

This document describes the abstract machine primarily from the point of view of someone using it as the target of a compiler. For this purpose, the instructions of the machine can be divided into two groups: a core set of instructions that provide access to all the machine's functions, and an extended set that abbreviate common sequences of core instructions. For example, there is a single extended instruction LDLW 16 that pushes onto the evaluation stack a word from offset 16 in the stack frame of the current procedure. This instruction has the same effect as the two core instructions LOCAL 16/LOADW. These extended instructions are useful in a bytecode implementation, because they carry out the same work as the sequence of core instructions, but incur a lower overhead of fetch/execute cycles in the bytecode interpreter. In a compiler, a reasonable approach is first to translate the high-level program into core instructions, then use a peephole optimiser to combine these into extended instructions whenever possible.

The output of the compiler is a text file, and it is the job of the assembler/linker to gather a group of these text files, corresponding to the modules in the program, and combine them into a single binary file in the format recognised by the bytecode interpreter. This process is slightly complicated by the fact that some instructions have multiple encodings: for example, common instructions like LDLW 12 and LDLW 16 (which happen to access the first two parameters of the current procedure) may be encoded as a single byte, using two of the 256 possible bytecodes. On the other hand, less common instructions such as LDLW 100 or LDLW 1000 would be encoded as a one-byte opcode followed by one or two argument bytes. The compiler need not be concerned with the encoding of instructions as bytecode, since it outputs all these variants of the LDLW instruction as text in the same format.

In actual fact, some of the less common instructions described in this document have {\em no\/} encoding in the bytecode implementation, and are replaced by the assembler with longer sequences of instructions. For example, there is no bytecode for the floating-point jump on less-than-or-equal instruction FJLEQ lab, and it is replaced by the assembler with the longer sequence FCMP/JLEQZ lab that uses the floating-point compare instruction FCMP, followed by a conditional jump that compares the result with zero. Such replacements allow the 256 possible bytecodes to be used for more common instructions, whilst maintaining the appearance of a nicely regular instruction set for the compiler.

Machine architecture

Here I will summarize the 'programming model' of the Keiko machine by describing the execution context, the layout of memory and what registers the machine has. At each moment during execution, the machine is running a certain procedure in the context established by a current stack frame. Each bytecode procedure consists of two parts: a sequence of bytecodes for the procedure body, and a procedure descriptor, part of which is a constant pool that gives the values of constants that are used by the bytecode. Expressions are translated into a postfix form by the compiler, and during evaluation of an expression, temporary values are held on an evaluation stack.

In addition to the contents of the evaluation stack, which is analogous to the general-purpose registers of a conventional machine, the Keiko machine has four specialised registers that allow access to the execution context:

  • The program counterA register that contains the address of the next instruction to be executed. Because of pipelining, on ARM Cortex-M machines, reading the program counter yields a value that is 4 bytes greater than the address of the current instruction. pc points to the next bytecode instruction to be executed. As the program is executed, the pc advances from one instruction to the next.
  • The context pointer cp points to the procedure descriptor for the current procedure. The bytecode can refer to values in the constant pool that are found at fixed offsets from~cp.
  • The base pointer bp points to a fixed location in the current stack frame. Parameters and local variables of the current procedure can be accessed at fixed offsets from~bp.
  • The stack pointera register @sp@ that holds the address of the most recent occupied word of the subroutine stack. On ARM, as on most recent processors, the subroutine stack grows downwards, so that the @sp@ holds the lowest address of any occupied work on the stack. sp points to the top of the evaluation stack. Instructions for evaluating expressions access values through~sp.

The first few words of the procedure descriptor for a procedure contain all the information that is needed to call it: in particular, they specify whether the procedure is implemented in interpreted bytecode or by a native-code subroutine, and if it is interpreted, where the bytecode is. This means that a procedure can be represented in the program by the address of its descriptor, and when a procedure is defined using the PROC directive, it is this address that becomes the definition of the procedure's name as an assembler symbol. The value of that symbol may subsequently become part of the constant pool for other procedures that call it.

The first words of the procedure descriptor have the following layout:

  1. Address of a native-code routine to call. For a bytecode procedure, this is the bytecode interpreter interp, and for a native procedure, it is the procedure body itself.
  2. For a bytecode procedure, the address of the bytecode.
  3. Size of the local variable area in the stack frame.
  4. A pointer map of the stack frame for use by the garbage collector.
  5. If non-null, the address of a table of |(pc, map)| pairs giving a map of the evaluation stack at each procedure call within the procedure body. This table is compiled by the assembler from STKMAP directives that appear in the procedure body, and is also used by the garbage collector.

These five words are laid out by the assembler in response to the PROC directive that begins the procedure. The procedure descriptor for a native procedure is laid out by the PRIMDEF directive: more about native procedures is given in \longref{Section}{s:native}.

The memory is divided into five areas:

  • The {\em global data segment\/} contains global constants and variables. Part is initialized with string constants and other static data (including procedure descriptors) when the program is loaded, and the rest (the `bss' area) is allocated and initialized to zero when the program starts.
  • The {\em code segment\/} contains bytecode for each procedure body.
  • The {\em subroutine stack\/} contains the stack frames for active procedure calls. Each stack frame contains a frame head with saved register values from the caller.
  • The {\em evaluation stack\/} contains temporary values during the evaluation of expressions. In practice, the evaluation stack occupies locations between stack frames in the subroutine stack.
  • The {\em heap area\/} contains storage that is allocated with the NEW primitive and managed by a gargbage collector.

The machine does not provide a static link as part of the frame head; instead, the static link of a local procedure is treated as an extra argument. The Oberon compiler does not pass a static link to global procedures, and adds the static link for local procedures as an implicit argument that is pushed after the explicit arguments and becomes accessible at offset 12 in the stack frame. This distinction in calling conventions is supported by the fact that Oberon does not allow local procedures to be used as procedure values.

The garbage collector depends on having an accurate map showing where pointers may be stored in the global data segment, stack frames, the evaluation stack and the heap. It is the responsibility of the compiler to make these maps; among other things, each record that is allocated in the heap has a descriptor that shows where pointers are stored in the record.

Program structure

Each program consists of a number of files of Keiko code corresponding to modules, and each file begins with a header that identifies the module and the other modules it imports. For example, here is the header for a module Foo that imports the library module Out:

MODULE Foo 0x33d72171 0
IMPORT Out 0x16f3ac22
ENDHDR

The linker checks that the module Out is loaded before Foo; it also checks that the checksum 0x16f3ac22 for Out given here matches the checksum given in that module. The compiler outputs checksums for each module that depend on the exported interface of the module, thus ensuring that clients of a module are recompiled if its interface changes. (The integer 0 in the MODULE directive indicates that Foo was not compiled for line-count profiling.)

The assembler/linker has a mechanism (not described here) for searching a directory of library modules and selecting those that are needed to satisfy IMPORT directives in a program, including also those that are imported transitively. The header of each file is used to do this also.

After the header, each file of code can contain several procedures, interspersed with directives that create constants and reserve space for global variables. Blank lines are ignored, and lines beginning with an exclamation mark are treated as comments: to aid in understanding the code that is generated by the Oberon compiler, it reproduces each source line as a comment in its output, next to the code that was generated from it. The remaining lines must each contain one instruction or directive, consisting of one or more blank-separated `words'. The first word of each line is the instruction name, and it determines both the number of other words that should be present and how they are interpreted.

Each procedure begins with a PROC directive. This is followed by some other directives that lay out the constant pool of the procedure, by which it can refer to other procedures and global variables. Next come the instructions that make up the procedure body, and finally an END directive. For example, supposed a module named Baz contains the Oberon procedure MyCos, defined by

PROCEDURE MyCos(x: REAL): REAL;
BEGIN
  RETURN Math.Sin(x + Math.pi/2)
END MyCos;

This produces the following object code:

PROC Baz.MyCos 0 0
FLOAT 1.57079632679489655800e+00
WORD Math.Sin
! PROCEDURE MyCos(x: REAL): REAL;
!   RETURN Math.Sin(x + Math.pi/2)
LDLW 12
CONSTW 0
FPLUS
PUSH 0
CONSTW 1
CALLW 1
RETURN
END

Here the compiler has extracted the constants π/2 = 1.570796... and Math.Sin and made them into a constant pool for the procedure. These constants are used in the procedure body via the instructions CONSTW 0 and CONSTW 1.

Three kinds of symbols are recognised by the assembler:

  • Global symbols are used to name procedures, constants and global variables. The scope of a global symbol is the entire program being assembled and linked, and symbols may be defined in one file of code and referenced in another. The Oberon compiler makes sure that symbols are unique across the whole program by making each symbol begin with the name of the module that defines it. Global symbols are defined by the DEFINE, GLOBAL, PROC and PRIMDEF directive.
  • Primitive names are used to link the Oberon program with the machine-language primitives that it calls. The linker makes a table of primitives, and this table is used to resolve references to primitives when the program is loaded. Primitive names are used only in PRIM and PRIMDEF directives.
  • Local labels within a procedure have names that are small integers. The scope of a local label is just the procedure in which it appears (although, in point of fact, the compiler does not re-use labels). Labels are defined by LABEL directives within a procedure, and used in branch instructions such as JUMP, JUMPT and JEQ.

Files of code that are output by the Oberon compiler begin with a block of comment lines that describe the exported interface of the module. This part of the file is read back by the compiler when compiling other modules that import it, but is ignored by the assembler. The body of an Oberon module is translated by the compiler into a procedure with a name like Foo.%main, and the linker generates a master program that calls each of these procedures in sequence.

Example program

See another page for an example program.

Instruction set

Details of the Keiko instruction set are on another page.

Assembler directives

In addition to the instructions listed in the preceding section, input files for the assembler contain directives that define procedures, place labels, and lay out global storage.

Template:Instructions The remaining directives are used in module headers, to declare that a module imports others, or to declare that it defines a primitive. Template:Instructions

Native-code primitives

Certain procedures in the Oberon library are implemented as native-code primitives, with bodies written by hand in C. It is also possible to generate native-code procedures by translating the body of a bytecode procedure, and we consider the requirements for such a translator in Section~\ref{s:jit} below.

The C interface for native procedures uses a type value that is the union of various primitive types. It is defined in the header file obx.h:

typedef unsigned char uchar;

typedef union {
     int i;
     float f;
     value *p;
     uchar *x;
};

This type permits us to use C as a kind of portable assembler, since a variable of type value can be used to store an integer, a float, or a pointer – all in the same storage space. Values are always aligned on a 4-byte boundary.

Double-precision floating-point numbers are stored in the Keiko machine as two adjacent values. The two halves are kept in little-endian order, even if the host machine is big-endian; also, these values are not necessarily aligned on an 8-byte boundary. This means that in practice, double-precision values must be loaded and stored as two separate words. The Oberon runtime system provides two functions

PUBLIC double get_double(value *v);
PUBLIC double put_double(value *v, double x);

for loading and storing double-precision values in a machine-independent way.

As an example of a library procedure, let's define a procedure Sin, which computes the sine of its floating-point argument using the sin routine from the C library; essentially the same procedure is already provided by the standard Math module. In order to make this new primitive available in the rest of an Oberon program, it must be declared at the Oberon level; that is the purpose of the following declaration, which might appear in the module \fn{MyMath.m}:

PROCEDURE Sin(x: REAL): REAL IS "MySin";

This declaration declares Sin as a procedure, and announces that its body will be the C function MySin. The procedure Sin may be used in the Oberon code exactly as if it were an ordinary procedure written in Oberon. The procedure could also be exported from the MyMath module: then other modules that call |MyMath.Sin| would do so in exactly the same way as if it were an ordinary bytecode procedure.

The Oberon compiler generates the following single line of code for Sin:

PRIMDEF MyMath.Sin MySin 0 0

This declares to the linker that |MyMath.Sin| is a procedure whose body is the native-code primitive MySin. The two zeroes indicate that no local variable space need be allocated in the stack frame, and the stack frame contains no pointers that need tracing by the garbage collector.

In order to succeed in linking programs that use the |MyMath.Sin| procedure, we must provide a native-code body, perhaps in a file \fn{myprims.c}, like this:

#include "obx.h"
#include <math.h>

value *MySin(value *cp, value *sp) {
     value *bp = sp;
     value *R = --sp;

     (*R).f = sin(bp[3].f);
     return sp;
}

This routine receives as arguments the address of the descriptor for |MyMath.Sin| and the current stack pointer. It begins by saving the stack pointer as a local variable bp, and allocating a space named R to receive the result. Next, it accesses the first parameter (at offset 3 words from bp) as a floating-point number |bp[3].f|, and passes this number to the C library function sin. The result is stored as a floating-point number in the result slot R. Finally, the new stack pointer sp is returned as result.

When assembling a program that declares new primitives, the assembler outputs a files of C code, containing a table of primitives used in the program. That file of code is then compiled and linked with the runtime system and the code for the primitives, and is used when the program is loaded to identify the native-code procedure corresponding to each primitive. If the linker decides (for example) that Sin will be primitve number 37, then the first word of the descriptor for Sin will contain the integer 37, and the 37th entry on the primitive table will be the address of the C function MySin. When the runtime system loads the program, it fixes each procedure by looking up the first word of its descriptor in the table of primitives, and replacing it with the value found there. For Sin, this will be the address of the C function MySin, so that is the code that will run when Sin is called. (For bytecode procedures, the first word of the descriptor is 0, and primitive 0 is always the bytecode interpreter.)

The whole program can be compiled like this:

$ _obc -C -o mymath MyMath.m myprims.c -lm_

The flag -C asks for custom-mode linking, where a table of primitives actually used in the program is generated as a by-product of linking. Here are the steps that this command carries out (as you could see by saying ``obc -v ...:

/usr/mike/lib/obc/obc1 -I /usr/mike/lib/obc MyMath.m >MyMath.k
gcc -c -I /usr/mike/lib/obc -O2 myprims.c -o myprims.o
/usr/mike/lib/obc/oblink -custom -script lscript \
    -L /usr/mike/lib/obc MyMath.k -o /tmp/obc3818.out >/tmp/obc3818.c
gcc -I /usr/mike/lib/obc /tmp/obc3818.c myprims.o \
    /usr/mike/lib/obc/obx.a /usr/mike/lib/obc/oblib.a \
    -lm -ldl -rdynamic -o mymath
strip mymath
cat /tmp/obc3818.out >>mymath

The first steps are to compile the Oberon code \fn{MyMath.m} into the bytecode file \fn{MyMath.k} and the C code \fn{myprims.c} into the object file \fn{myprims.o}. Next, \fn{MyMath.m} is linked with the Oberon standard library to produce the binary bytecode \fn{/tmp/obc207.out} and the primitive table \fn{/tmp/obc207.c}. As part of the linking process, the linker has assigned a numeric index to each primitive, and the primitive table maps each index to the corresponding C routine for the primitive. The final steps are to link together the primitive table, the object code for our new primitive, and the bytecode interpreter \fn{obx.a}, to get an executable \fn{mymath}, then to add the binary bytecode to the end of the executable. Similar steps are used to build any program that contains native-code procedures.

Garbage collection

When the runtime system can be built, one of two garbage collectors can be incorporated. The B\"oehm collector is a general-purpose conservative garbage collector designed to be used with C programs. It is conservative in the sense that it uses no map showing where pointers are stored, and although it incorporates sensitive tests to distinguish between words that are pointers and words that are ordinary integers (say), nevertheless it cannot always tell the difference, and sometimes items on the heap must be retained even when they are no longer accessible from the program. For similar reasons, the B\"oehm collector cannot compact storage, since it cannot be certain to identify the storage locations that contain pointers which should be updated. Nevertheless, the B\"oehm collector has the big advantage that it doesn't depend on pointer maps.

There is also an Oberon-specific garbage collector that can be enabled as part of the runtime system itself. This collector does use pointer maps generated by the compiler, and since it relocates heap objects and uses the pointer maps to update pointers to them, it will cause the program to crash if any pointer map is even slightly inaccurate. In compensation for this, the collector is much simpler than the B\"oehm collector, has the potential to be much faster (though this has not been measured so far), and can improve virtual memory performance by compacting the heap onto fewer pages.

Pointer maps may be in one of two formats. The simpler format is a bitmap, showing what words in a block of up to 31 words contain pointers; but there is also a more complex encoding that can describe in a compact way things like arrays of identical records, and can also describe objects like variable-sized arrays of pointers passed by value. I will describe only the bitmap format here, since that is what almost all pointer maps use. A bitmap consists of a final 1 bit (to make the bitmap appear as an odd integer) following 31 bits of data. Other pointer maps are encoded as the address of a table of pointer information, and addresses are assumed to be always a multiple of 4, so the final bit serves to distinguish a bitmap from the other kind of pointer map.

A bitmap is interpreted with respect to a base address. For a record, for example, the base address is the address of the first field, immediately after the descriptor. For a stack frame, the base address is $\hbox{\sci frame\_shift} = 16$ words before the base pointer, so that the bitmap can cover 16 words of local variables as well as 12 words of parameters beyond the 3-word frame head. For an evaluation stack map, the base address is the first parameter word of the callee, at $"bp"+3$ words.

Evaluation stack maps are needed only when pointer values remain on the stack during a call to another procedure; this is very rare outside contrived test cases, but an example is the procedure Build from the compiler test case \fn{tFibTree2.m}; it is defined by% \footnote{I've suppressed the base case $n=0$ to keep the example simple.}

PROCEDURE Build(n: INTEGER): tree;
BEGIN
  RETURN Cons(Build(n-2), Build(n-1))
END Build;

Here Cons, a procedure with heading

PROCEDURE Cons(l, r: tree): tree;

takes two pointer arguments and returns another pointer. The code for Build is as follows:

PROC tFibTree2.Build 0 0
WORD tFibTree2.Build
WORD tFibTree2.Cons
! PROCEDURE Build(n: INTEGER): tree;
!   RETURN Cons(Build(n-2), Build(n-1))
LDLW 12
DEC
CONSTW 0
CALLW 1
LDLW 12
PUSH 2
MINUS
CONSTW 0
STKMAP 0x00000005
CALLW 1
CONSTW 1
CALLW 2
RETURN
END

Lines 6–9 are the call Build(n-1), and lines 10–15 are the call Build(n-2). During this second call, the pointer returned by the first call remains on the evaluation stack of the outer invocation of Build, and the garbage collector needs to know about this in case it is called into action during the execution of the call. That is the purpose of the STKMAP directive on line 14; it gives the fact that beginning with the first parameter word of the inner invocation of Build, there is a pointer value in the second word. The assembler does not assemble any instruction for the STKMAP directive, but puts together a table in which the garbage collector looks up the pc value of every stack frame.

ML representation of instructions

The Oberon compiler has an internal representation for Keiko machine instructions that is convenient for translating and optimising programs. Other compilers written in OCaml could use the same representation and share the same peephole optimiser and code output module.

Internally, the compiler represents an instruction like PLUS as |Binop (IntT, Plus)|. All the binary operators are encoded as instances of Binop, so it is easy to write code that treats all binary operators alike. Integer subtraction (the MINUS instruction) is |Binop (IntT, Minus)|, while floating-point addition (FPLUS) is |Binop (FloT, Minus)|.

In detail, the type inst of instructions is defined in module Icode as follows:

(* icode -- type of intermediate instructions *)
type icode =
    PUSH of int32		(* Push constant (value) *)
  | LOCAL of int		(* Push address (offset) *)
  | LOAD of int			(* Load (size) *)
  | STORE of int		(* Store (size) *)
  | CONST of int * int		(* Load constant (offset, size) *)
  | FIXCOPY			(* Copy multiple values *)
  | FLEXCOPY			(* Copy open array param *)
  | DUP of int			(* Duplicate n'th value on stack (n) *)
  | POP of int			(* Pop a value (size) *)
  | SWAP			(* Swap top two values on stack *)
  | CALL of int * int           (* Call proc (nparams, res size) *)
  | RETURN 			(* Return from procedure *)
  | MONOP of kind * op		(* Unary operation (type, op) *)
  | BINOP of kind * op		(* Binary operation *)
  | CONV of kind * kind		(* Type conversion *)
  | ALIGN of int		(* Align parameter (size) *)
  | BOUND of int		(* Array bound check (line) *)
  | NCHECK of int		(* Check for null pointer (line) *)
  | ZCHECK of kind * int	(* Check for zero divisor (line) *)
  | ERROR of symbol * int	(* Runtime error (kind, line) *)
  | JUMP of codelab		(* Unconditional branch (dest) *)
  | JUMPB of bool * codelab	(* Jump on boolean *)
  | JUMPC of kind * op * codelab  (* Cond. branch (type, cond, dest) *)
  | JCASE of codelab list       (* Case jump *)
  | JRANGE of codelab		(* Range jump *)
  | TYPETEST of int		(* Type test (level) *)

  | PROC of symbol * int * literal  (* Procedure (label, frame, gcmap) *)
  | PRIMDEF of symbol * string * int * literal
				(* Primitive (label, prim, frame, gcmap) *)
  | END 			(* End of proc: shouldn't reach here *)
  | LABEL of codelab		(* Set code label *)
  | DEFINE of symbol		(* Label for record descriptor *)
  | STRING of string		(* String in data space (text) *)
  | GLOBAL of symbol * int	(* Global variable *)
  | WORD of literal		(* Data word (value) *)
  | FLOAT of float		(* Float constant (value) *)
  | DOUBLE of float		(* Double constant (value) *)
  | MODULE of ident * int * int	(* Module header (name, stamp, line count) *)
  | IMPORT of ident * int	(* Module import (name, stamp) *)
  | PRIM of string		(* Declare primitive *)
  | ENDHDR			(* End of definitions *)
  | STKMAP of literal		(* Stack map *)
  | LINE of int			(* Line number *)
  | COMMENT of string		(* Comment *)
  | BLANK			(* Blank line *)

  | INDEX of int		(* PUSH s/BINOP Times/BINOP PlusA *)
  | LDL of int * int		(* LOCAL n/LOAD s *)
  | STL of int * int		(* LOCAL n/STORE s *)
  | LDG of int * int		(* GETK n/LOAD s *)
  | STG of int * int		(* GETK n/STORE s *)
  | LDI of int			(* INDEX s/LOAD s *)
  | STI of int			(* INDEX s/STORE s *)
  | LDNW of int			(* PUSH n/LDI 4 *)
  | STNW of int			(* PUSH n/STI 4 *)
  | LDEW of int			(* LDLW 12/LDNW n *)
  | STEW of int			(* LDLW 12/STNW n *)
  | JUMPCZ of kind * op * codelab  (* PUSH 0/JUMPC *)
  | TESTGEQ of codelab		(* Case split = DUP 1/JUMPC Lt *)

Three auxiliary types are used for the operands of instructions. The type literal is also defined in Icode:

(* literal -- numeric or symbolic values *)
type literal =
    INT of int32
  | HEX of int32
  | SYM of symbol

Literals are used principally as operands for the WORD directive: they can be either integers in decimal or hex, or symbolic names. Literals are also used for pointer maps, which are either bitmaps (usually shown in hex) or symbolic addresses of pointer tables.

Personal tools

Variants
Actions
Navigation
Tools