[Template fetch failed for http://spivey.oriel.ox.ac.uk/corner/Template:Sitenotice?action=render: HTTP 404]

Keiko assembly language

From Compilers
Jump to navigation Jump to search

Syntax

This section gives the syntax of Keiko assembly language programs in the form that is accepted by the bytecode assembler/linker oblink. The style of syntax description is similar to that used in the Kernighan & Ritchie book on C: a syntactic category is followed by a sequence of alternatives, each on a separate line. A subscript opt indicates that a construct is optional.

Lexical conventions

  • Each element on its own line (nl used below to denote a line boundary)
  • Blank lines and lines beginning # are ignored
  • Identifiers can be any sequence of non-blank characters, including e.g. Files.Read. It's wise to avoid indentifoers that begin with a digit or minus sign, as in some contexts these may be interpreted as numeric constants.

The operands of instructions are described as instances of the class constant. In most contexts, constants may be specified in decimal or hexadecimal (as in 0x1234abcd), or may be symbols defined elsewhere in the program.

constant:
    decimal-constant
    hexadecimal-constant
    ident

Files

A Keiko file contains a heading that gives the name of the module and lists (in IMPORT directives) other modules that it depends upon. A compiler that outputs Keiko code can generate a checksum for the public interface of a module and embed this checksum in each other module that uses it, and the assembler/linker will then check across all modules in a program that the checksums are consistent. Unused checksums can be replaced by 0. The module header also contains a count of source lines in the module that is used to allocate counters for line-count profiling; this too can be replaced by 0 of profiling is not going to be used on the program.

file:
    heading bodyopt
heading:
    module-directive importsopt endhdr-directive
module-directive:
    MODULE ident checksum linecnt nl
imports:
    import-directive
    import-directive imports
import-directive:
    IMPORT ident checksum nl
endhdr-directive:
    ENDHDR nl

The body of a module constists of multi-line procedures interspersed with other single-line directives that (among other things) allocate global storage.

body:
    phrase
    phrase body
phrase:
    directive
    procedure

Directives

Directives appear between the procedures of a program.

directive:
    DEFINE ident nl
    WORD constant nl
    LONG constant nl
    FLOAT float nl
    DOUBLE float nl
    STRING hex-string nl
    GLOVAR ident integer nl
    PRIMDEF ident ident type-string nl
  • a DEFINE directive defines a symbol at the current location in the data segment. That location is the address of any following data item created with another directive such as WORD, FLOAT or STRING.
  • the WORD, LONG, FLOAT and DOUBLE directives each contribute a numeric constant to the data segment, allowing global data tables to be initialised; the table can be accessed through a label defined by a preceding DEFINE directive.
  • a STRING directive contributes a sequence of characters, specified by a hexadecimal string, to the data segment. If a terminating null character is needed, then this should be included in the hex string. The length of the string is padded to a multiple of 4 bytes. When convenient, it is possible to build up a string in several parts by giving multiple successive STRING directives, provided the length of all but the last directive is a multiple of 4 to prevent padding.
  • a GLOVAR directive allocates space of a specified size in the bss segment, and defines a symbol with its address. The size is rounded up to a multiple of 4, so that the current location in the bss segment is always aligned.
  • a PRIMDEF directive declares a named primitive whose definition is a C subroutine. A directive such as PRIMDEF Math.sqrt sqrtf FF declares a primitive that will be named Math.sqrt in the Keiko program, and interfaces to the standard C library function sqrtf, which the type string FF describes as taking a single float argument and yielding a float result. Some implementations of Keiko are able to link dynamically to libaries containing C functions, and others require an interpreter containing the primitives to be compiled specially.

Procedures

Each procedure has a heading that gives its name and some other information. This is followed by a sequence of mingled Keiko machine instructions and pseudo-operations. The pseudo-operations typically assemble into an entry in the procedure's constant pool, together with an instruction that loads the constant onto the stack.

procedure:
    proc-directive bodyopt end-directive
proc-directive:
    PROC ident integer integer constant nl
body:
    element
    element body
end-directive:
    END nl
element:
    pseudo-operation
    instruction
  • A PROC directive begins a procedure. The three arguments are:
    • The size of the procedure's local variable space in bytes; this should be a multiple of 4.
    • The maximum number of values pushed on the evaluation stack during the procedure, counting most types as one value, but long integers and doubles as two values. This argument is not currently used by implementations of the Keiko machine, and can be repaced by zero; the only possible disadvatage in future Keiko implementations is that stack overflow not be detected promptly. At present, the stack overflow check leaves a generous margin of space for each procedure to use.
    • A garbage collector map for the stack frame. If the Kieko machine is built without the optional garbage collector, or if the stack frame of the procedure contains no pointers into the heap, then this argument can be zero. If garbage collection is enabled, then every procedure that stores pointers in its frame must have a garbage collector map, which will be either a bitmap expressed as a hexdecimal constant, or the address of a program written in a special mini-language that describes the layout of the frame. This mini-language is described elsewhere.
  • An END directive ends a procedure and can be followed by further material that appears between procedures.

Instructions

instruction:
    opcode operandsopt nl
operands:
    constant
    constant operands

Each instruction has an optional list of operands, which (depending on the instruction) can be integer constants, assembler symbols, and labels. Details of the instructions and what operands they take appear later in this document.

Pseudo-operations

These pseudo-operations should appear inside a Keiko procedure; most behave like intructions but also contribute additional information to the current procedure.

pseudo-operation:
    LABEL ident nl
    CONST constant nl
    GLOBAL ident nl
    FCONST float nl
    DCONST float nl
    QCONST constant nl
    STKMAP constant nl
    LINE integer nl
  • The LABEL pseudo-op defines its argument to as a label for the next instruction in the procedure. Labels can be arbitrary identifiers and have a scope that is the whole of the current procedure. They are used only in branch instructions, and do not have a value that can be stored in a variables.
  • The next few pseudo-ops act as instructions that push a value on the stack, but are capable of handling 32-bit or 64-bit values that are stored out of line in the constant pool for the procedure. CONST pushes an arbitrary integer or address; GLOBAL is similar, but retricted to the addresses of globals; FCONST, DCONST and QCONST push float, double and long integer constants respectively, with the double and long integer constants taking up two stack slots. These pseudo-ops are typically translated by the Keiko assembler into LDKW or LDKD instructions that reference a slot it has allocated in the constant pool. As a special case CONST pseudo-ops that contain a small constant are translated into PUSH instructions that use an inline constant, either encoded directly in the opcode byte, or following it as the next one or two bytes of the instruction stream. All this is hidden from programmers and compilers by the Keiko assembler.
  • The STKMAP pseudo-op specifes a pointer map for the evaluation stack that holds at an immediately following CALL instruction. Any pointer values on the evaluation stack that are used as arguments to the procedure call will be covered by the procedure's own stack map, so this pseudo-op is needed only in the rare case where other values near the bottom of the evaluation stack will persist over the call. These stack maps are gathered for the whole procedure and used by the assembler to compile a stack map table that – alongside the code and the constant pool – forms part of the runtime representation of the procedure. If the Keiko machine is built without a garbage collector, then naturally enough these stack maps can be omitted.
  • The LINE pseudo-op marks a source line, with an argument that is the line number. It adds the line number to a table that the assembler includes with the object program, and also generates an LNUM instruction in the code. The LNUM instructions are used both by the Keiko profiler, which can count how many times each line is executed, and by debuggers, which can replace them with BREAK insructions to implement breakpoints.

Load and store instructions

These instructions are named according to a convention where the last letter of the mnemonic identifies the size and type of data: W for a 4-byte word, but also C for a byte, S for a 2-byte halfword, F for a single-precision floating point number, D for double-precision floating point, and Q for an 8-byte integer. Single bytes are treated as unsigned, but halfwords are sign-extended on loading, in agreement with the CHAR and SHORTINT types of the Oberon language for which Keiko was originally designed. The distinction between integer and floating-point values is made primarily to help very simple JIT transalators with register allocation.

LOCAL n
Push the address of a local at a constant offset (positive or negative) from the frame pointer.
OFFSET
Expect an address and an integer offset on the stack; pop them, add them, and push the result.
INDEXS, INDEXW, INDEXD
Similar to OFFSET, except that the integer offset if multiplied by 2, 4, or 8 respectively before the addition.
LOADW
Expect an address on the stack; pop it, and push the 4-byte contents of the address.
LOADS, LOADC, LOADF
Like LOADW, except that the value loaded is a 2-byte signed integer (LOADS), a single unsigned

byte (LOADC), or a single-precision float (LOADF).

LOADD, LOADQ
Like LOADW, except that the value loaded is a double-precision float (LOADD) of an 8-byte integer (LOADQ). Each of these types takes up two slots on the evaluation stack. The two halves of each value are loaded separately, so that the values in memory need only 4-byte alignment.
STOREW
Expect a 4-byte integer value and an address on the stack; pop them, and store the value at the address.
STORES, STOREC, STOREF
Like STOREW, except the value stored is a 2-byte integer (STORES), a single byte (STOREC) or a single-precision float (STOREF).
STORED, STOREQ
Like STOREW, except that the value stored is a double-precision float or an 8-byte integer, each of which occupies two slots on the evaluation stack. Again, only 4-byte alignment of the target address is required.

The addressing operators and load/store instructions listed above form a complete set, and the remaining instructions listed below are just shorthand for combinations of them. Compilers may combine the basic instructions into these shorthands, and implementations of Keiko may choose to implement some of the combinations directly, reducing the size of the binary code and speeding up the bytecode interpreter, which can achieve more in each cycle. Some of the rarer combinations may not in fact be implemented with their own bytecodes; for them, the assembler partially or completely expands the shorthands into their underlying primitive instructions. For example, the usual implementation of Keiko implements LDGF directly, in a single instruction that occupies 2 or 3 bytes, but re-expands LDGD x into the equivalent sequence GLOBAL x; LOADD, partly because the operation LOADD is already quite expensive, so the expense of forming the address in a separate instruction is proportionally less significant.

LDLW n, LDLS , LDLC n, LDLF n, LDLD n, LDLQ n
Load local, equivalent to LOCAL n followed by LOADW, LOADS, etc. Note that the signed offset n must fit into 2 bytes; otherwise the compiler generating the Keiko code must explicitly use the equivalent sequence LOCAL 0; CONST n; OFFSET; LOADW, etc.
STLW n, STLS n, STLC n, STLF n, STLD n, STLQ n
Store local, equivalent to LOCAL n; STOREW, etc. Again, the signed offset n must fit in 2 bytes.
LDGW x, LDGS x, LDGC x, LDGF x, LDGD x, LDGQ x
Load global, equivalent to GLOBAL x; LOADW, etc.
STGW x, STGS x, STGC x, STGF x, STGD x, STGQ x
Store global, equivalent to GLOBAL x; STOREW, etc.
LDNW n
Load indexed, equivalent to CONST n; OFFSET; LOADW, with signed offset n fitting in 2 bytes.
STNW n
Store indexed, equivalent to CONST n; OFFSET; STOREW, with signed offset n fitting in 2 bytes.
LDNS
LDNC
LDNF
LDND
LDNQ
LDIW
Double-indexed load, equivalent to CONST 4; TIMES; OFFSET; LOADW.
LDIS
LDIC
LDIF
LDID
LDIQ
STIW
Double-indexed store, equivalent to CONST 4; TIMES; OFFSET; STOREW.
STIS
STIC
STIF
STID
STIQ

Integer arithmetic

inst PLUS 0 B.i { $1.i + $2.i } { ibinop(ADD); } inst MINUS 0 B.i { $1.i - $2.i } { ibinop(SUB); } inst TIMES 0 B.i { $1.i * $2.i } { multiply(); } inst UMINUS 0 M.i { - $1.i } { imonop(NEG); }

inst AND 0 B.i { $1.i && $2.i } { ibinop(AND); } inst OR 0 B.i { $1.i || $2.i } { ibinop(OR); } inst NOT 0 M.i { ! $1.i } { push_con(1); ibinop(XOR); } inst INC 0 M.i { $1.i + 1 } {= PUSH 1, PLUS } inst DEC 0 M.i { $1.i - 1 } {= PUSH 1, MINUS } inst BITAND 0 B.i { $1.i & $2.i } { ibinop(AND); } inst BITOR 0 B.i { $1.i | $2.i } { ibinop(OR); } inst BITXOR 0 B.i { $1.i ^ $2.i } { ibinop(XOR); } inst BITNOT 0 M.i { ~ $1.i } { imonop(NOT); }

inst LSL 0 B.i { $1.i << $2.i } { lsl(); } inst LSR 0 B.i { rshu($1.i, $2.i) } { ibinop(RSHu); } inst ASR 0 B.i { $1.i >> $2.i } { ibinop(RSH); } inst ROR 0 B.i { ror($1.i, $2.i) } { ibinop(ROR); }

inst DIV 0 T2 { int_div(sp); } { callout(int_div,2,INT,1); } inst MOD 0 T2 { int_mod(sp); } { callout(int_mod,2,INT,1); }

inst EQ 0 B.i { $1.i == $2.i } { compare(EQ); } inst LT 0 B.i { $1.i < $2.i } { compare(LT); } inst GT 0 B.i { $1.i > $2.i } { compare(GT); } inst LEQ 0 B.i { $1.i <= $2.i } { compare(LE); } inst GEQ 0 B.i { $1.i >= $2.i } { compare(GE); } inst NEQ 0 B.i { $1.i != $2.i } { compare(NE); }

Miscellaneous operations

INCL n
Increment a local variable, equivalent to LDLW n; CONST 1; PLUS; STLW n. The offset n must fit in two bytes.
DECL n
The same as INCL, but decrementing instead of incrementing.
DUP k
Push on the top of the stack a copy of the (single-word) value that is k items from the top, where k is 0, 1, or 2.
SWAP
Swap the top two (single-word) values on the stack.
POP n
Pop n items from the stack, where n < 256.

Conditional and unconditional branches

JEQ lab, JNEQ lab, JLT lab, JGT lab, JLEQ lab, JGEQ lab
Expect two integers on the stack; pop and compare them, and branch to lab if the relevant condition is satisfied.
JEQZ lab, JNEQZ lab, JLTZ lab, JGTZ lab, JLEQZ lab, JGEQZ lab
Expect an integer on the stack; pop it and compare it with zero, and branch to lab if the relevant condition is satisfied.
JUMP lab
Jump to lab.

The next few instructions are intended for implementing case statements. The TESTGEQ instruction can be used to build a binary tree of comparisons involving a value k that remains on the stack. At the leaves of the tree, JCASE and JRANGE instructions permit the relevant case to be identified quickly.

JCASE n; CASEL lab1; ...; CASEL labn
A JCASE n instruction must be followed by a table of n labels written as operands of CASEL. The instruction expects an integer k on the stack; it pops the integer, and if 0 <= k < n branches to the corresponding case label; otherwise execution continues with the next instruction.
JRANGE lab
Expect three integers k, lo and hi on the stack; pop them and branch to lab if lo <= k <= hi.
TESTGEQ lab
Expect two integers k and x on the stack; pop x but leave k on the stack, branching to lab of k >= x.

Long integer operations

inst QPLUS 0 B.q { $1.q + $2.q } { qbinop(ADDq); } inst QMINUS 0 B.q { $1.q - $2.q } { qbinop(SUBq); } inst QTIMES 0 B.q { $1.q * $2.q } { qbinop(MULq); } inst QUMINUS 0 M.q { - $1.q } { qmonop(NEGq); } inst QDIV 0 T2q { long_div(sp); } { callout(long_div,2,INT,2); } inst QMOD 0 T2q { long_mod(sp); } { callout(long_mod,2,INT,2); }

equiv QINC 0 { PUSH 1, CONVNQ, QPLUS } equiv QDEC 0 { PUSH 1, CONVNQ, QMINUS }

inst QCMP 0 B.iqq { lcmp($1.q, $2.q) }

equiv QEQ 0 { QCMP, PUSH 0, EQ } equiv QLT 0 { QCMP, PUSH 0, LT } equiv QGT 0 { QCMP, PUSH 0, GT } equiv QLEQ 0 { QCMP, PUSH 0, LEQ } equiv QGEQ 0 { QCMP, PUSH 0, GEQ } equiv QNEQ 0 { QCMP, PUSH 0, NEQ }

equiv QJEQ R { QCMP, JEQZ $a } equiv QJLT R { QCMP, JLTZ $a } equiv QJGT R { QCMP, JGTZ $a } equiv QJLEQ R { QCMP, JLEQZ $a } equiv QJGEQ R { QCMP, JGEQZ $a } equiv QJNEQ R { QCMP, JNEQZ $a }

Floating point arithmetic

inst FPLUS 0 B.f { $1.f + $2.f } { fbinop(ADDf); } inst FMINUS 0 B.f { $1.f - $2.f } { fbinop(SUBf); } inst FTIMES 0 B.f { $1.f * $2.f } { fbinop(MULf); } inst FDIV 0 B.f { $1.f / $2.f } { fbinop(DIVf); } inst FUMINUS 0 M.f { - $1.f } { fmonop(NEGf); }

inst DPLUS 0 B.d { $1.d + $2.d } { dbinop(ADDd); } inst DMINUS 0 B.d { $1.d - $2.d } { dbinop(SUBd); } inst DTIMES 0 B.d { $1.d * $2.d } { dbinop(MULd); } inst DDIV 0 B.d { $1.d / $2.d } { dbinop(DIVd); } inst DUMINUS 0 M.d { - $1.d } { dmonop(NEGd); }

equiv FEQ 0 { FCMPL, PUSH 0, EQ } equiv FNEQ 0 { FCMPL, PUSH 0, NEQ } equiv FLT 0 { FCMPG, PUSH 0, LT } equiv FGT 0 { FCMPL, PUSH 0, GT } equiv FLEQ 0 { FCMPG, PUSH 0, LEQ } equiv FGEQ 0 { FCMPL, PUSH 0, GEQ }

equiv DEQ 0 { DCMPL, PUSH 0, EQ } equiv DNEQ 0 { DCMPL, PUSH 0, NEQ } equiv DLT 0 { DCMPG, PUSH 0, LT } equiv DGT 0 { DCMPL, PUSH 0, GT } equiv DLEQ 0 { DCMPG, PUSH 0, LEQ } equiv DGEQ 0 { DCMPL, PUSH 0, GEQ }

Floating point conditional branches

There are ten different conditional branches involving comparison of two single-precision floats (and another ten for double precision), because the treatment of anomalous NaN values means the jump-if-less-than-or-equal is different from jump-if-not-greater-than. The standard implementation of Keiko, like the JVM, implements these using one of two floating-point comparison instructions (with different treatment of NaN) followed by an integer conditional branch.

FJEQ lab, FJNEQ lab, FJLT lab, FJGT lab, FJLEQ lab, FJGEQ lab, FJNLT lab, FJNGT lab, FJNLEQ lab, FJNGEQ lab
Expect two single-precision floats on the stack; pop and compare them, and jump to lab if the condition is satisfied.
DJEQ lab, DJNEQ lab, DJLT lab, DJGT lab, DJLEQ lab, DJGEQ lab, DJNLT lab, DJNGT lab, DJNLEQ lab, DJNGEQ lab
Similar instructions, but with double-precision operands.

Conversions

A small but sufficient set of conversions is provided; other conversions must go via the integer type.

CONVNF
Expect an integer on the stack; pop it, convert it to a single-precision floating point approximation to the same value, and push the result.
CONVND
Convert an integer to double-precision floating point.
CONVFN
Convert a single-precision float to an integer, discarding the fractional part.
CONVDN
Convert a double-precision float to an integer, discarding the fractional part.
CONVFD
Convert from single-precision to double-precision floating point.
CONVDF
Convert from double-precision to single-precision floating point.
CONVNC
Convert an integer to an unsigned character by masking off all but the bottom 8 bits.
CONVNS
Convert an integer to a signed 2-byte integer by masking off all but the bottom 16 bits, then sign extending from 16 to 32 bits.
CONVNQ
Convert from a 32-bit to a 64-bit integer with sign extension.
CONVQN
Convert from a 64-bit to a 32-bit integer.
CONVQD
Convert a 64-bit integer to double-precision floating point.
CONVDQ
Convert a double-precision float to a 64-bit integer.

Runtime checks

BOUND line
Expect integers index and bound on the stack; pop bound, but leave index on the stack. If the relationship 0 <= index < bound is not satisfied, signal an array bound error on line.
NCHECK line
Expect a pointer on the stack; leave it there, but if it is null, signal a null pointer error on line.
GHCECK line
Expect a pointer on the stack; leave it there, but if it is non-null, signal an error involving the assignment of a local procedure to a procedure-valued variable.
ZCHECK line, FZCHECK line, DZCHECK line, QZCHECK line
Expect an integer or a single- or double-precision float or a long integer on the stack; leave it there, but i it is zero, signal a divide-by-zero error on line.
ERROR n line
signal error n on line. Predefined constants such as E_CAST allow the message to be chosen from a standard list: see the function message in source file xmain.c.

More bits and pieces

ALIGNC, ALIGNS
Expect a one-byte or two-byte quantity on top of the stack. Adjust its alignement in way that is appropriate if it is to become a procedure parameter. These operations are no-ops on little-endian architectures like x86 and ARM, but act as shifts on big-endian architectures.
FIXCOPY
Expect two pointers dst and src and a count n on the stack; pop them and copy n bytes from src to dst.
FLEXCOPY
Expect on the stack a pointer to the location in the stack frame where the address of a flexible array parameter is stored, and an integer giving its size in bytes; pop them, allocate space for the parameter in the stack frame of the current procedure, copy the data across, and replace the parameter address with the address of the copy.
LNUM n
Note the beginning of the code for source line n. This instruction has no effect normally, but can be used for line-count profiling and to support breakpoints in a debugger. The value n must fit in 2 bytes.

Procedure call

The procedure call instructions provided by the standard Keiko machine are slightly different from the PCALL instruction used in the Compilers course: in fact PCALL n is equivalent to CALL (n+1), because the always-present static link is treated as an extra parameter. Other compilers targetting Keiko do not pass the static link as a parameter, but instead via a special 'secret' place. This makes it possible to global procedures to ignore any dummy static link they may be passed, and for calls to known global procedures to avoid passing a static link at all, saving time and space.

CALL n, CALLW n, CALLF n, CALLD n, CALLQ n
Call a procedure with n arguments and no result (CALL), a one-word result that may be an integer or a pointer (CALLW), a single-precision (CALLF) or double-precision (CALLD) floating point result, or a 64-bit integer result (CALLQ). The arguments should previusly have been pushed on the evaluation stack, with double-precision float and 64-bit integer arguments counting double, and followed by the procedure address. These arguments become part of the stack frame of the procedure, and the procedure address and the arguments are popped when the procedure returns.
STATLINK
Expect a pointer to a frame base on the stack; pop it and save it in a secret place. The STATLINK instruction should appear just before the code that pushes the procedure address for the call (and that code should not itself involve other procedure calls).
SAVELINK
This instruction must be the first in a procedure that expects a static link. It moves the link from the secret place to it proper location in the stack frame of the procedure.
RETURN