[Template fetch failed for http://spivey.oriel.ox.ac.uk/corner/Template:Sitenotice?action=render: HTTP 404]

Keiko assembly language: Difference between revisions

From Compilers
Jump to navigation Jump to search
 
(10 intermediate revisions by the same user not shown)
Line 123: Line 123:


;@LOADW@: Expect an address on the stack; pop it, and push the 4-byte contents of the address.
;@LOADW@: Expect an address on the stack; pop it, and push the 4-byte contents of the address.
;@LOADS@, @LOADC@, @LOADF@: Like @LOADW@, except that the value loaded is a 2-byte signed integer (@LOADS@), a single unsigned
;@LOADS@, @LOADC@, @LOADF@: Like @LOADW@, except that the value loaded is a 2-byte signed integer (@LOADS@), a single unsigned byte (@LOADC@), or a single-precision float (@LOADF@).
byte (@LOADC@), or a single-precision float (@LOADF@).
;@LOADD@, @LOADQ@: Like @LOADW@, except that the value loaded is a double-precision float (@LOADD@) of an 8-byte integer (@LOADQ@).  Each of these types takes up two slots on the evaluation stack.  The two halves of each value are loaded separately, so that the values in memory need only 4-byte alignment.
;@LOADD@, @LOADQ@: Like @LOADW@, except that the value loaded is a double-precision float (@LOADD@) of an 8-byte integer (@LOADQ@).  Each of these types takes up two slots on the evaluation stack.  The two halves of each value are loaded separately, so that the values in memory need only 4-byte alignment.


Line 137: Line 136:
;@LDGW x@, @LDGS x@, @LDGC x@, @LDGF x@, @LDGD x@, @LDGQ x@: Load global, equivalent to @GLOBAL x; LOADW@, etc.
;@LDGW x@, @LDGS x@, @LDGC x@, @LDGF x@, @LDGD x@, @LDGQ x@: Load global, equivalent to @GLOBAL x; LOADW@, etc.
;@STGW x@, @STGS x@, @STGC x@, @STGF x@, @STGD x@, @STGQ x@: Store global, equivalent to @GLOBAL x; STOREW@, etc.
;@STGW x@, @STGS x@, @STGC x@, @STGF x@, @STGD x@, @STGQ x@: Store global, equivalent to @GLOBAL x; STOREW@, etc.
;@LDNW n@: Load indexed, equivalent to @CONST n; OFFSET; LOADW@, with signed offset @n@ fitting in 2 bytes.
;@LDNW n@, @LDNS n@, @LDNC n@, @LDNF n@, @NDND n@, @LDNQ n@: Load indexed, equivalent to @CONST n; OFFSET; LOADW@, etc., with signed offset @n@ fitting in 2 bytes.
;@STNW n@: Store indexed, equivalent to @CONST n; OFFSET; STOREW@, with signed offset @n@ fitting in 2 bytes.
;@STNW n@, @STNS n@, @STNC n@, @STNF n@, @STND n@, @STNQ n@: Store indexed, equivalent to @CONST n; OFFSET; STOREW@, etc.,, with signed offset @n@ fitting in 2 bytes.
LDNS
;@LDIW@, @LDIS@, @LDIC@, @LDIF@, @LDID@, @LDIQ@: Double-indexed load, equivalent to @CONST 4; TIMES; OFFSET; LOADW@, etc., with a scale factor equal to the size of the value loaded.
LDNC
;@STIW@, @STIS@, @STIC@, @STIF@, @STID@, @STIQ@: Double-indexed store, equivalent to @CONST 4; TIMES; OFFSET; STOREW@, etc., with a scale factor equal to the size of the value being stored.
LDNF
LDND
LDNQ
 
;@LDIW@: Double-indexed load, equivalent to @CONST 4; TIMES; OFFSET; LOADW@.
LDIS
LDIC
LDIF
 
LDID
LDIQ
 
;@STIW@: Double-indexed store, equivalent to @CONST 4; TIMES; OFFSET; STOREW@.
STIS
STIC
STIF
 
STID
STIQ


==Integer arithmetic==
==Integer arithmetic==
 
;@PLUS@, @MINUS@, @TIMES@: Pop two integers from the stack, combine them with an arithmetic operation, and push the result.
inst PLUS 0 B.i    { $1.i + $2.i }            { ibinop(ADD); }
;@UMINUS@: Unary minus; pop an integer and push the integer with the same magnitude and opposite sign.
inst MINUS 0 B.i    { $1.i - $2.i }            { ibinop(SUB); }
;@DIV@, @MOD@: Integer division and modulo, defined with truncation towards minus infinity.
inst TIMES 0 B.i    { $1.i * $2.i }            { multiply(); }
;@INC@, @DEC@: Integer increment and decrement, equivalent to @CONST 1; PLUS@ or @CONST 1; MINUS@.
inst UMINUS 0 M.i  { - $1.i }                  { imonop(NEG); }
;@AND@, @OR@, @NOT@: Boolean operations; the integer arguments are interpreted as false if zero, true if non-zero, and the result is either 0 or 1.
 
;@BITAND@, @BITOR@, @BITXOR@, @BITNOT@: Bitwise logical operations.
inst AND 0 B.i      { $1.i && $2.i }            { ibinop(AND); }
;@LSL@, @LSR@, @ASR@, @ROR@: Shifts and rotations, with left shift (@LSL@), both logical (@LSR@) and arithmetic (@ASR@) right shifts, and right rotation (@ROR@).  All expect an operand and a shift amount on the stack, and operate on 32-bit words.
inst OR 0 B.i      { $1.i || $2.i }            { ibinop(OR); }
;@EQ@, @NEQ@, @LT@, @GT@, @LEQ@, @GEQ@: Integer comparisons, popping two integer arguments and pushing a Boolean result, either 0 or 1.
inst NOT 0 M.i      { ! $1.i }                  { push_con(1); ibinop(XOR); }
inst INC 0 M.i      { $1.i + 1 }                {= PUSH 1, PLUS }
inst DEC 0 M.i      { $1.i - 1 }                {= PUSH 1, MINUS }
inst BITAND 0 B.i  { $1.i & $2.i }            { ibinop(AND); }
inst BITOR 0 B.i    { $1.i | $2.i }            { ibinop(OR); }
inst BITXOR 0 B.i  { $1.i ^ $2.i }            { ibinop(XOR); }
inst BITNOT 0 M.i  { ~ $1.i }                  { imonop(NOT); }
 
inst LSL 0 B.i      { $1.i << $2.i }            { lsl(); }
inst LSR 0 B.i      { rshu($1.i, $2.i) }        { ibinop(RSHu); }
inst ASR 0 B.i      { $1.i >> $2.i }            { ibinop(RSH); }
inst ROR 0 B.i      { ror($1.i, $2.i) }        { ibinop(ROR); }
 
inst DIV 0 T2      { int_div(sp); }            { callout(int_div,2,INT,1); }
inst MOD 0 T2      { int_mod(sp); }            { callout(int_mod,2,INT,1); }
 
inst EQ 0 B.i      { $1.i == $2.i }            { compare(EQ); }
inst LT 0 B.i      { $1.i < $2.i }            { compare(LT); }
inst GT 0 B.i      { $1.i > $2.i }            { compare(GT); }
inst LEQ 0 B.i      { $1.i <= $2.i }            { compare(LE); }
inst GEQ 0 B.i      { $1.i >= $2.i }            { compare(GE); }
inst NEQ 0 B.i      { $1.i != $2.i }            { compare(NE); }


==Miscellaneous operations==
==Miscellaneous operations==
  INCL DECL DUP SWAP POP
;@INCL n@: Increment a local variable, equivalent to @LDLW n; CONST 1; PLUS; STLW n@. The offset @n@ must fit in two bytes.
;@DECL n@: The same as @INCL@, but decrementing instead of incrementing.
;@DUP k@: Push on the top of the stack a copy of the (single-word) value that is @k@ items from the top, where @k@ is 0, 1, or 2.
;@SWAP@: Swap the top two (single-word) values on the stack.
;@POP n@: Pop @n@ items from the stack, where @n@ < 256.


==Conditional and unconditional branches==
==Conditional and unconditional branches==
;@JEQ lab@, @JNEQ lab@, @JLT lab@, @JGT lab@, @JLEQ lab@, @JGEQ lab@: Expect two integers on the stack; pop and compare them, and branch to @lab@ if the relevant condition is satisfied.
;@JEQZ lab@, @JNEQZ lab@, @JLTZ lab@, @JGTZ lab@, @JLEQZ lab@, @JGEQZ lab@: Expect an integer on the stack; pop it and compare it with zero, and branch to @lab@ if the relevant condition is satisfied.
;@JUMP lab@: Jump to @lab@.


inst JEQ {S R} S2  { condj($1.i==$2.i, $a); } { condj(&j_eq, arg1); }
The next few instructions are intended for implementing @case@ statementsThe @TESTGEQ@ instruction can be used to build a binary tree of comparisons involving a value @k@ that remains on the stackAt the leaves of the tree, @JCASE@ and @JRANGE@ instructions permit the relevant case to be identified quickly.
inst JLT {S R} S2  { condj($1.i<$2.i, $a); }  { condj(&j_lt, arg1); }
;@JCASE n; CASEL lab1; ...; CASEL labn@: A @JCASE n@ instruction must be followed by a table of @n@ labels written as operands of @CASEL@. The instruction expects an integer @k@ on the stack; it pops the integer, and if 0 <= @k@ < @n@ branches to the corresponding case label; otherwise execution continues with the next instruction.
inst JGT {S R} S2  { condj($1.i>$2.i, $a); }  { condj(&j_gt, arg1); }
;@JRANGE lab@: Expect three integers @k@, @lo@ and @hi@ on the stack; pop them and branch to @lab@ if @lo@ <= k <= @hi@.
inst JLEQ {S R} S2  { condj($1.i<=$2.i, $a); } { condj(&j_le, arg1); }
;@TESTGEQ lab@: Expect two integers @k@ and @x@ on the stack; pop @x@ but leave @k@ on the stack, branching to @lab@ of @k@ >= @x@.
inst JGEQ {S R} S2  { condj($1.i>=$2.i, $a); }  { condj(&j_ge, arg1); }
inst JNEQ {S R} S2  { condj($1.i!=$2.i, $a); }  { condj(&j_ne, arg1); }
 
inst JNEQZ {S R} S1 { condj($1.i != 0, $a); }  {= PUSH 0, JNEQ $a }
inst JEQZ {S R} S1  { condj($1.i == 0, $a); }  {= PUSH 0, JEQ $a }
inst JLTZ S S1      { condj($1.i < 0, $a); }    {= PUSH 0, JLT $a }
equiv JLTZ R        { PUSH 0, JLT $a }
inst JGTZ S S1      { condj($1.i > 0, $a); }    {= PUSH 0, JGT $a }
equiv JGTZ R        { PUSH 0, JGT $a }
inst JLEQZ S S1    { condj($1.i <= 0, $a); }  {= PUSH 0, JLEQ $a }
equiv JLEQZ R      { PUSH 0, JLEQ $a }
inst JGEQZ S S1    { condj($1.i >= 0, $a); }  {= PUSH 0, JGEQ $a }
equiv JGEQZ R      { PUSH 0, JGEQ $a }
 
inst JUMP {S R} S0  { jump($a); }              { jump(arg1); }
 
# CASE STATEMENTS
 
inst JCASE 1 S1 {
    if ((unsigned) $1.i < (unsigned) $a)
          pc0 = pc + 2*$1.i, jump(get2(pc0)); else pc += 2*$a;
}
zinst CASEL R
 
inst JRANGE {S R} S3 {
    if ($1.i >= $2.i && $1.i <= $3.i) jump($a);
}
 
# The "T2" means take two arguments, but leave one of them on the stack
inst TESTGEQ {S R} T2 {
    if ($1.i >= $2.i) jump($a);
}
 
 
 
 


==Long integer operations==
==Long integer operations==
 
;@QPLUS@, @QMINUS@, @QTIMES@, @QDIV@, @QMOD@: Binary operations on long integers.
inst QPLUS 0 B.q    { $1.q + $2.q }            { qbinop(ADDq); }
;@QUMINUS@: Unary minus on long integers.
inst QMINUS 0 B.q  { $1.q - $2.q }            { qbinop(SUBq); }
;@QINC@, @QDEC@: Increment and decrement long integers.
inst QTIMES 0 B.q  { $1.q * $2.q }            { qbinop(MULq); }
;@QEQ@, @QNEQ@, @QLT@, @QGT@, @QLEQ@, @QGEQ@: Comparisons on long integers.
inst QUMINUS 0 M.q  { - $1.q }                  { qmonop(NEGq); }
;@QJEQ@, @QJNEQ@, @QJLT@, @QJGT@, @QJLEQ@, @QJGEQ@: Conditional branches on long integers.
inst QDIV 0 T2q    { long_div(sp); }          { callout(long_div,2,INT,2); }
inst QMOD 0 T2q    { long_mod(sp); }          { callout(long_mod,2,INT,2); }
 
equiv QINC 0        { PUSH 1, CONVNQ, QPLUS }
equiv QDEC 0        { PUSH 1, CONVNQ, QMINUS }
 
inst QCMP 0 B.iqq  { lcmp($1.q, $2.q) }
 
equiv QEQ 0        { QCMP, PUSH 0, EQ }
equiv QLT 0        { QCMP, PUSH 0, LT }
equiv QGT 0        { QCMP, PUSH 0, GT }
equiv QLEQ 0        { QCMP, PUSH 0, LEQ }
equiv QGEQ 0        { QCMP, PUSH 0, GEQ }
equiv QNEQ 0        { QCMP, PUSH 0, NEQ }
 
equiv QJEQ R        { QCMP, JEQZ $a }
equiv QJLT R        { QCMP, JLTZ $a }
equiv QJGT R        { QCMP, JGTZ $a }
equiv QJLEQ R      { QCMP, JLEQZ $a }
equiv QJGEQ R      { QCMP, JGEQZ $a }
equiv QJNEQ R      { QCMP, JNEQZ $a }


==Floating point arithmetic==
==Floating point arithmetic==
 
;@FPLUS@, @FMINUS@, @FTIMES@, @FDIV@: Binary arithmetic operations on floats.
inst FPLUS 0 B.f    { $1.f + $2.f }            { fbinop(ADDf); }
;@FUMINUS@: Unary minus on floats.
inst FMINUS 0 B.f  { $1.f - $2.f }            { fbinop(SUBf); }
;@DPLUS@, @DMINUS@, @DTIMES@, @DDIV@: Binary arithmetic operations on doubles.
inst FTIMES 0 B.f  { $1.f * $2.f }            { fbinop(MULf); }
;@DUMINUS@: Unary minus on doubles.
inst FDIV 0 B.f    { $1.f / $2.f }            { fbinop(DIVf); }
;@FEQ@, @FNEQ@, @FLT@, @FGT@, @FLEQ@, @FGEQ@: Comparisons on floats.
inst FUMINUS 0 M.f  { - $1.f }                  { fmonop(NEGf); }
;@DEQ@, @DNEQ@, @DLT@, @DGT@, @DLEQ@, @FGEQ@: Comparisons on doubles.
 
inst DPLUS 0 B.d    { $1.d + $2.d }            { dbinop(ADDd); }
inst DMINUS 0 B.d  { $1.d - $2.d }            { dbinop(SUBd); }
inst DTIMES 0 B.d  { $1.d * $2.d }            { dbinop(MULd); }
inst DDIV 0 B.d    { $1.d / $2.d }            { dbinop(DIVd); }
inst DUMINUS 0 M.d  { - $1.d }                  { dmonop(NEGd); }
 
equiv FEQ 0        { FCMPL, PUSH 0, EQ }
equiv FNEQ 0        { FCMPL, PUSH 0, NEQ }
equiv FLT 0        { FCMPG, PUSH 0, LT }
equiv FGT 0        { FCMPL, PUSH 0, GT }
equiv FLEQ 0        { FCMPG, PUSH 0, LEQ }
equiv FGEQ 0        { FCMPL, PUSH 0, GEQ }
 
equiv DEQ 0        { DCMPL, PUSH 0, EQ }
equiv DNEQ 0        { DCMPL, PUSH 0, NEQ }
equiv DLT 0        { DCMPG, PUSH 0, LT }
equiv DGT 0        { DCMPL, PUSH 0, GT }
equiv DLEQ 0        { DCMPG, PUSH 0, LEQ }
equiv DGEQ 0        { DCMPL, PUSH 0, GEQ }


==Floating point conditional branches==
==Floating point conditional branches==
Line 325: Line 213:
==More bits and pieces==
==More bits and pieces==


# ALIGNs instructions are used on big-endian machines like the SPARC
;@ALIGNC@, @ALIGNS@: Expect a one-byte or two-byte quantity on top of the stack.  Adjust its alignement in way that is appropriate if it is to become a procedure parameter.  These operations are no-ops on little-endian architectures like x86 and ARM, but act as shifts on big-endian architectures.
# to ensure that CHAR and SHORT parameters appear at the right address.
;@FIXCOPY@: Expect two pointers @dst@ and @src@ and a count @n@ on the stack; pop them and copy @n@ bytes from @src@ to @dst@.
inst ALIGNC 0 M.i  { alignx($1.i, 8) }
;@FLEXCOPY@: Expect on the stack a pointer to the location in the stack frame where the address of a flexible array parameter is stored, and an integer giving its size in bytes; pop them, allocate space for the parameter in the stack frame of the current procedure, copy the data across, and replace the parameter address with the address of the copy.
inst ALIGNS 0 M.i  { alignx($1.i, 16) }
;@LNUM n@: Note the beginning of the code for source line @n@. This instruction has no effect normally, but can be used for line-count profiling and to support breakpoints in a debugger.  The value @n@ must fit in 2 bytes.
 
# FIXCOPY copies a fixed number of bytes; it is used for structure
# assignment and also for value parameters of (fixed) array or record type. */
inst FIXCOPY 0 S3 {
    prof_charge($3.i/4);
    memcpy(pointer($1), pointer($2), $3.i);
}
 
# FLEXCOPY expects to find on the stack the address of a flex array parameter  
# and a size in bytes; it copies the parameter to dynamic local space,  
# then overwrites the parameter with the new address.
inst FLEXCOPY 0 S0 {
    value *d = pointer(sp[1]); int size = sp[0].i;         
    int sizew = (size+3)/4; prof_charge(sizew);                       
    sp -= sizew - 2;                                                 
    if ((uchar *) sp < stack + SLIMIT) error(E_STACK, 0);             
    memcpy(sp, pointer(d[0]), size);                                 
    d[0].a = stkaddr(sp);
}
 
inst LNUM 2 S0 {
#ifdef PROFILE
    if (lflag) {
          static module m = NULL; /* Cache most recent module */
          ticks--;
          if (m == NULL || dsegaddr(cp) < m->m_addr
              || dsegaddr(cp) >= m->m_addr + m->m_length) {
              m = find_module(dsegaddr(cp));
          }
          m->m_lcount[$a-1]++;
      }
#endif
#ifdef OBXDEB
      if (intflag)
          breakpoint(cp, bp, pc0, "interrupt");
      else if (one_shot)
          breakpoint(cp, bp, pc0, "line");
#endif
}


==Procedure call==
==Procedure call==
The procedure call instructions provided by the standard Keiko machine are slightly different from the @PCALL@ instruction used in the Compilers course: in fact @PCALL n@ is equivalent to @CALL (n+1)@, because the always-present static link is treated as an extra parameter.  Other compilers targetting Keiko do not pass the static link as a parameter, but instead via a special 'secret' place.  This makes it possible to global procedures to ignore any dummy static link they may be passed, and for calls to known global procedures to avoid passing a static link at all, saving time and space.


# In the interpreter, the CALLW and CALLD instructions are implemented as
;@CALL n@, @CALLW n@, @CALLF n@, @CALLD n@, @CALLQ n@: Call a procedure with @n@ arguments and no result (@CALL@), a one-word result that may be an integer or a pointer (@CALLW@), a single-precision (@CALLF@) or double-precision (@CALLD@) floating point result, or a 64-bit integer result (@CALLQ@).  The arguments should previusly have been pushed on the evaluation stack, with double-precision float and 64-bit integer arguments counting double, and followed by the procedure addressThese arguments become part of the stack frame of the procedure, and the procedure address and the arguments are popped when the procedure returns.
# two operations, an ordinary CALL followed by a SLIDE; the return address
;@STATLINK@: Expect a pointer to a frame base on the stack; pop it and save it in a secret placeThe @STATLINK@ instruction should appear just before the code that pushes the procedure address for the call (and that code should not itself involve other procedure calls).
# of the called routine points to the SLIDE instruction, which is
;@SAVELINK@: This instruction must be the first in a procedure that expects a static link.  It moves the link from the secret place to it proper location in the stack frame of the procedure.
# responsible for copying the result.
equiv CALL 1        { JPROC, SLIDE $a }
equiv CALLW 1      { JPROC, SLIDEW $a }
equiv CALLQ 1      { JPROC, SLIDEQ $a }
equiv CALLF 1      { JPROC, SLIDEF $a }
equiv CALLD 1      { JPROC, SLIDED $a }
 
# STATLINK saves a static link in a 'secret place' just before a proedure
# call.  The secret place is chosen to be the location where the called
# procedure will store its static link, which can be computed as a negative
# offset from the stack pointerThe fixed offset means that STATLINK
# will work properly only as part of the calling sequence
#    <static link> / STATLINK / <load proc addr> / [STKMAP] / CALL
# and not (e.g.) if the static link is saved *before* the parameters
# are put on the stack, or if <load proc addr> requires a stack depth
# of more than 3.
#
#        NOW          LATER
#
#        param 2      param 2
#        param 1      param 1
#    sp: stat link    proc addr |
#                      ret addr | HEAD
#                  bp: dyn link  V
#                      stat link
#
inst STATLINK 0 S0 \
    { sp[1-HEAD+SL].a = sp[0].a; sp++; } { statlink(); }
 
# SAVELINK moves the static link from the 'secret place' to its proper
# location in the frame of the called procedure. Because the secret
# place is actually the proper location already, all that is needed is
# to protect the static link when the frame is cleared (see the call
# to memset in interp.c).
inst SAVELINK 0 S0 { } { }
 
inst RETURN 0 S0 {
    if (bp == base) {
          level--;
#ifdef PROFILE
          prof_exit(0, ticks);
#endif
          return sp;
    }


    rp = sp; sp = bp; pc = codeptr(sp[PC].a);
;@RETURN@
    bp = valptr(sp[BP]); cp = valptr(bp[CP]);
    do_find_proc;
#ifdef PROFILE
    prof_exit(dsegaddr(cp), ticks);
    ticks = 0;
#endif
}

Latest revision as of 10:20, 7 June 2023

Syntax

This section gives the syntax of Keiko assembly language programs in the form that is accepted by the bytecode assembler/linker oblink. The style of syntax description is similar to that used in the Kernighan & Ritchie book on C: a syntactic category is followed by a sequence of alternatives, each on a separate line. A subscript opt indicates that a construct is optional.

Lexical conventions

  • Each element on its own line (nl used below to denote a line boundary)
  • Blank lines and lines beginning # are ignored
  • Identifiers can be any sequence of non-blank characters, including e.g. Files.Read. It's wise to avoid indentifoers that begin with a digit or minus sign, as in some contexts these may be interpreted as numeric constants.

The operands of instructions are described as instances of the class constant. In most contexts, constants may be specified in decimal or hexadecimal (as in 0x1234abcd), or may be symbols defined elsewhere in the program.

constant:
    decimal-constant
    hexadecimal-constant
    ident

Files

A Keiko file contains a heading that gives the name of the module and lists (in IMPORT directives) other modules that it depends upon. A compiler that outputs Keiko code can generate a checksum for the public interface of a module and embed this checksum in each other module that uses it, and the assembler/linker will then check across all modules in a program that the checksums are consistent. Unused checksums can be replaced by 0. The module header also contains a count of source lines in the module that is used to allocate counters for line-count profiling; this too can be replaced by 0 of profiling is not going to be used on the program.

file:
    heading bodyopt
heading:
    module-directive importsopt endhdr-directive
module-directive:
    MODULE ident checksum linecnt nl
imports:
    import-directive
    import-directive imports
import-directive:
    IMPORT ident checksum nl
endhdr-directive:
    ENDHDR nl

The body of a module constists of multi-line procedures interspersed with other single-line directives that (among other things) allocate global storage.

body:
    phrase
    phrase body
phrase:
    directive
    procedure

Directives

Directives appear between the procedures of a program.

directive:
    DEFINE ident nl
    WORD constant nl
    LONG constant nl
    FLOAT float nl
    DOUBLE float nl
    STRING hex-string nl
    GLOVAR ident integer nl
    PRIMDEF ident ident type-string nl
  • a DEFINE directive defines a symbol at the current location in the data segment. That location is the address of any following data item created with another directive such as WORD, FLOAT or STRING.
  • the WORD, LONG, FLOAT and DOUBLE directives each contribute a numeric constant to the data segment, allowing global data tables to be initialised; the table can be accessed through a label defined by a preceding DEFINE directive.
  • a STRING directive contributes a sequence of characters, specified by a hexadecimal string, to the data segment. If a terminating null character is needed, then this should be included in the hex string. The length of the string is padded to a multiple of 4 bytes. When convenient, it is possible to build up a string in several parts by giving multiple successive STRING directives, provided the length of all but the last directive is a multiple of 4 to prevent padding.
  • a GLOVAR directive allocates space of a specified size in the bss segment, and defines a symbol with its address. The size is rounded up to a multiple of 4, so that the current location in the bss segment is always aligned.
  • a PRIMDEF directive declares a named primitive whose definition is a C subroutine. A directive such as PRIMDEF Math.sqrt sqrtf FF declares a primitive that will be named Math.sqrt in the Keiko program, and interfaces to the standard C library function sqrtf, which the type string FF describes as taking a single float argument and yielding a float result. Some implementations of Keiko are able to link dynamically to libaries containing C functions, and others require an interpreter containing the primitives to be compiled specially.

Procedures

Each procedure has a heading that gives its name and some other information. This is followed by a sequence of mingled Keiko machine instructions and pseudo-operations. The pseudo-operations typically assemble into an entry in the procedure's constant pool, together with an instruction that loads the constant onto the stack.

procedure:
    proc-directive bodyopt end-directive
proc-directive:
    PROC ident integer integer constant nl
body:
    element
    element body
end-directive:
    END nl
element:
    pseudo-operation
    instruction
  • A PROC directive begins a procedure. The three arguments are:
    • The size of the procedure's local variable space in bytes; this should be a multiple of 4.
    • The maximum number of values pushed on the evaluation stack during the procedure, counting most types as one value, but long integers and doubles as two values. This argument is not currently used by implementations of the Keiko machine, and can be repaced by zero; the only possible disadvatage in future Keiko implementations is that stack overflow not be detected promptly. At present, the stack overflow check leaves a generous margin of space for each procedure to use.
    • A garbage collector map for the stack frame. If the Kieko machine is built without the optional garbage collector, or if the stack frame of the procedure contains no pointers into the heap, then this argument can be zero. If garbage collection is enabled, then every procedure that stores pointers in its frame must have a garbage collector map, which will be either a bitmap expressed as a hexdecimal constant, or the address of a program written in a special mini-language that describes the layout of the frame. This mini-language is described elsewhere.
  • An END directive ends a procedure and can be followed by further material that appears between procedures.

Instructions

instruction:
    opcode operandsopt nl
operands:
    constant
    constant operands

Each instruction has an optional list of operands, which (depending on the instruction) can be integer constants, assembler symbols, and labels. Details of the instructions and what operands they take appear later in this document.

Pseudo-operations

These pseudo-operations should appear inside a Keiko procedure; most behave like intructions but also contribute additional information to the current procedure.

pseudo-operation:
    LABEL ident nl
    CONST constant nl
    GLOBAL ident nl
    FCONST float nl
    DCONST float nl
    QCONST constant nl
    STKMAP constant nl
    LINE integer nl
  • The LABEL pseudo-op defines its argument to as a label for the next instruction in the procedure. Labels can be arbitrary identifiers and have a scope that is the whole of the current procedure. They are used only in branch instructions, and do not have a value that can be stored in a variables.
  • The next few pseudo-ops act as instructions that push a value on the stack, but are capable of handling 32-bit or 64-bit values that are stored out of line in the constant pool for the procedure. CONST pushes an arbitrary integer or address; GLOBAL is similar, but retricted to the addresses of globals; FCONST, DCONST and QCONST push float, double and long integer constants respectively, with the double and long integer constants taking up two stack slots. These pseudo-ops are typically translated by the Keiko assembler into LDKW or LDKD instructions that reference a slot it has allocated in the constant pool. As a special case CONST pseudo-ops that contain a small constant are translated into PUSH instructions that use an inline constant, either encoded directly in the opcode byte, or following it as the next one or two bytes of the instruction stream. All this is hidden from programmers and compilers by the Keiko assembler.
  • The STKMAP pseudo-op specifes a pointer map for the evaluation stack that holds at an immediately following CALL instruction. Any pointer values on the evaluation stack that are used as arguments to the procedure call will be covered by the procedure's own stack map, so this pseudo-op is needed only in the rare case where other values near the bottom of the evaluation stack will persist over the call. These stack maps are gathered for the whole procedure and used by the assembler to compile a stack map table that – alongside the code and the constant pool – forms part of the runtime representation of the procedure. If the Keiko machine is built without a garbage collector, then naturally enough these stack maps can be omitted.
  • The LINE pseudo-op marks a source line, with an argument that is the line number. It adds the line number to a table that the assembler includes with the object program, and also generates an LNUM instruction in the code. The LNUM instructions are used both by the Keiko profiler, which can count how many times each line is executed, and by debuggers, which can replace them with BREAK insructions to implement breakpoints.

Load and store instructions

These instructions are named according to a convention where the last letter of the mnemonic identifies the size and type of data: W for a 4-byte word, but also C for a byte, S for a 2-byte halfword, F for a single-precision floating point number, D for double-precision floating point, and Q for an 8-byte integer. Single bytes are treated as unsigned, but halfwords are sign-extended on loading, in agreement with the CHAR and SHORTINT types of the Oberon language for which Keiko was originally designed. The distinction between integer and floating-point values is made primarily to help very simple JIT transalators with register allocation.

LOCAL n
Push the address of a local at a constant offset (positive or negative) from the frame pointer.
OFFSET
Expect an address and an integer offset on the stack; pop them, add them, and push the result.
INDEXS, INDEXW, INDEXD
Similar to OFFSET, except that the integer offset if multiplied by 2, 4, or 8 respectively before the addition.
LOADW
Expect an address on the stack; pop it, and push the 4-byte contents of the address.
LOADS, LOADC, LOADF
Like LOADW, except that the value loaded is a 2-byte signed integer (LOADS), a single unsigned byte (LOADC), or a single-precision float (LOADF).
LOADD, LOADQ
Like LOADW, except that the value loaded is a double-precision float (LOADD) of an 8-byte integer (LOADQ). Each of these types takes up two slots on the evaluation stack. The two halves of each value are loaded separately, so that the values in memory need only 4-byte alignment.
STOREW
Expect a 4-byte integer value and an address on the stack; pop them, and store the value at the address.
STORES, STOREC, STOREF
Like STOREW, except the value stored is a 2-byte integer (STORES), a single byte (STOREC) or a single-precision float (STOREF).
STORED, STOREQ
Like STOREW, except that the value stored is a double-precision float or an 8-byte integer, each of which occupies two slots on the evaluation stack. Again, only 4-byte alignment of the target address is required.

The addressing operators and load/store instructions listed above form a complete set, and the remaining instructions listed below are just shorthand for combinations of them. Compilers may combine the basic instructions into these shorthands, and implementations of Keiko may choose to implement some of the combinations directly, reducing the size of the binary code and speeding up the bytecode interpreter, which can achieve more in each cycle. Some of the rarer combinations may not in fact be implemented with their own bytecodes; for them, the assembler partially or completely expands the shorthands into their underlying primitive instructions. For example, the usual implementation of Keiko implements LDGF directly, in a single instruction that occupies 2 or 3 bytes, but re-expands LDGD x into the equivalent sequence GLOBAL x; LOADD, partly because the operation LOADD is already quite expensive, so the expense of forming the address in a separate instruction is proportionally less significant.

LDLW n, LDLS , LDLC n, LDLF n, LDLD n, LDLQ n
Load local, equivalent to LOCAL n followed by LOADW, LOADS, etc. Note that the signed offset n must fit into 2 bytes; otherwise the compiler generating the Keiko code must explicitly use the equivalent sequence LOCAL 0; CONST n; OFFSET; LOADW, etc.
STLW n, STLS n, STLC n, STLF n, STLD n, STLQ n
Store local, equivalent to LOCAL n; STOREW, etc. Again, the signed offset n must fit in 2 bytes.
LDGW x, LDGS x, LDGC x, LDGF x, LDGD x, LDGQ x
Load global, equivalent to GLOBAL x; LOADW, etc.
STGW x, STGS x, STGC x, STGF x, STGD x, STGQ x
Store global, equivalent to GLOBAL x; STOREW, etc.
LDNW n, LDNS n, LDNC n, LDNF n, NDND n, LDNQ n
Load indexed, equivalent to CONST n; OFFSET; LOADW, etc., with signed offset n fitting in 2 bytes.
STNW n, STNS n, STNC n, STNF n, STND n, STNQ n
Store indexed, equivalent to CONST n; OFFSET; STOREW, etc.,, with signed offset n fitting in 2 bytes.
LDIW, LDIS, LDIC, LDIF, LDID, LDIQ
Double-indexed load, equivalent to CONST 4; TIMES; OFFSET; LOADW, etc., with a scale factor equal to the size of the value loaded.
STIW, STIS, STIC, STIF, STID, STIQ
Double-indexed store, equivalent to CONST 4; TIMES; OFFSET; STOREW, etc., with a scale factor equal to the size of the value being stored.

Integer arithmetic

PLUS, MINUS, TIMES
Pop two integers from the stack, combine them with an arithmetic operation, and push the result.
UMINUS
Unary minus; pop an integer and push the integer with the same magnitude and opposite sign.
DIV, MOD
Integer division and modulo, defined with truncation towards minus infinity.
INC, DEC
Integer increment and decrement, equivalent to CONST 1; PLUS or CONST 1; MINUS.
AND, OR, NOT
Boolean operations; the integer arguments are interpreted as false if zero, true if non-zero, and the result is either 0 or 1.
BITAND, BITOR, BITXOR, BITNOT
Bitwise logical operations.
LSL, LSR, ASR, ROR
Shifts and rotations, with left shift (LSL), both logical (LSR) and arithmetic (ASR) right shifts, and right rotation (ROR). All expect an operand and a shift amount on the stack, and operate on 32-bit words.
EQ, NEQ, LT, GT, LEQ, GEQ
Integer comparisons, popping two integer arguments and pushing a Boolean result, either 0 or 1.

Miscellaneous operations

INCL n
Increment a local variable, equivalent to LDLW n; CONST 1; PLUS; STLW n. The offset n must fit in two bytes.
DECL n
The same as INCL, but decrementing instead of incrementing.
DUP k
Push on the top of the stack a copy of the (single-word) value that is k items from the top, where k is 0, 1, or 2.
SWAP
Swap the top two (single-word) values on the stack.
POP n
Pop n items from the stack, where n < 256.

Conditional and unconditional branches

JEQ lab, JNEQ lab, JLT lab, JGT lab, JLEQ lab, JGEQ lab
Expect two integers on the stack; pop and compare them, and branch to lab if the relevant condition is satisfied.
JEQZ lab, JNEQZ lab, JLTZ lab, JGTZ lab, JLEQZ lab, JGEQZ lab
Expect an integer on the stack; pop it and compare it with zero, and branch to lab if the relevant condition is satisfied.
JUMP lab
Jump to lab.

The next few instructions are intended for implementing case statements. The TESTGEQ instruction can be used to build a binary tree of comparisons involving a value k that remains on the stack. At the leaves of the tree, JCASE and JRANGE instructions permit the relevant case to be identified quickly.

JCASE n; CASEL lab1; ...; CASEL labn
A JCASE n instruction must be followed by a table of n labels written as operands of CASEL. The instruction expects an integer k on the stack; it pops the integer, and if 0 <= k < n branches to the corresponding case label; otherwise execution continues with the next instruction.
JRANGE lab
Expect three integers k, lo and hi on the stack; pop them and branch to lab if lo <= k <= hi.
TESTGEQ lab
Expect two integers k and x on the stack; pop x but leave k on the stack, branching to lab of k >= x.

Long integer operations

QPLUS, QMINUS, QTIMES, QDIV, QMOD
Binary operations on long integers.
QUMINUS
Unary minus on long integers.
QINC, QDEC
Increment and decrement long integers.
QEQ, QNEQ, QLT, QGT, QLEQ, QGEQ
Comparisons on long integers.
QJEQ, QJNEQ, QJLT, QJGT, QJLEQ, QJGEQ
Conditional branches on long integers.

Floating point arithmetic

FPLUS, FMINUS, FTIMES, FDIV
Binary arithmetic operations on floats.
FUMINUS
Unary minus on floats.
DPLUS, DMINUS, DTIMES, DDIV
Binary arithmetic operations on doubles.
DUMINUS
Unary minus on doubles.
FEQ, FNEQ, FLT, FGT, FLEQ, FGEQ
Comparisons on floats.
DEQ, DNEQ, DLT, DGT, DLEQ, FGEQ
Comparisons on doubles.

Floating point conditional branches

There are ten different conditional branches involving comparison of two single-precision floats (and another ten for double precision), because the treatment of anomalous NaN values means the jump-if-less-than-or-equal is different from jump-if-not-greater-than. The standard implementation of Keiko, like the JVM, implements these using one of two floating-point comparison instructions (with different treatment of NaN) followed by an integer conditional branch.

FJEQ lab, FJNEQ lab, FJLT lab, FJGT lab, FJLEQ lab, FJGEQ lab, FJNLT lab, FJNGT lab, FJNLEQ lab, FJNGEQ lab
Expect two single-precision floats on the stack; pop and compare them, and jump to lab if the condition is satisfied.
DJEQ lab, DJNEQ lab, DJLT lab, DJGT lab, DJLEQ lab, DJGEQ lab, DJNLT lab, DJNGT lab, DJNLEQ lab, DJNGEQ lab
Similar instructions, but with double-precision operands.

Conversions

A small but sufficient set of conversions is provided; other conversions must go via the integer type.

CONVNF
Expect an integer on the stack; pop it, convert it to a single-precision floating point approximation to the same value, and push the result.
CONVND
Convert an integer to double-precision floating point.
CONVFN
Convert a single-precision float to an integer, discarding the fractional part.
CONVDN
Convert a double-precision float to an integer, discarding the fractional part.
CONVFD
Convert from single-precision to double-precision floating point.
CONVDF
Convert from double-precision to single-precision floating point.
CONVNC
Convert an integer to an unsigned character by masking off all but the bottom 8 bits.
CONVNS
Convert an integer to a signed 2-byte integer by masking off all but the bottom 16 bits, then sign extending from 16 to 32 bits.
CONVNQ
Convert from a 32-bit to a 64-bit integer with sign extension.
CONVQN
Convert from a 64-bit to a 32-bit integer.
CONVQD
Convert a 64-bit integer to double-precision floating point.
CONVDQ
Convert a double-precision float to a 64-bit integer.

Runtime checks

BOUND line
Expect integers index and bound on the stack; pop bound, but leave index on the stack. If the relationship 0 <= index < bound is not satisfied, signal an array bound error on line.
NCHECK line
Expect a pointer on the stack; leave it there, but if it is null, signal a null pointer error on line.
GHCECK line
Expect a pointer on the stack; leave it there, but if it is non-null, signal an error involving the assignment of a local procedure to a procedure-valued variable.
ZCHECK line, FZCHECK line, DZCHECK line, QZCHECK line
Expect an integer or a single- or double-precision float or a long integer on the stack; leave it there, but i it is zero, signal a divide-by-zero error on line.
ERROR n line
signal error n on line. Predefined constants such as E_CAST allow the message to be chosen from a standard list: see the function message in source file xmain.c.

More bits and pieces

ALIGNC, ALIGNS
Expect a one-byte or two-byte quantity on top of the stack. Adjust its alignement in way that is appropriate if it is to become a procedure parameter. These operations are no-ops on little-endian architectures like x86 and ARM, but act as shifts on big-endian architectures.
FIXCOPY
Expect two pointers dst and src and a count n on the stack; pop them and copy n bytes from src to dst.
FLEXCOPY
Expect on the stack a pointer to the location in the stack frame where the address of a flexible array parameter is stored, and an integer giving its size in bytes; pop them, allocate space for the parameter in the stack frame of the current procedure, copy the data across, and replace the parameter address with the address of the copy.
LNUM n
Note the beginning of the code for source line n. This instruction has no effect normally, but can be used for line-count profiling and to support breakpoints in a debugger. The value n must fit in 2 bytes.

Procedure call

The procedure call instructions provided by the standard Keiko machine are slightly different from the PCALL instruction used in the Compilers course: in fact PCALL n is equivalent to CALL (n+1), because the always-present static link is treated as an extra parameter. Other compilers targetting Keiko do not pass the static link as a parameter, but instead via a special 'secret' place. This makes it possible to global procedures to ignore any dummy static link they may be passed, and for calls to known global procedures to avoid passing a static link at all, saving time and space.

CALL n, CALLW n, CALLF n, CALLD n, CALLQ n
Call a procedure with n arguments and no result (CALL), a one-word result that may be an integer or a pointer (CALLW), a single-precision (CALLF) or double-precision (CALLD) floating point result, or a 64-bit integer result (CALLQ). The arguments should previusly have been pushed on the evaluation stack, with double-precision float and 64-bit integer arguments counting double, and followed by the procedure address. These arguments become part of the stack frame of the procedure, and the procedure address and the arguments are popped when the procedure returns.
STATLINK
Expect a pointer to a frame base on the stack; pop it and save it in a secret place. The STATLINK instruction should appear just before the code that pushes the procedure address for the call (and that code should not itself involve other procedure calls).
SAVELINK
This instruction must be the first in a procedure that expects a static link. It moves the link from the secret place to it proper location in the stack frame of the procedure.
RETURN