Keiko assembly language

From Compilers

Syntax

This section gives the syntax of Keiko assembly language programs in the form that is accepted by the bytecode assembler/linker oblink. The style of syntax description is similar to that used in the Kernighan & Ritchie book on C: a syntactic category is followed by a sequence of alternatives, each on a separate line. A subscript opt indicates that a construct is optional; in this text it appears as a suffix, so that bodyopt means an optional body.

Lexical conventions

  • Each directive, instruction or pseudo-operation appears on a line of its own; nl is used below to denote a line boundary.
  • Blank lines and lines beginning with # are ignored.
  • Identifiers can be any sequence of non-blank characters, including, for example, Files.Read. It is wise to avoid identifiers that begin with a digit or minus sign, as in some contexts these may be interpreted as numeric constants.
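These conventions amount to a simple line-oriented tokenizer. The following Python sketch is only an illustration, not code from oblink: the names read_lines and looks_numeric are hypothetical, and it assumes that a line counts as a comment only when # is its very first character.

```python
def read_lines(text):
    """Split Keiko assembly source into lists of fields, one list per
    significant line, following the lexical conventions above."""
    result = []
    for line in text.splitlines():
        fields = line.split()          # fields are separated by blanks
        # Blank lines and comment lines are skipped. Assumption: only a '#'
        # in the very first column starts a comment.
        if not fields or line.startswith("#"):
            continue
        result.append(fields)
    return result


def looks_numeric(field):
    """Hypothetical heuristic: a field beginning with a digit or a minus
    sign may be read as a numeric constant rather than an identifier."""
    return field[0].isdigit() or field[0] == "-"
```

For example, read_lines("MODULE Foo 0 3\n\n# comment\nENDHDR\n") yields [['MODULE', 'Foo', '0', '3'], ['ENDHDR']], and looks_numeric("Files.Read") is false, which is why identifiers like that one are safe.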

Files

A Keiko file contains a heading that gives the name of the module and lists (in IMPORT directives) other modules that it depends upon. A compiler that outputs Keiko code can generate a checksum for the public interface of a module and embed this checksum in each other module that uses it; the assembler/linker will then check across all modules in a program that the checksums are consistent. Unused checksums can be replaced by 0. The module header also contains a count of source lines in the module that is used to allocate counters for line-count profiling; this too can be replaced by 0 if profiling is not going to be used on the program.

file:
    heading bodyopt
heading:
    module-directive importsopt endhdr-directive
module-directive:
    MODULE ident checksum linecnt nl
imports:
    import-directive
    import-directive imports
import-directive:
    IMPORT ident checksum nl
endhdr-directive:
    ENDHDR nl
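To make the checksum rule concrete, here is a Python sketch of how a linker might parse headings and cross-check them. The names parse_heading and check_checksums are hypothetical; the sketch assumes that each heading arrives as one list of fields per line, that checksums are written in decimal or as 0x-prefixed hexadecimal, and that a checksum of 0 on either side disables the check. oblink's actual behaviour may differ.

```python
def parse_heading(lines):
    """Parse a module heading, given as one list of fields per line, into
    (name, checksum, line count, {imported module: checksum})."""
    if len(lines[0]) != 4 or lines[0][0] != "MODULE":
        raise ValueError("heading must start with MODULE ident checksum linecnt")
    _, name, checksum, linecnt = lines[0]
    imports = {}
    for fields in lines[1:]:
        if fields == ["ENDHDR"]:
            # int(s, 0) accepts decimal or 0x-prefixed hex (an assumption).
            return name, int(checksum, 0), int(linecnt), imports
        if len(fields) != 3 or fields[0] != "IMPORT":
            raise ValueError(f"unexpected heading line: {fields}")
        imports[fields[1]] = int(fields[2], 0)
    raise ValueError("heading has no ENDHDR")


def check_checksums(headings):
    """Return a list of problems with the import checksums of a program.
    Assumption: a checksum of 0 on either side means 'not checked'."""
    exported = {name: csum for name, csum, _, _ in headings}
    problems = []
    for name, _, _, imports in headings:
        for imp, csum in imports.items():
            if imp not in exported:
                problems.append(f"{name}: imported module {imp} is missing")
            elif 0 not in (csum, exported[imp]) and csum != exported[imp]:
                problems.append(f"{name}: checksum mismatch for {imp}")
    return problems
```

The line count is parsed but not checked here; as described above, it only determines how many profiling counters to allocate.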

The body of a module consists of multi-line procedures interspersed with other single-line directives that (among other things) allocate global storage.

body:
    phrase
    phrase body
phrase:
    directive
    procedure
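The division of a body into phrases can be sketched in Python as follows. The function split_phrases is hypothetical; it takes one list of fields per line, recognises directives by the keywords listed under Directives below, and treats everything from PROC to the next END as one procedure.

```python
# Directive keywords, taken from the Directives grammar.
DIRECTIVES = {"DEFINE", "WORD", "LONG", "FLOAT", "DOUBLE",
              "STRING", "GLOVAR", "PRIMDEF"}


def split_phrases(lines):
    """Group body lines (lists of fields) into phrases: a directive is a
    single line; a procedure runs from PROC to the next END."""
    phrases, proc = [], None
    for fields in lines:
        op = fields[0]
        if proc is not None:           # inside a procedure: collect lines
            proc.append(fields)
            if op == "END":
                phrases.append(("procedure", proc))
                proc = None
        elif op == "PROC":
            proc = [fields]
        elif op in DIRECTIVES:
            phrases.append(("directive", fields))
        else:
            raise ValueError(f"unexpected {op!r} outside a procedure")
    if proc is not None:
        raise ValueError("procedure has no END")
    return phrases
```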

Directives

Directives appear between the procedures of a program.

directive:
    DEFINE ident nl
    WORD constant nl
    LONG constant nl
    FLOAT float nl
    DOUBLE float nl
    STRING hex-string nl
    GLOVAR ident integer nl
    PRIMDEF ident ident type-string nl
  • a DEFINE directive defines a symbol at the current location in the data segment. That location is the address of the data item created by the next directive, such as WORD, FLOAT or STRING.
  • the WORD, LONG, FLOAT and DOUBLE directives each contribute a numeric constant to the data segment, allowing global data tables to be initialised; the table can be accessed through a label defined by a preceding DEFINE directive.
  • a STRING directive contributes a sequence of characters, specified by a hexadecimal string, to the data segment. If a terminating null character is needed, then this should be included in the hex string. The length of the string is padded to a multiple of 4 bytes. When convenient, it is possible to build up a string in several parts by giving multiple successive STRING directives, provided the length of all but the last directive is a multiple of 4 to prevent padding.
  • a GLOVAR directive allocates space of a specified size in the bss segment, and defines a symbol with its address. The size is rounded up to a multiple of 4, so that the current location in the bss segment is always aligned.
  • a PRIMDEF directive declares a named primitive whose definition is a C subroutine. A directive such as PRIMDEF Math.sqrt sqrtf FF declares a primitive that will be named Math.sqrt in the Keiko program, and interfaces to the standard C library function sqrtf, which the type string FF describes as taking a single float argument and yielding a float result. Some implementations of Keiko are able to link dynamically to libraries containing C functions, and others require an interpreter containing the primitives to be compiled specially.
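The effect of the storage directives can be modelled with a small Python class. DataLayout is a hypothetical name, and this is not oblink's representation: the little-endian byte order for WORD and the use of zero bytes as string padding are assumptions, and only integer WORD values are handled.

```python
import struct


class DataLayout:
    """Toy model of how DEFINE, WORD, STRING and GLOVAR lay out storage."""

    def __init__(self):
        self.data = bytearray()   # initialised data segment
        self.bss_size = 0         # bss segment: only its size is tracked
        self.symbols = {}         # name -> (segment, byte offset)

    def define(self, name):
        # DEFINE: name the current location in the data segment.
        self.symbols[name] = ("data", len(self.data))

    def word(self, value):
        # WORD: one 4-byte integer (little-endian order is an assumption).
        self.data += struct.pack("<i", value)

    def string(self, hexstr):
        # STRING: bytes given in hex, padded to a multiple of 4 bytes
        # (zero padding bytes are an assumption).
        raw = bytes.fromhex(hexstr)
        self.data += raw + bytes(-len(raw) % 4)

    def glovar(self, name, size):
        # GLOVAR: reserve bss space, rounded up to a multiple of 4 so that
        # the next allocation stays aligned.
        self.symbols[name] = ("bss", self.bss_size)
        self.bss_size += (size + 3) // 4 * 4
```

For instance, STRING 48656c6c followed by STRING 6f00 lays out "Hello" and its terminating null contiguously, because the first part is exactly 4 bytes long and so receives no padding.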

Procedures

Each procedure has a heading that gives its name and some other information. This is followed by a sequence of mingled Keiko machine instructions and pseudo-operations. The pseudo-operations typically assemble into an entry in the procedure's constant pool, together with an instruction that loads the constant onto the stack.

procedure:
    proc-directive proc-bodyopt end-directive
proc-directive:
    PROC ident integer integer constant nl
proc-body:
    element
    element proc-body
end-directive:
    END nl
element:
    pseudo-operation
    instruction
  • A PROC directive begins a procedure. The three arguments are:
    • The size of the procedure's local variable space in bytes; this should be a multiple of 4.
    • The maximum number of values pushed on the evaluation stack during the procedure, counting most types as one value, but long integers and doubles as two values. This argument is not currently used by implementations of the Keiko machine, and can be replaced by zero; the only possible disadvantage, in future Keiko implementations, is that stack overflow might not be detected promptly. At present, the stack overflow check leaves a generous margin of space for each procedure to use.
    • A garbage collector map for the stack frame. If the Keiko machine is built without the optional garbage collector, or if the stack frame of the procedure contains no pointers into the heap, then this argument can be zero. If garbage collection is enabled, then every procedure that stores pointers in its frame must have a garbage collector map, which will be either a bitmap expressed as a hexadecimal constant, or the address of a program written in a special mini-language that describes the layout of the frame. This mini-language is described elsewhere.
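The second PROC argument can be computed from the stack effects of the instructions. The Python sketch below assumes each instruction is described by a hypothetical (values popped, values pushed) pair, counting a double or long as two values; it handles only straight-line code, whereas a compiler would also have to follow branches. The names max_stack_depth and proc_directive are illustrative only, and since the figure is not currently used, writing 0 is equally acceptable.

```python
def max_stack_depth(effects):
    """Return the greatest evaluation-stack depth reached by a straight-line
    sequence of (pops, pushes) pairs, where doubles and longs count as 2."""
    depth = peak = 0
    for pops, pushes in effects:
        depth -= pops
        if depth < 0:
            raise ValueError("stack underflow")
        depth += pushes
        peak = max(peak, depth)
    return peak


def proc_directive(name, frame_bytes, effects, gc_map="0"):
    """Format a PROC line: frame size, stack depth and GC map (0 = none)."""
    if frame_bytes % 4 != 0:
        raise ValueError("frame size must be a multiple of 4")
    return f"PROC {name} {frame_bytes} {max_stack_depth(effects)} {gc_map}"
```

For example, pushing two doubles (2 values each), adding them (pop 4, push 2) and storing the result (pop 2) gives the effects [(0, 2), (0, 2), (4, 2), (2, 0)] and a maximum depth of 4.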

Instructions

instruction:
    opcode operandsopt nl
operands:
    constant
    constant operands

Each instruction has an optional list of operands, which (depending on the instruction) can be integer constants, assembler symbols, or labels.
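Because an operand may be an integer, a symbol or a label, an assembler has to classify each one. Here is a Python sketch with the hypothetical name parse_instruction. It assumes that integer operands are written in decimal, and that any other operand naming a label of the current procedure is a label rather than a symbol.

```python
def parse_instruction(fields, labels):
    """Split an instruction line (list of fields) into its opcode and a list
    of classified operands; labels is the set of labels in the procedure."""
    opcode, *rest = fields
    operands = []
    for f in rest:
        try:
            operands.append(("int", int(f)))   # decimal only (assumption)
        except ValueError:
            kind = "label" if f in labels else "symbol"
            operands.append((kind, f))
    return opcode, operands
```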

Pseudo-operations

These pseudo-operations should appear inside a Keiko procedure; most behave like instructions but also contribute additional information to the current procedure.

pseudo-operation:
    LABEL ident nl
    CONST constant nl
    GLOBAL ident nl
    FCONST float nl
    DCONST float nl
    QCONST constant nl
    STKMAP constant nl
    LINE integer nl
  • The LABEL pseudo-op defines its argument as a label for the next instruction in the procedure. Labels can be arbitrary identifiers and have a scope that is the whole of the current procedure. They are used only in branch instructions, and do not have a value that can be stored in a variable.
  • The next few pseudo-ops act as instructions that push a value on the stack, but are capable of handling 32-bit or 64-bit values that are stored out of line in the constant pool for the procedure. CONST pushes an arbitrary integer or address; GLOBAL is similar, but restricted to the addresses of globals; FCONST, DCONST and QCONST push float, double and long integer constants respectively, with the double and long integer constants taking up two stack slots. These pseudo-ops are typically translated by the Keiko assembler into LDKW or LDKD instructions that reference a slot it has allocated in the constant pool. As a special case, CONST pseudo-ops that contain a small constant are translated into PUSH instructions that use an inline constant, either encoded directly in the opcode byte, or following it as the next one or two bytes of the instruction stream. All this is hidden from programmers and compilers by the Keiko assembler.
  • The STKMAP pseudo-op specifies a pointer map for the evaluation stack that holds at an immediately following CALL instruction. Any pointer values on the evaluation stack that are used as arguments to the procedure call will be covered by the procedure's own stack map, so this pseudo-op is needed only in the rare case where other values near the bottom of the evaluation stack will persist over the call. These stack maps are gathered for the whole procedure and used by the assembler to compile a stack map table that – alongside the code and the constant pool – forms part of the runtime representation of the procedure. If the Keiko machine is built without a garbage collector, then naturally enough these stack maps can be omitted.
  • The LINE pseudo-op marks a source line, with an argument that is the line number. It adds the line number to a table that the assembler includes with the object program, and also generates an LNUM instruction in the code. The LNUM instructions are used both by the Keiko profiler, which can count how many times each line is executed, and by debuggers, which can replace them with BREAK instructions to implement breakpoints.
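The lowering of the constant pseudo-ops can be modelled in Python as below. The class ConstPool is hypothetical, and several details are assumptions rather than Keiko's actual encoding: the single range INLINE standing in for "small" constants, the use of the pool index as the LDKW or LDKD operand, the choice of LDKW for one-slot FCONST values, and the sharing of one slot between repeated constants.

```python
class ConstPool:
    """Toy model of lowering the constant pseudo-ops of one procedure."""

    # Assumption: constants in this range fit in an inline PUSH operand.
    INLINE = range(-32768, 32768)

    def __init__(self):
        self.entries = []   # (kind, value) pairs, in allocation order

    def _slot(self, kind, value):
        # Reuse the slot of an identical earlier constant of the same kind.
        key = (kind, value)
        if key not in self.entries:
            self.entries.append(key)
        return self.entries.index(key)

    def const(self, value):
        # CONST: small integers go inline; large ones and symbolic
        # addresses (given here as strings) go to the pool.
        if isinstance(value, int) and value in self.INLINE:
            return ("PUSH", value)
        return ("LDKW", self._slot("word", value))

    def fconst(self, value):
        # FCONST: one stack slot, so loaded with LDKW (assumption).
        return ("LDKW", self._slot("float", value))

    def dconst(self, value):
        # DCONST: two stack slots, loaded with LDKD.
        return ("LDKD", self._slot("double", value))
```

Keeping the kind in each pool key stops a float and a double with the same numeric value from sharing a slot, since they occupy different amounts of stack.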