Keiko assembly language

Syntax

This section gives the syntax of Keiko assembly language programs in the form that is accepted by the bytecode assembler/linker oblink. The style of syntax description is similar to that used in the Kernighan & Ritchie book on C: a syntactic category is followed by a sequence of alternatives, each on a separate line. A subscript opt indicates that a construct is optional.

Lexical conventions

  • Each element appears on its own line (nl is used below to denote a line boundary)
  • Blank lines and lines beginning # are ignored
  • Identifiers can be any sequence of non-blank characters, including e.g. Files.Read. It's wise to avoid identifiers that begin with a digit or a minus sign, because in some contexts these may be interpreted as numeric constants.
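Because every element sits on its own line and identifiers contain no blanks, a Keiko file can be read one line at a time and split on whitespace. The Python sketch below is a hypothetical helper, not part of oblink. It assumes that fields are separated by whitespace and that a line counts as a comment when its first non-blank character is #; the looks_numeric check is an assumed approximation of why identifiers starting with a digit or a minus sign are risky.

```python
def keiko_lines(text):
    """Yield the fields of each significant line of a Keiko source file."""
    for line in text.splitlines():
        stripped = line.strip()
        # Assumption: a comment line is one whose first non-blank char is '#'.
        if not stripped or stripped.startswith("#"):
            continue
        # Identifiers may contain any non-blank characters (e.g. Files.Read),
        # so plain whitespace splitting is enough to separate the fields.
        yield stripped.split()


def looks_numeric(token):
    """Rough guess at whether a token would be read as a number.

    Assumption: a leading digit, or '-' followed by a digit, marks a number.
    """
    return token[:1].isdigit() or (token[:1] == "-" and token[1:2].isdigit())


source = """# hypothetical module heading
MODULE Main 0 0

  IMPORT Files 0
ENDHDR
"""
for fields in keiko_lines(source):
    print(fields)
# ['MODULE', 'Main', '0', '0']
# ['IMPORT', 'Files', '0']
# ['ENDHDR']
```

The comment line and the blank line are dropped, and a dotted name such as Files.Read survives as a single field, whereas a token like -7 would be taken for a number.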

Files

A Keiko file begins with a heading that gives the name of the module and lists, in IMPORT directives, the other modules that it depends upon. A compiler that outputs Keiko code can generate a checksum for the public interface of each module and embed this checksum in every other module that uses it; the assembler/linker then checks across all modules in a program that the checksums are consistent. Unused checksums can be replaced by 0. The module heading also contains a count of source lines in the module, which is used to allocate counters for line-count profiling; this too can be replaced by 0 if profiling is not going to be used on the program.

file:
    heading body_opt
heading:
    module-directive imports_opt endhdr-directive
module-directive:
    MODULE ident checksum linecnt nl
imports:
    import-directive
    import-directive imports
import-directive:
    IMPORT ident checksum nl
endhdr-directive:
    ENDHDR nl
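To illustrate the heading grammar and the checksum rule together, here is a minimal Python sketch that parses a heading from token lines and then compares each IMPORT checksum with the checksum the imported module declares. The functions parse_heading and check_checksums are hypothetical. The sketch assumes that checksums are compared as strings, that a checksum of 0 on either side disables the check, and that imports of modules missing from the list are skipped; oblink's actual behaviour may differ.

```python
def parse_heading(lines):
    """Parse MODULE ident checksum linecnt, IMPORT* and ENDHDR.

    `lines` is a list of token lists. Returns (name, checksum, linecnt,
    imports, rest), where `imports` maps each imported module to the
    checksum recorded for it and `rest` is the module body.
    """
    if not lines or lines[0][0] != "MODULE" or len(lines[0]) != 4:
        raise ValueError("expected MODULE ident checksum linecnt")
    _, name, checksum, linecnt = lines[0]
    imports = {}
    i = 1
    while i < len(lines) and lines[i][0] == "IMPORT":
        if len(lines[i]) != 3:
            raise ValueError("expected IMPORT ident checksum")
        imports[lines[i][1]] = lines[i][2]
        i += 1
    if i == len(lines) or lines[i] != ["ENDHDR"]:
        raise ValueError("expected ENDHDR")
    return name, checksum, int(linecnt), imports, lines[i + 1:]


def check_checksums(headings):
    """Report inconsistent checksums across (name, checksum, imports) triples.

    Assumptions: checksums are compared as strings, a checksum of "0" on
    either side means 'not checked', and unknown modules are skipped.
    """
    declared = {name: checksum for name, checksum, _ in headings}
    errors = []
    for name, _, imports in headings:
        for dep, checksum in imports.items():
            expected = declared.get(dep)
            if expected is None or "0" in (checksum, expected):
                continue
            if checksum != expected:
                errors.append(f"{name}: checksum mismatch for {dep}")
    return errors
```

For instance, a hypothetical module Main with heading lines MODULE Main 0 12, IMPORT Lib 0x5a and ENDHDR parses to the name Main, a line count of 12 and the import map {"Lib": "0x5a"}; if Lib then declares 0x5b, check_checksums reports a mismatch for Lib.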

The body of a module consists of multi-line procedures interspersed with single-line directives that (among other things) allocate global storage.

body:
    phrase
    phrase body
phrase:
    directive
    procedure
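Since a procedure runs from its PROC line to the matching END and every other phrase occupies a single line, a module body can be divided into phrases in one pass. This Python sketch uses a hypothetical split_phrases function; it assumes the heading has already been removed and that a line consisting only of END always terminates the current procedure.

```python
def split_phrases(lines):
    """Group body lines into ("directive", line) and ("procedure", lines).

    Assumption: the heading has already been removed, and a line consisting
    of END only ever terminates the current procedure.
    """
    phrases = []
    i = 0
    while i < len(lines):
        if lines[i][0] == "PROC":
            j = i + 1
            while j < len(lines) and lines[j] != ["END"]:
                j += 1
            if j == len(lines):
                raise ValueError("missing END for " + " ".join(lines[i]))
            phrases.append(("procedure", lines[i:j + 1]))  # PROC ... END inclusive
            i = j + 1
        else:
            phrases.append(("directive", lines[i]))  # single-line directive
            i += 1
    return phrases
```

Keeping each procedure as one group, including its PROC and END lines, matches the grammar above: a procedure is a single phrase, however many lines it spans.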

Directives

Directives appear between the procedures of a program.

directive:
    DEFINE ident nl
    WORD constant nl
    LONG constant nl
    FLOAT float nl
    DOUBLE float nl
    STRING hex-string nl
    GLOVAR ident integer nl
    PRIMDEF ident ident type-string nl
  • a DEFINE directive defines a symbol at the current location in the data segment. That location is the address of any following data item created with another directive such as WORD, FLOAT or STRING.
  • the WORD, LONG, FLOAT and DOUBLE directives each contribute a numeric constant to the data segment, allowing global data tables to be initialised; the table can be accessed through a label defined by a preceding DEFINE directive.
  • a STRING directive contributes a sequence of characters, specified by a hexadecimal string, to the data segment. If a terminating null character is needed, then it should be included in the hex string. The length of the string is padded to a multiple of 4 bytes. When convenient, a string can be built up in several parts by giving multiple successive STRING directives, provided that every part except the last has a length that is a multiple of 4, so that no padding is inserted between the parts.
  • a GLOVAR directive allocates space of a specified size in the bss segment, and defines a symbol with its address. The size is rounded up to a multiple of 4, so that the current location in the bss segment is always aligned.
  • a PRIMDEF directive declares a named primitive whose definition is a C subroutine. A directive such as PRIMDEF Math.sqrt sqrtf FF declares a primitive that will be named Math.sqrt in the Keiko program, and interfaces to the standard C library function sqrtf, which the type string FF describes as taking a single float argument and yielding a float result. Some implementations of Keiko are able to link dynamically to libraries containing C functions, and others require an interpreter containing the primitives to be compiled specially.
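The padding rules for STRING and GLOVAR both come down to rounding a size up to a multiple of 4. In the Python sketch below, round4, string_bytes and layout_bss are hypothetical names; the choice of zero bytes for the STRING padding and starting the bss offsets at 0 are assumptions made for illustration.

```python
def round4(n):
    """Round n up to the next multiple of 4."""
    return (n + 3) // 4 * 4


def string_bytes(hex_string):
    """Bytes contributed by one STRING directive.

    The hex digits are decoded and then padded to a multiple of 4 bytes.
    Assumption: the padding bytes are zero.
    """
    data = bytes.fromhex(hex_string)
    return data + bytes(round4(len(data)) - len(data))


def layout_bss(glovars):
    """Assign offsets to (name, size) pairs taken from GLOVAR directives.

    Each size is rounded up to a multiple of 4, so every offset stays
    aligned. Assumption: offsets are relative to the start of bss.
    """
    offsets, here = {}, 0
    for name, size in glovars:
        offsets[name] = here
        here += round4(size)
    return offsets, here
```

For example, the six bytes 48656C6C6F00 ("Hello" plus a null) occupy eight bytes. Splitting them into STRING 48656C6C followed by STRING 6F00 gives the same bytes, because the first part is already a multiple of 4 and so receives no padding. Likewise, GLOVAR sizes of 1, 8 and 3 place the three variables at offsets 0, 4 and 12.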

Procedures

Each procedure has a heading that gives its name and some other information. This is followed by a sequence of mingled Keiko machine instructions and pseudo-operations. The pseudo-operations typically assemble into an entry in the procedure's constant pool, together with an instruction that loads the constant onto the stack.
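To make the idea of a per-procedure constant pool concrete, here is a small Python model of how such a pseudo-operation might be expanded. Everything in it is hypothetical. The instruction name LOADK stands in for whatever load instruction the assembler really emits, and letting identical constants share one pool entry is a design choice of this sketch, not documented oblink behaviour.

```python
class ConstPool:
    """Hypothetical per-procedure constant pool."""

    def __init__(self):
        self.entries = []  # pool contents in order of first use
        self.index = {}    # (kind, value) -> position in entries

    def add(self, kind, value):
        """Return the pool index for a constant, adding it if it is new."""
        key = (kind, value)
        if key not in self.index:
            self.index[key] = len(self.entries)
            self.entries.append(key)
        return self.index[key]


POOL_OPS = {"CONST", "GLOBAL", "FCONST", "DCONST", "QCONST"}


def expand_pseudo_op(pool, op, arg):
    """Turn a constant-loading pseudo-operation into (instruction, index).

    'LOADK' is a hypothetical instruction name used only in this sketch.
    """
    if op not in POOL_OPS:
        raise ValueError(f"{op} does not use the constant pool in this sketch")
    return ("LOADK", pool.add(op, arg))
```

With this model, a second CONST 100000 in the same procedure reuses entry 0 instead of growing the pool.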

procedure:
    proc-directive body_opt end-directive
proc-directive:
    PROC ident 0 0 0 nl
body:
    element
    element body
end-directive:
    END nl
element:
    pseudo-operation
    instruction
  • A PROC directive begins a procedure. The three arguments are:
    • The size of the procedure's local variable space in bytes; this should be a multiple of 4.
    • The maximum number of values pushed on the evaluation stack during the procedure, counting most types as one value but long integers and doubles as two. This argument is not currently used by implementations of the Keiko machine and can be replaced by zero; the only possible disadvantage, in future Keiko implementations, is that stack overflow might not be detected promptly. At present, the stack overflow check leaves a generous margin of space for each procedure to use.
    • A garbage collector map for the stack frame. If the Keiko machine is built without the optional garbage collector, or if the stack frame of the procedure contains no pointers into the heap, then this argument can be zero. If garbage collection is enabled, then every procedure that stores pointers in its frame must have a garbage collector map, which is either a bitmap expressed as a hexadecimal constant or the address of a program, written in a special mini-language, that describes the layout of the frame. This mini-language is described elsewhere.
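Following the description of the three PROC arguments, the first two can be computed mechanically. The Python sketch below assumes straight-line code and describes each instruction by hypothetical type letters for the values it pops and pushes (i integer, p pointer, f float, l long, d double), counting l and d as two values as described above. Code with jumps would need the depth tracked per label. The names max_stack_depth and proc_line are illustrative only, and the garbage collector map is simply passed through.

```python
# Hypothetical type letters; long and double occupy two stack values.
WIDTH = {"i": 1, "p": 1, "f": 1, "l": 2, "d": 2}


def max_stack_depth(effects):
    """Peak evaluation-stack depth for straight-line code.

    `effects` holds one (popped, pushed) pair of type strings per
    instruction, e.g. ("ii", "i") for an integer addition.
    """
    depth = peak = 0
    for popped, pushed in effects:
        depth -= sum(WIDTH[t] for t in popped)
        if depth < 0:
            raise ValueError("stack underflow")
        depth += sum(WIDTH[t] for t in pushed)
        peak = max(peak, depth)
    return peak


def proc_line(name, locals_bytes, effects, gc_map="0"):
    """Build a PROC line: frame size, stack depth and GC map."""
    frame = (locals_bytes + 3) // 4 * 4  # frame size must be a multiple of 4
    return f"PROC {name} {frame} {max_stack_depth(effects)} {gc_map}"
```

For example, pushing two doubles and adding them reaches a depth of 4 values, while the same pattern with integers reaches only 2. Over-estimating this figure is harmless, since the argument is currently ignored.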

Pseudo-operations

These pseudo-operations should appear inside a Keiko procedure; most behave like instructions but also contribute additional information to the current procedure.

pseudo-operation:
    LABEL ident
    CONST constant
    GLOBAL ident
    FCONST float
    DCONST float
    QCONST constant
    STKMAP constant
    LINE integer