Keiko assembly language
Syntax
This section gives the syntax of Keiko assembly language programs in the form that is accepted by the bytecode assembler/linker oblink. The style of syntax description is similar to that used in the Kernighan & Ritchie book on C: a syntactic category is followed by a sequence of alternatives, each on a separate line. A subscript opt indicates that a construct is optional.
Lexical conventions
- Each element appears on its own line (nl is used below to denote a line boundary).
- Blank lines and lines beginning with # are ignored.
- Identifiers can be any sequence of non-blank characters, including e.g. Files.Read. It's wise to avoid identifiers that begin with a digit or minus sign, as in some contexts these may be interpreted as numeric constants.
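The lexical conventions above can be sketched in a few lines of Python. This is only an illustration, not the code used by oblink: the names keiko_lines and looks_numeric are hypothetical, and stripping leading whitespace before testing for # is an assumption.

```python
def keiko_lines(text):
    """Yield each significant line as a list of whitespace-separated tokens."""
    for raw in text.splitlines():
        line = raw.strip()  # assumption: leading whitespace is not significant
        # Blank lines and lines beginning with '#' are ignored.
        if not line or line.startswith("#"):
            continue
        yield line.split()


def looks_numeric(ident):
    """Flag identifiers that might be mistaken for numeric constants."""
    return ident[0].isdigit() or ident[0] == "-"
```

A compiler that generates Keiko code could use a check like looks_numeric to reject or rename risky identifiers before writing them out.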
Files
A Keiko file contains a heading that gives the name of the module and lists (in IMPORT directives) other modules that it depends upon. A compiler that outputs Keiko code can generate a checksum for the public interface of a module and embed this checksum in each other module that uses it, and the assembler/linker will then check across all modules in a program that the checksums are consistent. Unused checksums can be replaced by 0. The module header also contains a count of source lines in the module that is used to allocate counters for line-count profiling; this too can be replaced by 0 if profiling is not going to be used on the program.
file:
    heading body_opt
heading:
    module-directive imports_opt endhdr-directive
module-directive:
    MODULE ident checksum linecnt nl
imports:
    import-directive
    import-directive imports
import-directive:
    IMPORT ident checksum nl
endhdr-directive:
    ENDHDR nl
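As a rough illustration of the heading grammar, the following Python sketch writes out a module heading. The function name module_header is hypothetical, and the sketch simply formats whatever checksum values it is given, since the notation for non-zero checksums is not specified here; the defaults use 0, which the text above says is acceptable for unused checksums and line counts.

```python
def module_header(name, imports=(), checksum=0, linecnt=0):
    """Return the heading text: MODULE, zero or more IMPORTs, then ENDHDR.

    imports is a sequence of (module_name, checksum) pairs.
    """
    lines = [f"MODULE {name} {checksum} {linecnt}"]
    for imp_name, imp_checksum in imports:
        lines.append(f"IMPORT {imp_name} {imp_checksum}")
    lines.append("ENDHDR")
    # Each element of a Keiko file sits on its own line.
    return "\n".join(lines) + "\n"
```

For example, module_header("Main", [("Files", 0)]) produces the three lines MODULE Main 0 0, IMPORT Files 0 and ENDHDR.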
The body of a module consists of multi-line procedures interspersed with other single-line directives that (among other things) allocate global storage.
body:
    phrase
    phrase body
phrase:
    directive
    procedure
Directives
Directives appear between the procedures of a program.
directive:
    DEFINE ident nl
    WORD constant nl
    LONG constant nl
    FLOAT float nl
    DOUBLE float nl
    STRING hex-string nl
    GLOVAR ident integer nl
    PRIMDEF ident ident type-string nl
- A DEFINE directive defines a symbol at the current location in the data segment. That location is the address of any following data item created with another directive such as WORD, FLOAT or STRING.
- The WORD, LONG, FLOAT and DOUBLE directives each contribute a numeric constant to the data segment, allowing global data tables to be initialised; the table can be accessed through a label defined by a preceding DEFINE directive.
- A STRING directive contributes a sequence of characters, specified by a hexadecimal string, to the data segment. If a terminating null character is needed, then this should be included in the hex string. The length of the string is padded to a multiple of 4 bytes. When convenient, it is possible to build up a string in several parts by giving multiple successive STRING directives, provided the length of all but the last directive is a multiple of 4 to prevent padding.
- A GLOVAR directive allocates space of a specified size in the bss segment, and defines a symbol with its address. The size is rounded up to a multiple of 4, so that the current location in the bss segment is always aligned.
- A PRIMDEF directive declares a named primitive whose definition is a C subroutine. A directive such as PRIMDEF Math.sqrt sqrtf FF declares a primitive that will be named Math.sqrt in the Keiko program, and interfaces to the standard C library function sqrtf, which the type string FF describes as taking a single float argument and yielding a float result. Some implementations of Keiko are able to link dynamically to libraries containing C functions, and others require an interpreter containing the primitives to be compiled specially.
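The padding rules for STRING and GLOVAR can be made concrete with a small Python sketch. All the names here (align4, c_string, string_directives) are hypothetical, the chunk size of 32 bytes is an arbitrary choice, and writing the hex digits in upper case is an assumption; the only requirement taken from the text is that every part except the last has a length that is a multiple of 4.

```python
def align4(n):
    """Round n up to the next multiple of 4, as done for STRING and GLOVAR sizes."""
    return (n + 3) & ~3


def c_string(text):
    """Encode text with an explicit terminating null byte."""
    return text.encode("ascii") + b"\x00"


def string_directives(data, chunk=32):
    """Return STRING directive lines for the given bytes.

    chunk must be a multiple of 4, so that only the last directive
    can be padded by the assembler.
    """
    if chunk <= 0 or chunk % 4 != 0:
        raise ValueError("chunk must be a positive multiple of 4")
    return [
        "STRING " + data[i:i + chunk].hex().upper()  # upper case is an assumption
        for i in range(0, len(data), chunk)
    ]
```

For instance, string_directives(c_string("Hello")) gives the single line STRING 48656C6C6F00; the six bytes it describes would occupy align4(6), that is 8, bytes of the data segment after padding.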
Procedures
Each procedure has a heading that gives its name and some other information. This is followed by a sequence of mingled Keiko machine instructions and pseudo-operations. The pseudo-operations typically assemble into an entry in the procedure's constant pool, together with an instruction that loads the constant onto the stack.
procedure:
    proc-directive body_opt end-directive
proc-directive:
    PROC ident 0 0 0 nl
body:
    element
    element body
end-directive:
    END nl
element:
    pseudo-operation
    instruction
- A PROC directive begins a procedure. The three arguments are:
  - The size of the procedure's local variable space in bytes; this should be a multiple of 4.
  - The maximum number of values pushed on the evaluation stack during the procedure, counting most types as one value, but long integers and doubles as two values. This argument is not currently used by implementations of the Keiko machine, and can be replaced by zero; the only possible disadvantage in future Keiko implementations is that stack overflow may not be detected promptly. At present, the stack overflow check leaves a generous margin of space for each procedure to use.
  - A garbage collector map for the stack frame. If the Keiko machine is built without the optional garbage collector, or if the stack frame of the procedure contains no pointers into the heap, then this argument can be zero. If garbage collection is enabled, then every procedure that stores pointers in its frame must have a garbage collector map, which will be either a bitmap expressed as a hexadecimal constant, or the address of a program written in a special mini-language that describes the layout of the frame. This mini-language is described elsewhere.
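The counting rule for the second PROC argument can be sketched as follows. This is a hypothetical helper for a compiler back end, not part of any Keiko tool: the type names in SLOTS, the event format, and the function names are all assumptions made for illustration. The garbage collector map is passed through as text, because its exact notation is not given here.

```python
# Stack slots per value: most types take one slot, long and double take two.
SLOTS = {"int": 1, "float": 1, "addr": 1, "long": 2, "double": 2}


def max_stack_depth(events):
    """Return the peak evaluation stack depth for a sequence of events.

    events is a sequence of (action, type_name) pairs, where action is
    "push" or "pop" and type_name is a key of SLOTS.
    """
    depth = peak = 0
    for action, kind in events:
        if action == "push":
            depth += SLOTS[kind]
            peak = max(peak, depth)
        elif action == "pop":
            depth -= SLOTS[kind]
        else:
            raise ValueError(f"unknown action: {action}")
    return peak


def proc_directive(name, frame_size, stack_depth=0, gcmap="0"):
    """Format a PROC line; the frame size should be a multiple of 4."""
    if frame_size % 4 != 0:
        raise ValueError("frame size should be a multiple of 4")
    return f"PROC {name} {frame_size} {stack_depth} {gcmap}"
```

Since current implementations ignore the stack depth, a compiler could equally pass zero; computing it only matters if a future implementation starts to rely on it.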
Instructions
instruction:
    opcode operands_opt nl
operands:
    constant
    constant operands
Each instruction has an optional list of operands, which (depending on the instruction) can be integer constants, assembler symbols, and labels.
Pseudo-operations
These pseudo-operations should appear inside a Keiko procedure; most behave like instructions but also contribute additional information to the current procedure.
pseudo-operation:
    LABEL ident nl
    CONST constant nl
    GLOBAL ident nl
    FCONST float nl
    DCONST float nl
    QCONST constant nl
    STKMAP constant nl
    LINE integer nl
- The LABEL pseudo-op defines its argument as a label for the next instruction in the procedure. Labels can be arbitrary identifiers and have a scope that is the whole of the current procedure. They are used only in branch instructions, and do not have a value that can be stored in a variable.
- The next few pseudo-ops act as instructions that push a value on the stack, but are capable of handling 32-bit or 64-bit values that are stored out of line in the constant pool for the procedure. CONST pushes an arbitrary integer or address; GLOBAL is similar, but restricted to the addresses of globals; FCONST, DCONST and QCONST push float, double and long integer constants respectively, with the double and long integer constants taking up two stack slots. These pseudo-ops are typically translated by the Keiko assembler into LDKW or LDKD instructions that reference a slot it has allocated in the constant pool. As a special case, CONST pseudo-ops that contain a small constant are translated into PUSH instructions that use an inline constant, either encoded directly in the opcode byte, or following it as the next one or two bytes of the instruction stream. All this is hidden from programmers and compilers by the Keiko assembler.
- The STKMAP pseudo-op specifies a pointer map for the evaluation stack that holds at an immediately following CALL instruction. Any pointer values on the evaluation stack that are used as arguments to the procedure call will be covered by the procedure's own stack map, so this pseudo-op is needed only in the rare case where other values near the bottom of the evaluation stack will persist over the call. These stack maps are gathered for the whole procedure and used by the assembler to compile a stack map table that, alongside the code and the constant pool, forms part of the runtime representation of the procedure. If the Keiko machine is built without a garbage collector, then naturally enough these stack maps can be omitted.
- The LINE pseudo-op marks a source line, with an argument that is the line number. It adds the line number to a table that the assembler includes with the object program, and also generates an LNUM instruction in the code. The LNUM instructions are used both by the Keiko profiler, which can count how many times each line is executed, and by debuggers, which can replace them with BREAK instructions to implement breakpoints.
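The choice an assembler makes for a CONST pseudo-op, between an inline PUSH and a constant pool reference, might look roughly like the Python sketch below. This is a guess at the general shape, not the actual behaviour of oblink: the function name translate_const is hypothetical, the signed 16-bit threshold is an assumption suggested only by the phrase "one or two bytes", and representing the LDKW operand as a pool index is also an assumption.

```python
def translate_const(value, pool, inline_bits=16):
    """Describe how a CONST pseudo-op might be assembled.

    Returns ("PUSH", value) for small constants, or ("LDKW", slot) after
    appending the value to pool, the procedure's constant pool.
    """
    lo = -(1 << (inline_bits - 1))
    hi = (1 << (inline_bits - 1)) - 1
    if lo <= value <= hi:
        # Small constant: carried inline in the instruction stream.
        return ("PUSH", value)
    # Large constant: stored out of line in the constant pool.
    pool.append(value)
    return ("LDKW", len(pool) - 1)
```

With an empty pool, translate_const(3, pool) yields ("PUSH", 3), while translate_const(100000, pool) allocates slot 0 and yields ("LDKW", 0).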