Specification of Keiko (Compilers)

From Spivey's Corner
Jump to: navigation, search

This page describes the assembly language of the Keiko machine that is used as a target for the compilers in the course. Some features are omitted if they are not used in the course.

Program structure

A Keiko program begins with a header, which for our purposes always consists of the text,

MODULE Main 0 0
IMPORT Lib 0
ENDHDR

The header is followed by a sequence of definitions; each may be a procedure, a global variable, or a string constant.

Blank lines are ignored, as are lines that begin with the comment character !.

Procedures

A procedure begins with a PROC directive and ends with an END directive; between them appears the bytecode of the procedure.

  • PROC p n 0 0
Begin a procedure named p that uses n bytes of space for local variables. (The two zeroes replace information that is used when the runtime system contains a garbage collector.)
  • END
End a procedure.

Global variables and string constants

  • GLOVAR x n
Define x as a global variable occupying n bytes of space. All global variables are allocated on a 4-byte boundary.
  • DEFINE x
Define x as a label for the string constant that follows.
  • STRING hex
Treat hex as a sequence of pairs of hexadecimal digits, each pair corresponding to one byte. Store the sequence of bytes as a string constant. Each STRING directive generates exactly the number of bytes specified, so several such directives can be used next to each other to build up a long string. If the string is to be zero-terminated, then the last such directive should end with a zero byte, denoted by 00.

Apart from the directives listed above, all other elements in a Keiko program are bytecode instructions that may appear only between a PROC directive and its matching END. The different instructions are listed below in groups according to their function; in each group, a basic set of instructions is listed separately from a set of compound instructions that can each be defined as equivalent to a specific sequence of basic instructions.

Note that there's no guarantee that an implementation of Keiko will actually provide all the instructions listed here. Some instructions may be implemented by a sequence of others: specifically, some of the less common instructions that are described as being compound may be expanded by the assembler into the equivalent sequence of basic instructions. Also, some other operations may be implemented by other 'secret' instructions; for example, floating point comparisons are implemented by a special FCMP instruction followed by an integer comparison.

Addressing, load and store

Basic instructions:

  • CONST n
The constant n can be a decimal or hexadecimal constant; its value is pushed on the stack.
  • FCONST x, DCONST x, QCONST n
Push a floating-point, double-precision or long-integer constant.
  • GLOBAL x
Push the address of a global variable x (declared with a GLOVAR directive).
  • LOCAL n
Push the address of a local variable that is at offset n from the base pointer.
  • LOADC, LOADS, LOADW, LOADD
Pop an address and push the contents of the (8, 16, 32, 64)-bit location it names.
  • STOREC, STORES, STOREW, STORED
Pop an address and a one- or two-word value and store the value into the (8, 16, 32, 64)-bit location named by the address.
  • PLUSA
Pop an address and an integer offset push their sum. This instruction behaves like integer addition, but is kept separate to help the JIT translator.
  • DUP n
Duplicate the one-word item that is n words from the top of the stack – so DUP 0 duplicates the top element on the stack.
  • SWAP
Swap two one-word items at the top of the stack.
  • POP n
Pop n one-word items from the stack.

Compound instructions:

  • LDLC n, LDLS n, LDLW n, LDLD n
Load from local variable; LDLs n is equivalent to LOCAL n / LOADs.
  • STLC n, STLS n, STLW n, STLD n
Store to local variable; STLs n is equivalent to LOCAL n / STOREs.
  • LDGC x, LDGS x, LDGW x, LDGD x
Load from global variable; LDGs x is equivalent to GLOBAL x / LOADs. Here x is a symbol that refers to an address in global storage, and the instruction loads from that address.
  • STGC x, STGS x, STGW x, STGD x
Store to global variable; STGs x is equivalent to GLOBAL x / STOREs.
  • INDEXC, INDEXS, INDEXW, INDEXD
Scaled addition; INDEXs is equivalent to PUSH n / TIMES / PLUSA where n is 1, 2, 4 or 8.
  • LDIC, LDIS, LDIW, LDID
Indexed load; LDIs is equivalent to INDEXs / LOADs.
  • LDNW n, STNW n
Constant indexed load and store; LDNW n is equivalent to PUSH n / PLUSA / LOADW, etc.

Integer arithmetic

All arithmetic instructions expect their operands on the evaluation stack and leave their result on the stack. The binary operations expect to find their right-hand operand on top of the stack and their left-hand operand just beneath it.

  • PLUS, MINUS, TIMES, DIV, MOD
Binary arithmetic operations on integers. (DIV and MOD are guaranteed to give mathematically correct results).
  • UMINUS
Unary minus on integers
  • EQ, LT, GT, LEQ, GEQ, NEQ
Integer comparisons with a Boolean result
  • AND, OR, NOT
Boolean operations. As in C, these treat any non-zero operand as true, and produce a result that is either 1 or 0.
  • BITAND, BITOR, BITXOR, BITNOT
Bitwise boolean operations.
  • BIT
Given a number between 0 and 31, this unary operation generates a word with the only corresponding bit set. The LSB is numbered 0 and the MSB is numbered 31.
  • LSL, LSR, ASR
Shift instructions.
  • QPLUS, QMINUS, QTIMES, QUMINUS, QDIV, QMOD
Long integer (64 bit) arithmetic operations.
  • QEQ, QLT, QGT, QLEQ, QGEQ, QNEQ
Long integer (64 bit) comparisons.

Compound operations:

  • INC, DEC
Increment and decrement; INC is equivalent to CONST 1 / PLUS, etc.
  • INCL n, DECL n
Increment or decrement local variable; INCL n is equivalent to LDLW n / INC / STLW n, etc.
  • QINC, QDEC
Long integer increment and decrement; QINC is equivalent to QCONST 1 / QPLUS, etc.
  • BITSUB
Bitwise difference, equivalent to BITNOT / BITAND

Floating point

  • FPLUS, FMINUS, FTIMES, FDIV, FUMINUS
Floating point arithmetic.
  • FEQ, FLT, FGT, FLEQ, FGEQ, FNEQ
Floating-point comparisons with a Boolean result.
  • DPLUS, DMINUS, DTIMES, DDIV, DUMINUS
Double-precision arithmetic.
  • DEQ, DLT, DGT, DLEQ, DGEQ, DNEQ
Double-precision comparisons with a Boolean result.

Conversions

  • CONVNF
Convert integer to floating point.
  • CONVND
Convert integer to double precision.
  • CONVFD
Convert floating point to double precision.
  • CONVDF
Convert double precision to floating point.
  • CONVNC
Convert integer to unsigned character, masking off unwanted bits.
  • CONVNS
Convert integer to signed short, sign extending the result from 16 to 32 bits.
  • CONVNQ
Convert integer to long integer
  • CONVQN
Convert long integer to integer
  • CONVQD
Convert long integer to double precision

Jumps

Labels in the assembler are written as decimal integers.

  • JUMP lab
Jump to label lab.
  • JUMPF lab, JUMPT lab
Pop a Boolean value and jump to label lab if the Boolean is false or if it is true, respectively.
  • LABEL lab
An assembler directive: attach the label lab to the next instruction.

Compound operations, each equivalent to a comparison followed by a conditional jump:

  • JEQ lab, JLT lab, JGT lab, JLEQ lab, JGEQ lab, JNEQ lab
Integer compare and branch; JEQ lab is equivalent to EQ / JTRUE lab, etc.
  • JEQZ lab, JNEQZ lab, JLTZ lab, JGTZ lab, JLEQZ lab, JGEQZ lab
Integer compare with zero and branch; JEQZ lab is equivalent to CONST 0 / EQ / JTRUE lab, etc.
  • FJEQ lab, FJLT lab, FJGT lab, FJLEQ lab, FJGEQ lab, FJNEQ lab
Floating point compare and branch; FJEQ lab is equivalent to FEQ / JTRUE lab, etc.
  • DJEQ lab, DJLT lab, DJGT lab, DJLEQ lab, DJGEQ lab, DJNEQ lab
Double-precision compare and branch; DJEQ lab is equivalent to DEQ / JTRUE lab, etc.
  • QJEQ lab, QJLT lab, QJGT lab, QJLEQ lab, QJGEQ lab, QJNEQ lab
Long integer compare and branch; QJEQ lab is equivalent to QEQ / JTRUE lab, etc.

Procedure call

  • PCALL n
Call a procedure with n arguments and no result.
  • PCALLW n
Call a procedure with n arguments and a one-word result.

These instructions expect to find the n argument words on the evaluation stack, with the first argument nearest the top, and the static link and address of the procedure on top of them. First, the machine pushes the current program counter value as the return address, then pushes the current value of the base pointer register bp as the dynamic link and resets bp to point to the dynamic link. Finally, the program counter is set to the beginning of the procedure.

  • RETURN
Return from a procedure with no result.
  • RETURNW
Return from a procedure with a one-word result.

These instructions reverse the frame creation steps described above, resetting the stack pointer, the base pointer, and the program counter to the values they have before the procedure was called. In addition, the n words of arguments that were pushed before the CALL instruction are popped from the stack.

Runtime checks

  • BOUND n
Pop an array bound B and check the index i now at the top of the stack. If the test 0 i < B fails, report an array bound error on line n.
  • NCHECK n
Check a pointer p on top of the stack. If it is null, report a null pointer error on line n.
  • GCHECK n
Check a static link p on top of the stack. If it is not null, report the error that a local procedure is being assigned to a procedure variable.
  • ZCHECK n, FZCHECK n, DZCHECK n, QZCHECK n
Check a divisor d on top of the stack. If it is zero, report the error of division by zero.
  • ERROR e n
Report runtime error e on line n. The code e should be one of those listed in function message in the file xmain.c.

Language-specific operations

These instructions perform tasks that correspond to specific constructs in Oberon. They may or may not help with the implementation of other languages.

  • TYPETEST n
At the top of the stack are the address of a record descriptor, and below it the address of a record. Pop these, test whether the decriptor is the ancestor of the record at level n, and push a Boolean result.
  • JCASE n, CASEL lab
The instruction JCASE n must be followed by n of the CASEL instructions. It pops an integer i from the stack. If 0 <= i < n, the program continues at the corresponding label; otherwise, it continues with the instruction following the last following CASEL instruction.
  • JRANGE lab
The instruction expects three integers x, lo and hi on the stack. If lo x hi, the program contiunes at label lab, otherwise, it continues with the next instruction as usual.
  • FIXCOPY
This instruction expects two address a and b and an integer n on the stack. It copies a block of n bytes from address b to address a.
  • FLEXCOPY
This instruction expects to find on the stack the address of an open array parameter and a size in bytes. It allocates space in the stack frame to copy the parameter, then overwrites the parameter with its new address.
  • LINK, SAVELINK
Nested Oberon procedures expect their static link in a special register SL. The LINK instruction pops a word from the stack and puts it into the SL register; the SAVELINK instruction saves the contents of the SL register at offset -4 from the base pointer.
  • LDEW n, STEW n
Load and store in enclosing frame; LDEW n is equivalent to LDLW -4 / LDNW n, and STEW n to LDLW -4 / STNW n. These instructions perform the operations needed to load and store one-word quantities in the stack frame of the statically enclosing procedure by following one link in the static chain.
  • TESTGEQ lab
The sequence DUP 0 / CONST n / JGEQ lab can be abbreviated as CONST n / TESTGEQ lab. This is used in translating CASE statements.

Special operations for the compilers course

To support some of the practicals, I've added a couple of special instructions. Details are given in the lab manual, but should be copied here.

Secret implementation-level instructions

Some of the advertised assembly-level instructions don't exist in the current implementation of Keiko, and are replaced by the assembler with combinations of other instructions, including some secret ones that exist in the implementation only for this purpose. These instructions should not be used in compiler output.

  • PUSH n
This is a more compact version of the CONST instruction, optimised for operands that are small integers.
  • FCMP, DCMP
These instructions pop two floating-point or double-precision values, compare them, and push an integer, -1, 0 or +1, that indicates the ordering relationship between the operands.
  • JPROC
  • SLIDE n
  • SLIDEW n
Each PCALL n or PCALLW n instruction is replaced by a pair JPROC / SLIDE n+1 or JPROC / SLIDEW n+1. The JPROC instruction jumps to the procedure, which returns to the SLIDE instruction. The SLIDE instruction removes the procedure arguments from the stack; in the case of SLIDEW, it preserves the result of the procedure as it does so.
  • RESULTW
The RETURNW instruction is actually equivalent to the pair RESULTW / RETURN. The RESULTW preserves the procedure result in a special register, where it will be found be a subsequent SLIDEW instruction, and the RETURN instruction does the rest of the job of returning from the procedure.
Personal tools

Variants
Actions
Navigation
Tools