Bytecode instructions for OBC

Copyright © 2024 J. M. Spivey
Jump to navigation Jump to search

For release 2.9

This list contains the instructions that are generated by the compiler: first, a basic set that allows all necessary operations to be expressed, and contains more-or-less the instructions that are used by the compiler's code generator. Then we can add the packed instructions that are used by the compiler's peephole optimizer; each packed instruction corresponds to a short sequence of basic instructions that frequently occur together.

Basic instructions

Addressing, load and Store

  • PUSH n
Push the small constant n onto the evaluation stack. Like all constants that appear as operands of instructions, n must fit into 16 bits.
  • CONST n
The constant n can be a decimal or hexadecimal constant or an assembler symbol; its value is pushed on the stack. The assembler implements this instruction by placing n in the constant pool for the current procedure and generating an LDKW instruction to fetch it at runtime.
  • FCONST x, DCONST x, QCONST n
Push a floating-point, double-precision or long-integer constant.
  • LOCAL n
Push the address of a local variable that is at offset n from the frame head.
  • LOADC, LOADS, LOADW, LOADD
Pop an address and push the contents of the (8, 16, 32, 64)-bit location it names.
  • STOREC, STORES, STOREW, STORED
Pop an address and a one- or two-word value and store the value into the (8, 16, 32, 64)-bit location named by the address.
  • PLUSA
Pop an integer offset and an address and push their sum. This instruction behaves like integer addition, but is kept separate to help the JIT translator.
  • DUP n
Duplicate the one-word item that is n words from the top of the stack – so DUP 0 duplicates the top element on the stack.
  • SWAP
Swap two one-word items at the top of the stack.
  • POP n
Pop n one-word items from the stack.

Integer arithmetic

All arithmetic instructions expect their operands on the evaluation stack and leave their result on the stack.

  • PLUS, MINUS, TIMES, DIV, MOD
Binary arithmetic operations on integers. (DIV and MOD are guaranteed to give mathematically correct results).
  • UMINUS
Unary minus on integers
  • EQ, LT, GT, LEQ, GEQ, NEQ
Integer comparisons with a Boolean result
  • AND, OR, NOT
Boolean operations. As in C, these treat any non-zero operand as true, and produce a result that is either 1 or 0.
  • BITAND, BITOR, BITXOR, BITNOT
Bitwise boolean operations.
  • BIT
Given a number between 0 and 31, this generates a word with the only corresponding bit set. The LSB is numbered 0 and the MSB is numbered 31.
  • LSL, LSR, ASR
Shift instructions.
  • QPLUS, QMINUS, QTIMES, QUMINUS, QDIV, QMOD
Long integer (64 bit) instructions.
  • QEQ, QLT, QGT, QLEQ, QGEQ, QNEQ
Long integer (64 bit) comparisons.

Floating point

  • FPLUS, FMINUS, FTIMES, FDIV, FUMINUS
Floating point arithmetic.
  • FEQ, FLT, FGT, FLEQ, FGEQ, FNEQ
Floating-point comparisons with a Boolean result.
  • DPLUS, DMINUS, DTIMES, DDIV, DUMINUS
Double-precision arithmetic.
  • DEQ, DLT, DGT, DLEQ, DGEQ, DNEQ
Double-precision comparisons with a Boolean result.

Conversions

  • CONVNF
Convert integer to floating point.
  • CONVND
Convert integer to double precision.
  • CONVFD
Convert floating point to double precision.
  • CONVDF
Convert double precision to floating point.
  • CONVNC
Convert integer to unsigned character, masking off unwanted bits.
  • CONVNS
Convert integer to signed short, sign extending the result from 16 to 32 bits.
  • CONVNQ
Convert integer to long integer
  • CONVQN
Convert long integer to integer
  • CONVQD
Convert long integer to double precision

Jumps

Labels in the assembler are written as decimal integers.

  • JUMP lab
Jump to label lab.
  • JUMPF lab, JUMPT lab
Pop a Boolean value and jump to label lab if the Boolean is false or if it is true, respectively.
  • LABEL lab

An assembler directive: attach the label lab to the next instruction.

Runtime checks

  • BOUND n
Pop an array bound B and check the index i now at the top of the stack. If the test 0 ≤ i < B fails, report and array bound error on line n.
  • NCHECK n
Check a pointer p on top of the stack. If it is null, report a null pointer error on line n.
  • GCHECK n
Check a static link p on top of the stack. If it is not null, report the error that a local procedure is being assigned to a procedure variable.
  • ZCHECK n, FZCHECK n, DZCHECK n, QZCHECK n
Check a divisor d on top of the stack. If it is zero, report the error of division by zero.
  • ERROR e n
Report runtime error e on line n. The code e should be one of those listed in function message in the file xmain.c.

Language-specific operations

These instructions perform tasks that correspond to specific constructs in Oberon. They may or may not help with the implementation of other languages.

  • TYPETEST n
At the top of the stack are the address of a record descriptor, and below it the address of a record. Pop these, test whether the decriptor is the ancestor of the record at level n, and push a Boolean result.
  • JCASE n, CASEL lab
The instruction JCASE n must be followed by n of the CASEL instructions. It pops an integer i from the stack. If 0 <= i < n, the program continues at the corresponding label; otherwise, it continues with the instruction following the last following CASEL instruction.
  • JRANGE lab
The instruction expects three integers x, lo and hi on the stack. If lo <= x <= hi, the program contiunes at label lab, otherwise, it continues with the next instruction as usual.
  • FIXCOPY
This instruction expects two address a and b and an integer n on the stack. It copies a block of n bytes from address b to address a.
  • FLEXCOPY
This instruction expects to find on the stack the address of an open array parameter and a size in bytes. It allocates space in the stack frame to copy the parameter, then overwrites the parameter with its new address.

Assembler directives

  • PROC name nargs fsize gcmap, END
  • @PRIMDEF name prim nargs fsize gcmap
  • DEFINE name
  • WORD n, FLOAT x, DOUBLE x, LONG n
  • STRING hex
  • GLOBAL name size
  • LINE n
  • MODULE m, IMPORT m, PRIM p, ENDHDR

Packed instructions

For compactness, brevity and speed, the compiler has a peephole optimizer that replaces common sequences of instructions by single instructions that have the same effect. Some of these instructions are realised by the bytecode assembler and interpreter in a form that is more compact and faster than the original sequence.

  • LDLC n, LDLS n, LDLW n, LDLD n
Load from local variable; LDLs n is equivalent to LOCAL n / LOADsW.
  • STLC n, STLS n, STLW n, STLD n
Store to local variable; STLs n is equivalent to LOCAL n / STOREs.
  • LDGC x, LDGS x, LDGW x, LDGD x
Load from global variable; LDGs x is equivalent to CONST x / LOADs. Here x is a symbol that refers to an address in global storage, and the instruction loads from that address.
  • STGC x, STGS x, STGW x, STGD x
Store to global variable; STGs x is equivalent to CONST x / STOREs.
  • JEQ lab, JLT lab, JGT lab, JLEQ lab, JGEQ lab, JNEQ lab
Integer compare and branch; JEQ lab is equivalent to EQ / JTRUE lab, etc.
  • JEQZ lab, JNEQZ lab, JLTZ lab, JGTZ lab, JLEQZ lab, JGEQZ lab
Integer compare with zero and branch; JEQZ lab is equivalent to PUSH 0 / EQ / JTRUE lab, etc.
  • FJEQ lab, FJLT lab, FJGT lab, FJLEQ lab, FJGEQ lab, FJNEQ lab
Floating point compare and branch; FJEQ lab is equivalent to FEQ / JTRUE lab, etc.
  • DJEQ lab, DJLT lab, DJGT lab, DJLEQ lab, DJGEQ lab, DJNEQ lab
Double-precision compare and branch; DJEQ lab is equivalent to DEQ / JTRUE lab, etc.
  • QJEQ lab, QJLT lab, QJGT lab, QJLEQ lab, QJGEQ lab, QJNEQ lab
Long integer compare and branch; QJEQ lab is equivalent to QEQ / JTRUE lab, etc.
  • INC, DEC
Increment and decrement; INC is equivalent to PUSH 1 / PLUS, etc.
  • INCL n, DECL n
Increment or decrement local variable; INCL n is equivalent to LDLW n / INC / STLW n, etc.
  • QINC, QDEC
Long integer increment and decrement; QINC is equivalent to QCONST 1 / QPLUS, etc.
  • BITSUB
Bitwise difference, equivalent to BITNOT / BITAND
  • INDEXC, INDEXS, INDEXW, INDEXD
Scaled addition; INDEXs is equivalent to PUSH n / TIMES / PLUSA where n is 1, 2, 4 or 8.
  • LDIC, LDIS, LDIW, LDID
Indexed load; LDIs is equivalent to INDEXs / LOADs.
  • LDNW n, STNW n
Constant indexed load and store; LDNW n is equivalent to PUSH n / LDIW, etc.
  • LDEW n, STEW n
Load and store in enclosing frame; LDEW n is equivalent to LDLW -4 / LDNW n, and STEW n to LDLW -4 / STNW n. These instructions perform the operations needed to load and store one-word quantities in the stack frame of the statically enclosing procedure by following one link in the static chain.
  • TESTGEQ lab
The sequence DUP 0 / CONST n / JGEQ lab can be abbreviated as CONST n / TESTGEQ lab. This is used in translating CASE statements.

==Procedure call and return

  • CALL n, CALLW n, CALLD n
  • STKMAP n
  • LINK, SAVELINK
  • RETURN, RETURNW, RETURND@
  • ALIGNC, ALIGNS

Support

These instructions are not used directly by the compiler, but are introduced by the assembler in the code it produces for other instructions.

  • FCMP, DCMP
  • LDKW n, LDKD n
  • JPROC n
  • RESULTW, RESULTD@
  • SLIDEW, SLIDED


Older text

Various parts of the system deal with bytecode instructions:

  • The back end of the compiler.
  • The assembler/linker.
  • The bytecode interpreter.
  • JIT translators.

However, the sets of instructions that can appear at various stages vary a bit. The compiler generates code using instructions from a basic set, but there's a peephole optimiser that packs together common sequences into single instructions. For example, the code generator might translate the expression x+1 into (its internal representation of) LOCAL 12 / LOADW / CONST 1 / PLUS, and the peephole optimiser will compress this into LDLW 12 / INC. The peephole optimiser also introduces out-of-line constants for values that will not fit in the 16 bits that is the largest operand supported by the bytecode.

Next in the tool-chain comes the assembler/linker, which makes two significant changes: first, it may have multiple encodings for each instruction as opcodes; and second, some less common instructions generated by the compiler have no encoding, but are expanded by the assembler into equivalent short sequences of instructions. This releases more of the 256 opcodes for encoding more common instructions. For an example of the first kind of change, there are single-byte instructions that push small constants (perhaps 0 to 6) onto the stack, then there is a two-byte sequence that can push any value between -128 and 127, and there is a three-byte sequence that can push values between -32768 and 32767. Any larger values will have been replaced by out-of-line constants by the compiler.

An example of the second kind of change is provided by floating point branches. In place of (for example) the instruction FJLT lab (which pops two floats x and y and branches to lab if x<y), the assembler generates the sequence FCMP / JLTZ lab, first comparing the two floats and pushing a comparison indicator -1, 0, or +1, then popping this and branching to lab if it is negative.

A sensible approach in a JIT translator is first to decode the opcodes into instructions, then to reverse the packing process that was done by the peephole optimiser, getting code that is expressed in terms of a small number of basic operations. These can then be used as the basis of instruction selection in a way that is more general than can be acheived by fixed translation of each instruction that is present in the raw bytecode.

Core instructions

These are the instructions as generated by the compiler before the peephole optimiser does its packing. They are also a reasonable set to implement directly in a JIT translator, after expanding packed instructions into their constituent parts.

CONST n -- Push constant
LDKW n -- Push word from constant pool
LOCAL n -- Push address of local
PLUSA -- Add address and offset
LOADs -- Pop address and push contents (s = W, S, C, D)
STOREs -- Pop address then value, and store
DUP n
SWAP
POP n
PLUS, MINUS, TIMES, DIV, MOD -- Integer arithmetic
UMINUS -- Integer unary minus
AND, OR, NOT -- Boolean operators
BITAND, BITOR, BITXOR, BITSUB, BITNOT -- Bitwise operators
BIT -- Pop n, push (1 << n).  [Note that BIT(32) = 0]
LSL, LSR -- Logical shifts
ASR -- Arithmetic shift right
EQ, LT, GT, LEQ, GEQ, NEQ -- Integer comparisons
JUMPF, JUMPT -- Pop boolean and jump if true or false
JUMP -- Unconditional jump
JCASE, JRANGE -- Special jumps for CASE statements
FPLUS, FMINUS, FTIMES, FDIV -- Floating-point arithmetic
FUMINUS -- Floating point unary minus
FEQ, FLT, FGT, FLEQ, FGEQ, FNEQ -- Floating-point comparisons
DPLUS, DMINUS, DTIMES, DDIV -- Double-precision arithmetic
DUMINUS -- Double-precision unary minus
DEQ, DLT, DGT, DLEQ, DGEQ, DNEQ -- Double-precision comparisons
CONVNF, CONVFN -- Convert between integer and floating point
CONVND, CONVDN -- Convert between integer and double precision
CONVFD, CONVDF -- Convert between float and double
CONVNC -- Convert integer to character
CONVNS -- Convert integer to short
BOUND -- Array bound check
NCHECK -- Null pointer check
GCHECK -- Check for global procedure
ZCHECK -- Zero divisor check
FZCHECK -- Floating-point zero divisor check
DZCHECK -- Double-precision zero divisor check
ERROR -- Runtime error
ALIGNC, ALIGNS -- Align argument in 32-bit word
FIXCOPY -- Copy fixed-size aggregate
FLEXCOPY -- Allocate and copy open array parameter
TYPETEST -- Class membership test
CALL, CALLW, CALLD -- Procedure call
TCALL, TCALLW, TCALLD -- Tail call
LINK -- Pass static link
SAVELINK -- Save static link in frame
RETURN, RETURNW, RETURND -- Return from procedure
LNUM -- Line number for debugging and profiling

Packed instructions

These instructions are introduced by the peephole optimiser as compact equivalents for sequences of common instructions.

 INDEXs  =  PUSH s / BINOP Times / BINOP PlusA where s can be W=4, S=2, C=1, D=8
 LDLs n  =  LOCAL n / LOADs
 STLs n  =  LOCAL n / STOREs
 LDGs n  =  LDKW n / LOADs
 STGs n  =  LDKW n / STOREs
 LDIs    =  INDEXs / LOADs
 STIs    =  INDEXs / STOREs
 LDNW n  =  CONST n / LDIW
 STNW n  =  CONST n / STIW
 LDEW n  =  LDLW -4 / LDNW n
 STEW n  =  LDLW -4 / STNW n
 INCL n  =  LDLW n / INC / STLW n
 DECL n  =  LDLW n / DEC / STLW n
 JEQZ lab, etc.  =  CONST 0 / JEQ lab, etc.
 TESTGEQ lab  =  DUP 0 / JLT lab
 JTYPET m lab =  TYPETEST m / JUMPT lab
 JTYPEF m lab =  TYPETEST m / JUMPF lab

Support instructions

The assembler and bytecode interpreter spend lots of opcodes on common instructions like "load local", so that a fair number of locals can be accessed with a fast, single-byte instruction. To free up opcodes for this, they implement other, less common instructions with an equivalent short sequence of others. These extra instructions are there to help with this process.

FCMP -- Compare floating point values and push comparison indicator
DCMP -- Ditto for double-precision values

Because these instructions are never generated by the compiler, but only be the assembler as part of a canned sequence, it's OK for a JIT translator to handle them only in restricted circumstances; for example, FCMP is always followed by an integer comparison with zero or a conditional jump.