# Bytecode instructions for OBC

## Contents

# For release 2.9

This list contains the instructions that are generated by the compiler: first, a basic set that allows all necessary operations to be expressed, and contains more-or-less the instructions that are used by the compiler's code generator. Then we can add the packed instructions that are used by the compiler's peephole optimizer; each packed instruction corresponds to a short sequence of basic instructions that frequently occur together.

## Basic instructions

### Addressing, load and Store

`PUSH n`

- Push the small constant
*n*onto the evaluation stack. Like all constants that appear as operands of instructions,*n*must fit into 16 bits.

`CONST n`

- The constant
*n*can be a decimal or hexadecimal constant or an assembler symbol; its value is pushed on the stack. The assembler implements this instruction by placing*n*in the*constant pool*for the current procedure and generating an*LDKW*instruction to fetch it at runtime.

`FCONST x`

,`DCONST x`

,`QCONST n`

- Push a floating-point, double-precision or long-integer constant.

`LOCAL n`

- Push the address of a local variable that is at offset
*n*from the frame head.

`LOADC`

,`LOADS`

,`LOADW`

,`LOADD`

- Pop an address and push the contents of the (8, 16, 32, 64)-bit location it names.

`STOREC`

,`STORES`

,`STOREW`

,`STORED`

- Pop an address and a one- or two-word value and store the value into the (8, 16, 32, 64)-bit location named by the address.

`PLUSA`

- Pop an integer offset and an address and push their sum. This instruction behaves like integer addition, but is kept separate to help the JIT translator.

`DUP n`

- Duplicate the one-word item that is
`n`

words from the top of the stack – so`DUP 0`

duplicates the top element on the stack.

`SWAP`

- Swap two one-word items at the top of the stack.

`POP n`

- Pop
`n`

one-word items from the stack.

### Integer arithmetic

All arithmetic instructions expect their operands on the evaluation stack and leave their result on the stack.

`PLUS`

,`MINUS`

,`TIMES`

,`DIV`

,`MOD`

- Binary arithmetic operations on integers. (
`DIV`

and`MOD`

are guaranteed to give mathematically correct results).

`UMINUS`

- Unary minus on integers

`EQ`

,`LT`

,`GT`

,`LEQ`

,`GEQ`

,`NEQ`

- Integer comparisons with a Boolean result

`AND`

,`OR`

,`NOT`

- Boolean operations. As in C, these treat any non-zero operand as true, and produce a result that is either 1 or 0.

`BITAND`

,`BITOR`

,`BITXOR`

,`BITNOT`

- Bitwise boolean operations.

`BIT`

- Given a number between 0 and 31, this generates a word with the only corresponding bit set. The LSB is numbered 0 and the MSB is numbered 31.

`LSL`

,`LSR`

,`ASR`

- Shift instructions.

`QPLUS`

,`QMINUS`

,`QTIMES`

,`QUMINUS`

,`QDIV`

,`QMOD`

- Long integer (64 bit) instructions.

`QEQ`

,`QLT`

,`QGT`

,`QLEQ`

,`QGEQ`

,`QNEQ`

- Long integer (64 bit) comparisons.

### Floating point

`FPLUS`

,`FMINUS`

,`FTIMES`

,`FDIV`

,`FUMINUS`

- Floating point arithmetic.

`FEQ`

,`FLT`

,`FGT`

,`FLEQ`

,`FGEQ`

,`FNEQ`

- Floating-point comparisons with a Boolean result.

`DPLUS`

,`DMINUS`

,`DTIMES`

,`DDIV`

,`DUMINUS`

- Double-precision arithmetic.

`DEQ`

,`DLT`

,`DGT`

,`DLEQ`

,`DGEQ`

,`DNEQ`

- Double-precision comparisons with a Boolean result.

### Conversions

`CONVNF`

- Convert integer to floating point.

`CONVND`

- Convert integer to double precision.

`CONVFD`

- Convert floating point to double precision.

`CONVDF`

- Convert double precision to floating point.

`CONVNC`

- Convert integer to unsigned character, masking off unwanted bits.

`CONVNS`

- Convert integer to signed short, sign extending the result from 16 to 32 bits.

`CONVNQ`

- Convert integer to long integer

`CONVQN`

- Convert long integer to integer

`CONVQD`

- Convert long integer to double precision

### Jumps

Labels in the assembler are written as decimal integers.

`JUMP lab`

- Jump to label
`lab`

.

`JUMPF lab`

,`JUMPT lab`

- Pop a Boolean value and jump to label
`lab`

if the Boolean is false or if it is true, respectively.

`LABEL lab`

An assembler directive: attach the label `lab`

to the next instruction.

### Runtime checks

`BOUND n`

- Pop an array bound
*B*and check the index*i*now at the top of the stack. If the test 0 ≤*i*<*B*fails, report and array bound error on line*n*.

`NCHECK n`

- Check a pointer
*p*on top of the stack. If it is null, report a null pointer error on line*n*.

`GCHECK n`

- Check a static link
*p*on top of the stack. If it is not null, report the error that a local procedure is being assigned to a procedure variable.

`ZCHECK n`

,`FZCHECK n`

,`DZCHECK n`

,`QZCHECK n`

- Check a divisor
*d*on top of the stack. If it is zero, report the error of division by zero.

`ERROR e n`

- Report runtime error
*e*on line*n*. The code*e*should be one of those listed in function`message`

in the file`xmain.c`

.

### Language-specific operations

These instructions perform tasks that correspond to specific constructs in Oberon. They may or may not help with the implementation of other languages.

`TYPETEST n`

- At the top of the stack are the address of a record descriptor, and below it the address of a record. Pop these, test whether the decriptor is the ancestor of the record at level
*n*, and push a Boolean result.

`JCASE n`

,`CASEL lab`

- The instruction
`JCASE n`

must be followed by`n`

of the`CASEL`

instructions. It pops an integer`i`

from the stack. If`0 <= i < n`

, the program continues at the corresponding label; otherwise, it continues with the instruction following the last following`CASEL`

instruction.

`JRANGE lab`

- The instruction expects three integers
`x`

,`lo`

and`hi`

on the stack. If`lo <= x <= hi`

, the program contiunes at label`lab`

, otherwise, it continues with the next instruction as usual.

`FIXCOPY`

- This instruction expects two address
`a`

and`b`

and an integer`n`

on the stack. It copies a block of`n`

bytes from address`b`

to address`a`

.

`FLEXCOPY`

- This instruction expects to find on the stack the address of an open array parameter and a size in bytes. It allocates space in the stack frame to copy the parameter, then overwrites the parameter with its new address.

### Assembler directives

`PROC name nargs fsize gcmap`

,`END`

- @PRIMDEF name prim nargs fsize gcmap

`DEFINE name`

`WORD n`

,`FLOAT x`

,`DOUBLE x`

,`LONG n`

`STRING hex`

`GLOBAL name size`

`LINE n`

`MODULE m`

,`IMPORT m`

,`PRIM p`

,`ENDHDR`

## Packed instructions

For compactness, brevity and speed, the compiler has a peephole optimizer that replaces common sequences of instructions by single instructions that have the same effect. Some of these instructions are realised by the bytecode assembler and interpreter in a form that is more compact and faster than the original sequence.

`LDLC n`

,`LDLS n`

,`LDLW n`

,`LDLD n`

- Load from local variable;
`LDLs n`

is equivalent to`LOCAL n / LOADsW`

.

`STLC n`

,`STLS n`

,`STLW n`

,`STLD n`

- Store to local variable;
`STLs n`

is equivalent to`LOCAL n / STOREs`

.

`LDGC x`

,`LDGS x`

,`LDGW x`

,`LDGD x`

- Load from global variable;
`LDGs x`

is equivalent to`CONST x / LOADs`

. Here*x*is a symbol that refers to an address in global storage, and the instruction loads from that address.

`STGC x`

,`STGS x`

,`STGW x`

,`STGD x`

- Store to global variable;
`STGs x`

is equivalent to`CONST x / STOREs`

.

`JEQ lab`

,`JLT lab`

,`JGT lab`

,`JLEQ lab`

,`JGEQ lab`

,`JNEQ lab`

- Integer compare and branch;
`JEQ lab`

is equivalent to`EQ / JTRUE lab`

, etc.

`JEQZ lab`

,`JNEQZ lab`

,`JLTZ lab`

,`JGTZ lab`

,`JLEQZ lab`

,`JGEQZ lab`

- Integer compare with zero and branch;
`JEQZ lab`

is equivalent to`PUSH 0 / EQ / JTRUE lab`

, etc.

`FJEQ lab`

,`FJLT lab`

,`FJGT lab`

,`FJLEQ lab`

,`FJGEQ lab`

,`FJNEQ lab`

- Floating point compare and branch;
`FJEQ lab`

is equivalent to`FEQ / JTRUE lab`

, etc.

`DJEQ lab`

,`DJLT lab`

,`DJGT lab`

,`DJLEQ lab`

,`DJGEQ lab`

,`DJNEQ lab`

- Double-precision compare and branch;
`DJEQ lab`

is equivalent to`DEQ / JTRUE lab`

, etc.

`QJEQ lab`

,`QJLT lab`

,`QJGT lab`

,`QJLEQ lab`

,`QJGEQ lab`

,`QJNEQ lab`

- Long integer compare and branch;
`QJEQ lab`

is equivalent to`QEQ / JTRUE lab`

, etc.

`INC`

,`DEC`

- Increment and decrement;
`INC`

is equivalent to`PUSH 1 / PLUS`

, etc.

`INCL n`

,`DECL n`

- Increment or decrement local variable;
`INCL n`

is equivalent to`LDLW n / INC / STLW n`

, etc.

`QINC`

,`QDEC`

- Long integer increment and decrement;
`QINC`

is equivalent to`QCONST 1 / QPLUS`

, etc.

`BITSUB`

- Bitwise difference, equivalent to
`BITNOT / BITAND`

`INDEXC`

,`INDEXS`

,`INDEXW`

,`INDEXD`

- Scaled addition;
`INDEXs`

is equivalent to`PUSH n / TIMES / PLUSA`

where*n*is 1, 2, 4 or 8.

`LDIC`

,`LDIS`

,`LDIW`

,`LDID`

- Indexed load;
`LDIs`

is equivalent to`INDEXs / LOADs`

.

`LDNW n`

,`STNW n`

- Constant indexed load and store;
`LDNW n`

is equivalent to`PUSH n / LDIW`

, etc.

`LDEW n`

,`STEW n`

- Load and store in enclosing frame;
`LDEW n`

is equivalent to`LDLW -4 / LDNW n`

, and`STEW n`

to`LDLW -4 / STNW n`

. These instructions perform the operations needed to load and store one-word quantities in the stack frame of the statically enclosing procedure by following one link in the static chain.

`TESTGEQ lab`

- The sequence
`DUP 0 / CONST n / JGEQ lab`

can be abbreviated as`CONST n / TESTGEQ lab`

. This is used in translating CASE statements.

==Procedure call and return

`CALL n`

,`CALLW n`

,`CALLD n`

`STKMAP n`

`LINK`

,`SAVELINK`

`RETURN`

,`RETURNW,`

RETURND@

`ALIGNC`

,`ALIGNS`

### Support

These instructions are not used directly by the compiler, but are introduced by the assembler in the code it produces for other instructions.

`FCMP`

,`DCMP`

`LDKW n`

,`LDKD n`

`JPROC n`

`RESULTW,`

RESULTD@

`SLIDEW`

,`SLIDED`

# Older text

Various parts of the system deal with bytecode instructions:

- The back end of the compiler.
- The assembler/linker.
- The bytecode interpreter.
- JIT translators.

However, the sets of instructions that can appear at various stages vary a bit. The compiler generates code using instructions from a basic set, but there's a peephole optimiser that packs together common sequences into single instructions. For example, the code generator might translate the expression *x*+1 into (its internal representation of) `LOCAL 12 / LOADW / CONST 1 / PLUS`

, and the peephole optimiser will compress this into `LDLW 12 / INC`

. The peephole optimiser also introduces out-of-line constants for values that will not fit in the 16 bits that is the largest operand supported by the bytecode.

Next in the tool-chain comes the assembler/linker, which makes two significant changes: first, it may have multiple encodings for each instruction as opcodes; and second, some less common instructions generated by the compiler have no encoding, but are *expanded* by the assembler into equivalent short sequences of instructions. This releases more of the 256 opcodes for encoding more common instructions. For an example of the first kind of change, there are single-byte instructions that push small constants (perhaps 0 to 6) onto the stack, then there is a two-byte sequence that can push any value between -128 and 127, and there is a three-byte sequence that can push values between -32768 and 32767. Any larger values will have been replaced by out-of-line constants by the compiler.

An example of the second kind of change is provided by floating point branches. In place of (for example) the instruction `FJLT lab`

(which pops two floats *x* and *y* and branches to `lab`

if `x<y`

), the assembler generates the sequence `FCMP / JLTZ lab`

, first comparing the two floats and pushing a comparison indicator -1, 0, or +1, then popping this and branching to `lab`

if it is negative.

A sensible approach in a JIT translator is first to decode the opcodes into instructions, then to reverse the packing process that was done by the peephole optimiser, getting code that is expressed in terms of a small number of basic operations. These can then be used as the basis of instruction selection in a way that is more general than can be acheived by fixed translation of each instruction that is present in the raw bytecode.

## Core instructions

These are the instructions as generated by the compiler before the peephole optimiser does its packing. They are also a reasonable set to implement directly in a JIT translator, after expanding packed instructions into their constituent parts.

CONST n -- Push constant LDKW n -- Push word from constant pool LOCAL n -- Push address of local PLUSA -- Add address and offset LOADs -- Pop address and push contents (s = W, S, C, D) STOREs -- Pop address then value, and store DUP n SWAP POP n PLUS, MINUS, TIMES, DIV, MOD -- Integer arithmetic UMINUS -- Integer unary minus AND, OR, NOT -- Boolean operators BITAND, BITOR, BITXOR, BITSUB, BITNOT -- Bitwise operators BIT -- Pop n, push (1 << n). [Note that BIT(32) = 0] LSL, LSR -- Logical shifts ASR -- Arithmetic shift right EQ, LT, GT, LEQ, GEQ, NEQ -- Integer comparisons JUMPF, JUMPT -- Pop boolean and jump if true or false JUMP -- Unconditional jump JCASE, JRANGE -- Special jumps for CASE statements FPLUS, FMINUS, FTIMES, FDIV -- Floating-point arithmetic FUMINUS -- Floating point unary minus FEQ, FLT, FGT, FLEQ, FGEQ, FNEQ -- Floating-point comparisons DPLUS, DMINUS, DTIMES, DDIV -- Double-precision arithmetic DUMINUS -- Double-precision unary minus DEQ, DLT, DGT, DLEQ, DGEQ, DNEQ -- Double-precision comparisons CONVNF, CONVFN -- Convert between integer and floating point CONVND, CONVDN -- Convert between integer and double precision CONVFD, CONVDF -- Convert between float and double CONVNC -- Convert integer to character CONVNS -- Convert integer to short BOUND -- Array bound check NCHECK -- Null pointer check GCHECK -- Check for global procedure ZCHECK -- Zero divisor check FZCHECK -- Floating-point zero divisor check DZCHECK -- Double-precision zero divisor check ERROR -- Runtime error ALIGNC, ALIGNS -- Align argument in 32-bit word FIXCOPY -- Copy fixed-size aggregate FLEXCOPY -- Allocate and copy open array parameter TYPETEST -- Class membership test CALL, CALLW, CALLD -- Procedure call TCALL, TCALLW, TCALLD -- Tail call LINK -- Pass static link SAVELINK -- Save static link in frame RETURN, RETURNW, RETURND -- Return from procedure LNUM -- Line number for debugging and profiling

## Packed instructions

These instructions are introduced by the peephole optimiser as compact equivalents for sequences of common instructions.

INDEXs = PUSH s / BINOP Times / BINOP PlusA where s can be W=4, S=2, C=1, D=8 LDLs n = LOCAL n / LOADs STLs n = LOCAL n / STOREs LDGs n = LDKW n / LOADs STGs n = LDKW n / STOREs LDIs = INDEXs / LOADs STIs = INDEXs / STOREs LDNW n = CONST n / LDIW STNW n = CONST n / STIW LDEW n = LDLW -4 / LDNW n STEW n = LDLW -4 / STNW n INCL n = LDLW n / INC / STLW n DECL n = LDLW n / DEC / STLW n JEQZ lab, etc. = CONST 0 / JEQ lab, etc. TESTGEQ lab = DUP 0 / JLT lab JTYPET m lab = TYPETEST m / JUMPT lab JTYPEF m lab = TYPETEST m / JUMPF lab

## Support instructions

The assembler and bytecode interpreter spend lots of opcodes on common instructions like "load local", so that a fair number of locals can be accessed with a fast, single-byte instruction. To free up opcodes for this, they implement other, less common instructions with an equivalent short sequence of others. These extra instructions are there to help with this process.

FCMP -- Compare floating point values and push comparison indicator DCMP -- Ditto for double-precision values

Because these instructions are never generated by the compiler, but only be the assembler as part of a canned sequence, it's OK for a JIT translator to handle them only in restricted circumstances; for example, `FCMP`

is always followed by an integer comparison with zero or a conditional jump.