Bytecode instructions for OBC
- 1 For release 2.9
- 2 Older text
For release 2.9
This list contains the instructions that are generated by the compiler: first, a basic set that allows all necessary operations to be expressed, and contains more-or-less the instructions that are used by the compiler's code generator. Then we can add the packed instructions that are used by the compiler's peephole optimizer; each packed instruction corresponds to a short sequence of basic instructions that frequently occur together.
Addressing, load and Store
- Push the small constant n onto the evaluation stack. Like all constants that appear as operands of instructions, n must fit into 16 bits.
- The constant n can be a decimal or hexadecimal constant or an assembler symbol; its value is pushed on the stack. The assembler implements this instruction by placing n in the constant pool for the current procedure and generating an LDKW instruction to fetch it at runtime.
- Push a floating-point, double-precision or long-integer constant.
- Push the address of a local variable that is at offset n from the frame head.
- Pop an address and push the contents of the (8, 16, 32, 64)-bit location it names.
- Pop an address and a one- or two-word value and store the value into the (8, 16, 32, 64)-bit location named by the address.
- Pop an integer offset and an address and push their sum. This instruction behaves like integer addition, but is kept separate to help the JIT translator.
- Duplicate the one-word item that is
nwords from the top of the stack – so
DUP 0duplicates the top element on the stack.
- Swap two one-word items at the top of the stack.
none-word items from the stack.
All arithmetic instructions expect their operands on the evaluation stack and leave their result on the stack.
- Binary arithmetic operations on integers. (
MODare guaranteed to give mathematically correct results).
- Unary minus on integers
- Integer comparisons with a Boolean result
- Boolean operations. As in C, these treat any non-zero operand as true, and produce a result that is either 1 or 0.
- Bitwise boolean operations.
- Given a number between 0 and 31, this generates a word with the only corresponding bit set. The LSB is numbered 0 and the MSB is numbered 31.
- Shift instructions.
- Long integer (64 bit) instructions.
- Long integer (64 bit) comparisons.
- Floating point arithmetic.
- Floating-point comparisons with a Boolean result.
- Double-precision arithmetic.
- Double-precision comparisons with a Boolean result.
- Convert integer to floating point.
- Convert integer to double precision.
- Convert floating point to double precision.
- Convert double precision to floating point.
- Convert integer to unsigned character, masking off unwanted bits.
- Convert integer to signed short, sign extending the result from 16 to 32 bits.
- Convert integer to long integer
- Convert long integer to integer
- Convert long integer to double precision
Labels in the assembler are written as decimal integers.
- Jump to label
- Pop a Boolean value and jump to label
labif the Boolean is false or if it is true, respectively.
An assembler directive: attach the label
lab to the next instruction.
- Pop an array bound B and check the index i now at the top of the stack. If the test 0 ≤ i < B fails, report and array bound error on line n.
- Check a pointer p on top of the stack. If it is null, report a null pointer error on line n.
- Check a static link p on top of the stack. If it is not null, report the error that a local procedure is being assigned to a procedure variable.
- Check a divisor d on top of the stack. If it is zero, report the error of division by zero.
ERROR e n
- Report runtime error e on line n. The code e should be one of those listed in function
messagein the file
These instructions perform tasks that correspond to specific constructs in Oberon. They may or may not help with the implementation of other languages.
- At the top of the stack are the address of a record descriptor, and below it the address of a record. Pop these, test whether the decriptor is the ancestor of the record at level n, and push a Boolean result.
- The instruction
JCASE nmust be followed by
CASELinstructions. It pops an integer
ifrom the stack. If
0 <= i < n, the program continues at the corresponding label; otherwise, it continues with the instruction following the last following
- The instruction expects three integers
hion the stack. If
lo <= x <= hi, the program contiunes at label
lab, otherwise, it continues with the next instruction as usual.
- This instruction expects two address
band an integer
non the stack. It copies a block of
nbytes from address
- This instruction expects to find on the stack the address of an open array parameter and a size in bytes. It allocates space in the stack frame to copy the parameter, then overwrites the parameter with its new address.
PROC name nargs fsize gcmap,
- @PRIMDEF name prim nargs fsize gcmap
GLOBAL name size
For compactness, brevity and speed, the compiler has a peephole optimizer that replaces common sequences of instructions by single instructions that have the same effect. Some of these instructions are realised by the bytecode assembler and interpreter in a form that is more compact and faster than the original sequence.
- Load from local variable;
LDLs nis equivalent to
LOCAL n / LOADsW.
- Store to local variable;
STLs nis equivalent to
LOCAL n / STOREs.
- Load from global variable;
LDGs xis equivalent to
CONST x / LOADs. Here x is a symbol that refers to an address in global storage, and the instruction loads from that address.
- Store to global variable;
STGs xis equivalent to
CONST x / STOREs.
- Integer compare and branch;
JEQ labis equivalent to
EQ / JTRUE lab, etc.
- Integer compare with zero and branch;
JEQZ labis equivalent to
PUSH 0 / EQ / JTRUE lab, etc.
- Floating point compare and branch;
FJEQ labis equivalent to
FEQ / JTRUE lab, etc.
- Double-precision compare and branch;
DJEQ labis equivalent to
DEQ / JTRUE lab, etc.
- Long integer compare and branch;
QJEQ labis equivalent to
QEQ / JTRUE lab, etc.
- Increment and decrement;
INCis equivalent to
PUSH 1 / PLUS, etc.
- Increment or decrement local variable;
INCL nis equivalent to
LDLW n / INC / STLW n, etc.
- Long integer increment and decrement;
QINCis equivalent to
QCONST 1 / QPLUS, etc.
- Bitwise difference, equivalent to
BITNOT / BITAND
- Scaled addition;
INDEXsis equivalent to
PUSH n / TIMES / PLUSAwhere n is 1, 2, 4 or 8.
- Indexed load;
LDIsis equivalent to
INDEXs / LOADs.
- Constant indexed load and store;
LDNW nis equivalent to
PUSH n / LDIW, etc.
- Load and store in enclosing frame;
LDEW nis equivalent to
LDLW -4 / LDNW n, and
LDLW -4 / STNW n. These instructions perform the operations needed to load and store one-word quantities in the stack frame of the statically enclosing procedure by following one link in the static chain.
- The sequence
DUP 0 / CONST n / JGEQ labcan be abbreviated as
CONST n / TESTGEQ lab. This is used in translating CASE statements.
==Procedure call and return
These instructions are not used directly by the compiler, but are introduced by the assembler in the code it produces for other instructions.
Various parts of the system deal with bytecode instructions:
- The back end of the compiler.
- The assembler/linker.
- The bytecode interpreter.
- JIT translators.
However, the sets of instructions that can appear at various stages vary a bit. The compiler generates code using instructions from a basic set, but there's a peephole optimiser that packs together common sequences into single instructions. For example, the code generator might translate the expression x+1 into (its internal representation of)
LOCAL 12 / LOADW / CONST 1 / PLUS, and the peephole optimiser will compress this into
LDLW 12 / INC. The peephole optimiser also introduces out-of-line constants for values that will not fit in the 16 bits that is the largest operand supported by the bytecode.
Next in the tool-chain comes the assembler/linker, which makes two significant changes: first, it may have multiple encodings for each instruction as opcodes; and second, some less common instructions generated by the compiler have no encoding, but are expanded by the assembler into equivalent short sequences of instructions. This releases more of the 256 opcodes for encoding more common instructions. For an example of the first kind of change, there are single-byte instructions that push small constants (perhaps 0 to 6) onto the stack, then there is a two-byte sequence that can push any value between -128 and 127, and there is a three-byte sequence that can push values between -32768 and 32767. Any larger values will have been replaced by out-of-line constants by the compiler.
An example of the second kind of change is provided by floating point branches. In place of (for example) the instruction
FJLT lab (which pops two floats x and y and branches to
x<y), the assembler generates the sequence
FCMP / JLTZ lab, first comparing the two floats and pushing a comparison indicator -1, 0, or +1, then popping this and branching to
lab if it is negative.
A sensible approach in a JIT translator is first to decode the opcodes into instructions, then to reverse the packing process that was done by the peephole optimiser, getting code that is expressed in terms of a small number of basic operations. These can then be used as the basis of instruction selection in a way that is more general than can be acheived by fixed translation of each instruction that is present in the raw bytecode.
These are the instructions as generated by the compiler before the peephole optimiser does its packing. They are also a reasonable set to implement directly in a JIT translator, after expanding packed instructions into their constituent parts.
CONST n -- Push constant LDKW n -- Push word from constant pool LOCAL n -- Push address of local PLUSA -- Add address and offset LOADs -- Pop address and push contents (s = W, S, C, D) STOREs -- Pop address then value, and store DUP n SWAP POP n PLUS, MINUS, TIMES, DIV, MOD -- Integer arithmetic UMINUS -- Integer unary minus AND, OR, NOT -- Boolean operators BITAND, BITOR, BITXOR, BITSUB, BITNOT -- Bitwise operators BIT -- Pop n, push (1 << n). [Note that BIT(32) = 0] LSL, LSR -- Logical shifts ASR -- Arithmetic shift right EQ, LT, GT, LEQ, GEQ, NEQ -- Integer comparisons JUMPF, JUMPT -- Pop boolean and jump if true or false JUMP -- Unconditional jump JCASE, JRANGE -- Special jumps for CASE statements FPLUS, FMINUS, FTIMES, FDIV -- Floating-point arithmetic FUMINUS -- Floating point unary minus FEQ, FLT, FGT, FLEQ, FGEQ, FNEQ -- Floating-point comparisons DPLUS, DMINUS, DTIMES, DDIV -- Double-precision arithmetic DUMINUS -- Double-precision unary minus DEQ, DLT, DGT, DLEQ, DGEQ, DNEQ -- Double-precision comparisons CONVNF, CONVFN -- Convert between integer and floating point CONVND, CONVDN -- Convert between integer and double precision CONVFD, CONVDF -- Convert between float and double CONVNC -- Convert integer to character CONVNS -- Convert integer to short BOUND -- Array bound check NCHECK -- Null pointer check GCHECK -- Check for global procedure ZCHECK -- Zero divisor check FZCHECK -- Floating-point zero divisor check DZCHECK -- Double-precision zero divisor check ERROR -- Runtime error ALIGNC, ALIGNS -- Align argument in 32-bit word FIXCOPY -- Copy fixed-size aggregate FLEXCOPY -- Allocate and copy open array parameter TYPETEST -- Class membership test CALL, CALLW, CALLD -- Procedure call TCALL, TCALLW, TCALLD -- Tail call LINK -- Pass static link SAVELINK -- Save static link in frame RETURN, RETURNW, RETURND -- Return from procedure LNUM -- Line number for debugging and profiling
These instructions are introduced by the peephole optimiser as compact equivalents for sequences of common instructions.
INDEXs = PUSH s / BINOP Times / BINOP PlusA where s can be W=4, S=2, C=1, D=8 LDLs n = LOCAL n / LOADs STLs n = LOCAL n / STOREs LDGs n = LDKW n / LOADs STGs n = LDKW n / STOREs LDIs = INDEXs / LOADs STIs = INDEXs / STOREs LDNW n = CONST n / LDIW STNW n = CONST n / STIW LDEW n = LDLW -4 / LDNW n STEW n = LDLW -4 / STNW n INCL n = LDLW n / INC / STLW n DECL n = LDLW n / DEC / STLW n JEQZ lab, etc. = CONST 0 / JEQ lab, etc. TESTGEQ lab = DUP 0 / JLT lab JTYPET m lab = TYPETEST m / JUMPT lab JTYPEF m lab = TYPETEST m / JUMPF lab
The assembler and bytecode interpreter spend lots of opcodes on common instructions like "load local", so that a fair number of locals can be accessed with a fast, single-byte instruction. To free up opcodes for this, they implement other, less common instructions with an equivalent short sequence of others. These extra instructions are there to help with this process.
FCMP -- Compare floating point values and push comparison indicator DCMP -- Ditto for double-precision values
Because these instructions are never generated by the compiler, but only be the assembler as part of a canned sequence, it's OK for a JIT translator to handle them only in restricted circumstances; for example,
FCMP is always followed by an integer comparison with zero or a conditional jump.