Instruction set for OBC

Copyright © 2024 J. M. Spivey
Jump to navigation Jump to search

\newdimen\instmargin \instmargin=0.5\hsize

\def\instdef#1{\par\resetpar

 \setbox0=\hbox{\hbox to\blockindent{\hfil$\triangleright$\hfil}#1\ \ }%
 \hangindent=\instmargin\noindent
 \ifdim\wd0>\instmargin \box0\hfil\break \else \wd0=\instmargin\box0\fi}

\def\inst#1#2{\instdef{\sf #1}#2\par} \def\insteq#1#2#3{%

 \instdef{\hbox to 0.16\hsize{\sf #1\ \ \hfil}= {\sf #2}}#3\par}

\long\def\defn{\par\begingroup \parindent=0pt \leftskip=\blockindent

   \def\par{\endgraf\endgroup} \nobreak\indent}

It is now time to get down to details and describe the instructions provided by the Keiko machine. Some instructions exist in several variants that operate on different sizes or types of data. For example, there are four load instructions \verb/LOADC/, \verb/LOADS/, \verb/LOADW/ and \verb/LOADD/ that load quantities from memory with sizes 1, 2, 4 and 8 bytes respectively. In the listing below, these are all represented by the one line, \medskip \inst{LOAD$s$}{Load from memory} \medskip \noindent Wherever a lower-case $s$ is used in an instruction name, this should be understood as standing for one of the four letters \verb/C/, \verb/S/, \verb/W/, \verb/D/. Which letters are allowed in a particular instruction will be apparent from the context.

Similarly, all floating-point operations exist in two variants for single and double precision; for example, there are two instructions \verb/FJEQ/ and \verb/DJEQ/ that each compare two values and jump if they are equal. Both of these instructions might be described by the single line \medskip \inst{$t$JEQ $\ell$}{Jump if equal} \medskip \noindent where $t$ stands for \verb/F/ or~\verb/D/.

Literal constants

The constant pool of a procedure contains integers, addresses and floating-point constants that each occupy one word, and double-precision constants that occupy two words. There are two instructions \verb/CONSTW/ and \verb/CONSTD/ for pushing these values onto the evaluation stack.

\inst{CONST$s$ $n$}{Push item from constant pool} \defn Pushes a constant of size $s$ that is stored at offset $4*n+20$ from

 $\it cp$. (The procedure descriptor starts with 5 words of header
 information before the first constant.)

\noindent For small integer constants that fit in a 16-bit signed representation, it's not necessary to use the constant pool. \inst{PUSH $n$}{Push constant $n$}

Address arithmetic

\inst{PLUSA}{Add offset to address} \defn Pops an address and an integer offset, adds them together, and

 pushes the result. (Actually, this instruction is the same as
 \verb/PLUS/)

\insteq{INDEX$s$}{PUSH $s$/TIMES/PLUSA}{Add index} \defn This has an effect similar to \verb/PLUSA/, execept that the

 offset is multiplied by $s$ before adding it.

Odds and ends

\inst{DUP $n$}{Duplicate $n$'th item from stack top} \inst{SWAP}{Swap the top two items on the stack} \inst{POP$s$}{Pop an item of size $s$ from the stack} \medskip \inst{LINE $n$}{Mark line number $n$} \defn This instruction does nothing, except in the profiler, when it increments the execution count for line~$n$.

Loads and stores

\inst{LOAD$s$}{Load from memory} \defn Pops an address, which must be aligned appropriately for the

 size $s$, and pushes the contents of that address.  In the case of
 \verb/LOADS/, the result is sign-extended from 16 to 32 bits.

\inst{STORE$s$}{Store to memory} \defn Pops a value and an appropriately aligned address, and stores

 the value in the address.  If the size $s$ is smaller than one word,
 then just the low-order bits are stored.

\inst{FIXCOPY}{Copy fixed-size aggregate} \defn Pops three values, an integer $n$ and two pointers |src| and |dst|, and copies $n$ bytes from |src| to |dst|.n

Extended instructions: although the {\vsf LOAD$s$} and {\vsf STORE$s$} instructions are sufficient for all memory transfers, they commonly occur in one of a number of patterns. For example, the sequence {\vsf LOCAL $n$/STOREW} is commonly used to store to a local variable or parameter, and the sequence {\vsf CONST $x$/LOADW} is commonly used to load the value of a global variable. So it is useful and efficient to provide the following abbreviated forms: \insteq{LDL$s$ $n$}{LOCAL $n$/LOAD$s$}{Load local} \insteq{STL$s$ $n$}{LOCAL $n$/STORE$s$}{Store local} \insteq{LDG$s$ $x$}{CONST $x$/LOAD$s$}{Load global} \insteq{STG$s$ $x$}{CONST $x$/STORE$s$}{Store global} \insteq{LDNW $n$}{PUSH $n$/PLUSA/LOADW}{Load field} \insteq{STNW $n$}{PUSH $n$/PLUSA/STOREW}{Store field} \insteq{LDI$s$}{INDEX$s$/LOAD$s$}{Load indexed} \insteq{STI$s$}{INDEX$s$/STORE$s$}{Store indexed} \medskip \insteq{LDEW $n$}{LDLW 12/LDNW $n$}{Load word in enclosing frame} \insteq{STEW $n$}{LDLW 12/STNW $n$}{Store word in enclosing frame} \defn These instructions abbreviate the common task of accessing a variable in the stack frame of the first lexically enclosing procedure.

Integer arithmetic

There is a long list of unary and binary operations that operate on one or two integers and yield an integer result. Each of these instructions expects its inputs on the stack and replaces them with its result. In the case of the binary operations, the left-hand operand should be pushed first, followed by the right-hand operand – so the instruction expects it right-hand operand on top of the stack.

\bigskip\noindent {\it Binary operations}. \ Each of these instructions pops two integer values $b$ and $a$ from the stack, and pushes the result $a \oplus b$, where $\oplus$ is a binary integer operation. \medskip \inst{PLUS}{Addition} \inst{MINUS}{Subtraction} \inst{TIMES}{Multiplication} \inst{DIV}{Division} \inst{MOD}{Modulo} \defn Both \verb/DIV/ and \verb/MOD/ have the same mathematically-correct meaning they are given in Oberon. If $q = n \div d$ and $r = n \bmod d$ and $d \neq 0$ then it is always true that $n = q \times d + r$. If $d > 0$ then $0 \leq r < d$, and $(-n) \div (-d) = n \div d$. \medskip \inst{AND}{Boolean and} \inst{OR}{Boolean or} \defn These Boolean operations expect two operands that are either 0 (for false) or 1 (for true) and deliver a similar result. The comparison operations also deliver such a Boolean result. \medskip \inst{EQ}{Equality comparison} \inst{LT}{Less-than comparison} \inst{GT}{Greater-than comparison} \inst{LEQ}{Less-than-or-equal comparison} \inst{GEQ}{Greater-than-or-equal comparison} \inst{NEQ}{Not-equal comparison} \medskip \inst{ASH}{Arithmetic shift} \inst{LSH}{Logical left shift} \inst{RSH}{Logical right shift} \medskip \inst{BITAND}{Bitwise and} \inst{BITOR}{Bitwise or} \inst{BITXOR}{Bitwise exclusive-or} \inst{BITSUB}{Bitwise subtraction} \defn These bitwise operations combine two 32-bit quantities bit-by-bit. Bitwise subtraction is defined by $a \ominus b = a \wedge \lnot b$. \bigskip \noindent {\it Unary operations}. \ These instructions pop a single operand $a$ and push the result $\sim\,a$, where $\sim$ is a unary integer operation. \medskip \inst{UMINUS}{Unary minus} \inst{NOT}{Boolean not} \inst{BITNOT}{Bitwise not} \medskip\noindent Two unary operations in the extended instruction set abbreviate the operations of adding or subtracting 1: \insteq{INC}{PUSH 1/PLUS}{Increment} \insteq{DEC}{PUSH 1/MINUS}{Decrement}

Floating point arithmetic

Each floating-point operation exists in two variations, one for single-precision values ($t = {\sf F}, FloatT$) and one for double-precision values ($t = {\sf D}, DoubleT$). Double-precision values occupy two cells on the stack; these are always in little-endian order, even if the host machine for the bytecode interpreter is big-endian. The two words of a double-precision value are always loaded and stored individually, so that alignment issues do not arise. \bigskip\noindent {\it Binary operations}. \ Each of these operations pops two values of the same type, and pushes either a result of the same type, or a Boolean value. \inst{$t$PLUS}{Floating-point addition} \inst{$t$MINUS}{Floating-point subtraction} \inst{$t$TIMES}{Floating-point multiplication} \inst{$t$DIV}{Floating-point division} \inst{$t$EQ}{Floating-point equality} \inst{$t$LT}{Floating-point less than} \inst{$t$GT}{Floating-point greater than} \inst{$t$LEQ}{Floating-point less than or equal} \inst{$t$GEQ}{Floating-point greater than or equal} \inst{$t$NEQ}{Floating-point inequality} \bigskip\noindent {\it Unary operations}. \inst{$t$UMINUS}{Floating-point negation}

Branches

The core instruction set contains three simple branch instructions: \medskip \inst{JUMP $\ell$}{Jump to $\ell$} \inst{JUMPT $\ell$}{Jump on true to $\ell$} \inst{JUMPF $\ell$}{Jump in false to $\ell$} \defn All three of these are written with an integer label, which is converted into a displacement by the assembler. The two conditional branches pop a Boolean value off the stack, and branch if this value is true or false respectively. \medskip \noindent Instructions in the extended set cater for the common case of branching on a comparison: here $t$ stands for the empty string (meaning and integer comparison), or for \verb/F/ or \verb/D/ for single or double precision floating point. \medskip \insteq{$t$JEQ $\ell$}{$t$EQ/JUMPT $\ell$}{Jump if equal} \insteq{$t$JLT $\ell$}{$t$LT/JUMPT $\ell$}{Jump if less than} \insteq{$t$JGT $\ell$}{$t$GT/JUMPT $\ell$}{Jump if greater than} \insteq{$t$JLEQ $\ell$}{$t$LEQ/JUMPT $\ell$}{Jump if less than or equal} \insteq{$t$JGEQ $\ell$}{$t$GEQ/JUMPT $\ell$}{Jump if greater than or equal} \insteq{$t$JNEQ $\ell$}{$t$NEQ/JUMPT $\ell$}{Jump if not equal} \defn These instructions pop two values of appropriate type from the stack, compare them, and branch conditionally on the result. \medskip\noindent Additional instructions abbreviate integer comparisons with zero. \medskip \insteq{JEQZ $\ell$}{PUSH 0/JEQ $\ell$}{Jump if equal to zero} \insteq{JLTZ $\ell$}{PUSH 0/JLT $\ell$}{Jump if less than zero} \insteq{JGTZ $\ell$}{PUSH 0/JGT $\ell$}{Jump if greater than zero} \insteq{JLEQZ $\ell$}{PUSH 0/JLEQ $\ell$}{Jump if less than or equal to zero} \insteq{JGEQZ $\ell$}{PUSH 0/JGEQ $\ell$}{Jump if greater than or equal to zero} \insteq{JNEQZ $\ell$}{PUSH 0/JNEQ $\ell$}{Jump if not equal to zero} \defn These instructions pop one integer value and compare it with zero, branching conditionally on the result.

Some further instructions are used in the translation of \verb/CASE/ statements; the compiler generates a balanced binary tree of tests, where the internal nodes are binary comparisons implemented with \verb/TESTGEQ/ instructions, and the leaves are made up of equality comparisons (\verb/JEQ/), range tests (\verb/JRANGE/) and jump tables (\verb/JCASE/). \medskip \insteq{TESTGEQ $\ell$}{DUP 1/JLT $\ell$}{Case split} \defn Expects two items on the stack: the value~$v$ of a case expression and a case label~$a$. It compares the two, popping $a$ but leaving $v$ on the stack, and branches to $\ell$ if $v \geq a$. \medskip \inst{JRANGE $\ell$}{Range test} \defn Pops three values $v$, $a$, $b$ from the stack where $a \leq b$, and jumps to $\ell$ if $a \leq v \leq b$. \medskip \inst{JCASE $n$ $\ell'$}{Case jump} \inst{CASEL $\ell_i$}{Case label} \defn A \verb/JCASE/ instruction must be followed by $n$ instructions {\vsf CASEL $\ell_0$}, \dots, {\vsf CASEL $\ell_{n-1}$} that specify $n$ labels. It pops a value $v$ from the stack; if $v < 0$ or $v \geq n$ then it branches to $\ell'$, otherwise it branches to~$\ell_i$.

Procedure call and return

These instructions implement the protocol for procedure call and return explained in Section~\ref{s:call}. The calling sequence is the same whether the procedure being called is implemented in bytecode or as a native-code primitive.

\inst{CALL$s$ n}{Call procedure with $n$ arguments} \defn The evaluation stack should contain |n+1| values: on top is the descriptor address of a procedure, and below it are $n$~words of arguments. The procedure returns a result of size $s$, where $s = 0$, $4$ or~$8$.

\medskip \inst{RETURN}{Return from procedure} \defn The current procedure returns to its caller, leaving any result on top of the stack.

\medskip \noindent Because all arguments are enlarged to one or two words, if the host machine (like the Sun {\sc Sparc}) is big-endian, then a problem arises with {\sci char\/} and {\sci short} parameters, because they are pushed as words and accessed as a smaller type. To make up for this, there are two instructions that adjust the alignment of items on the evaluation stack appropriately: \medskip \inst{ALIGNC}{Align {\sci char} parameter} \inst{ALIGNS}{Align {\sci short} parameter} \medskip \noindent Bytecode is portable, so these instructions exist and are generated by the compiler even if the host machine is little-endian; but in that case they do nothing.

\medskip \noindent Oberon provides array and record parameters passed by value, so painful though it may be, there must be provision for copying these values on procedure call. The actual parameter is passed as an address, and the procedure prologue is responsible for making the copy, using one of the following two instructions:% \footnote{This leaves open the possibility that the Oberon compiler will be smart enough to detect when the copying is not actually needed; unfortunately, this is difficult because of aliasing.} \medskip \medskip \inst{FLEXCOPY}{Copy open array parameter} \defn This instruction pops two values, an integer $k$ and an address~$a$. The address $a$ must point to two words that contains the base address and upper bound of an open array parameter, and $k$ must be the size in bytes of each element of the open array. Space for a copy of the array is allocated by decreasing the stack pointer; the array elements are copied into this space, and the address of the newly-allocated space is stored at~$a$, overwriting the previous pointer stored there. \medskip In addition to its use for copying aggregate value parameters, \verb/FIXCOPY/ is also used to implement array and record assignment.

Runtime checks

These instructions do nothing (except popping some of their arguments) under ordinary circumstances, but in case of error, they print an appropriate message and halt execution. \medskip \inst{BOUND $n$}{Bound check} \defn Pops a value $b \geq 0$ from the stack and examines another value $a$, leaving it on top of the stack. If $0 \leq a < b$, then the instruction does nothing further; otherwise, it signals an array bound error on line~$n$. \medskip \inst{NCHECK $n$}{Null-pointer check} \defn Examines a pointer value $a$, leaving it on top of the stack. If $a$ is a null pointer, then an error is signalled on line $n$; otherwise, the instruction does nothing. \medskip \inst{$t$ZCHECK n}{Divide-by-zero check} \defn $t$ may be the empty string (meaning an integer operation), or \verb/F/ or \verb/D/ for single or double precision floating point. Examines a numeric value $a$, leaving it on top of the stack. If $a$ is non-zero, the instruction does nothing; otherwise, a divide-by-zero error is signalled in line $n$. \medskip \inst{ERROR $e$ $n$}{Signal error number $e$ on line $n$} \defn This instruction always signals an error on line $n$. The error message is selected from a fixed table by the integer index $e$.

Type conversion

The Keiko machine supports three representations for integers: an unsigned 8-bit representation, and signed representations in 16 and 32 bits. However, only the 32-bit representation is used for integers on the expression stack. There are also single and double precision floating-point numbers. The machine provides instructions to convert between these numeric types: each one pops a value of one type from the stack, and pushes the same value after converting it to another type. Some of the conversions between these numeric types are not needed, because they would do nothing: for example, there is no instruction to convert from 16-bit to 32-bit integers, because both have the same representation on the stack. It's also not necessary to convert (say) an integer value to a character before storing it in a character variable, because the \verb/STOREC/ instruction selects the low-order 8 bits of the value anyway. \medskip \inst{CONVFN}{Convert float to integer} \inst{CONVDN}{Convert double to integer} \defn These conversion truncate the floating-point value $x$ to the

 next smaller integer $\lfloor x\rfloor$, inheriting the meaning of
 the C expression \verb/(int) x/.

\inst{CONVNF}{Convert integer to float} \inst{CONVND}{Convert integer to double} \inst{CONVFD}{Convert float to double} \inst{CONVDF}{Convert double to float} \defn These conversions are straightforward, and correspond to casts

 in C.  The effect in case of overflow is whatever is inherited from C.

\inst{CONVNC}{Convert integer to (unsigned) char} \inst{CONVNS}{Convert integer to (signed) short} \defn These conversions mask out the top 24 or 16 bits of the integer

 value $x$; in the case of \verb/CONVNS/, the result is then
 sign-extended back to 32 bits.

\medskip\noindent Other conversions, such as that from float to char, can be obtained by composing the ones listed here.