Data representations for OBC

From Spivey's Corner

Basic types

  • At runtime, characters, integers, floats and pointers use the same byte order (little-endian or big-endian) as the host machine. In the bytecode file they are always stored in little-endian order (as on the i386), and the loader swaps the byte order where necessary as part of the relocation process.
  • Doubles have each of their two halves in native byte order, but the lower-order 32-bit word is always stored before the higher-order word, and the value is only 4-byte aligned, even if the host machine requires 8-byte alignment for loading and storing doubles. Where necessary, the bytecode interpreter loads and stores the two halves separately, either because the host would expect the halves in the opposite order, or because the host architecture does not support unaligned loads and stores. Yes, this can make double-precision code much slower than single-precision code. Happily, on the i386 none of this care is needed.
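The two rules above can be illustrated with a small C sketch. It is not taken from the OBC sources: the function names `load_le32`, `load_double` and `store_double` are hypothetical, and the code assumes that the host stores doubles and 64-bit integers in the same byte order.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit word stored in little-endian order (as in the bytecode
   file), whatever the byte order of the host. */
static uint32_t load_le32(const unsigned char *p) {
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Load a double kept as two native-order 32-bit words, low-order word
   first.  Only 4-byte alignment of w is needed, because the halves are
   fetched separately. */
static double load_double(const uint32_t *w) {
    uint64_t bits = ((uint64_t)w[1] << 32) | w[0];
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Store a double in the same two-word layout. */
static void store_double(uint32_t *w, double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    w[0] = (uint32_t)bits;          /* low-order half first */
    w[1] = (uint32_t)(bits >> 32);  /* high-order half second */
}
```

On a little-endian host that tolerates unaligned access, such as the i386, a compiler can reduce `load_double` to a single 8-byte load, which matches the remark that no special care is needed there.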


The first word of each heap-allocated record is the address of its descriptor. Pointers to records usually point to the second word, so that the descriptor address is found at byte offset -4 from the pointer.
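As a rough illustration of this layout, the following C sketch fetches the descriptor word that sits just before the record body. The names `word` and `descriptor_of` are hypothetical, and the sketch assumes 4-byte words, as the offset of -4 suggests.

```c
#include <stdint.h>
#include <string.h>

typedef uint32_t word;   /* assumption: one heap word is 4 bytes */

/* Given a pointer to a record body (the second word of the heap
   block), return the descriptor word stored at byte offset -4. */
static word descriptor_of(const void *rec) {
    word desc;
    memcpy(&desc, (const unsigned char *)rec - 4, sizeof desc);
    return desc;
}
```

If a heap block is modelled as an array `word block[3]` whose first element holds the descriptor address, then the record pointer is `&block[1]` and `descriptor_of(&block[1])` yields `block[0]`.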

The descriptor for a record type contains

  • an ancestor table that can be used to implement type tests efficiently
  • a garbage collector map, showing where in the record pointers to other heap-allocated structures may be stored
  • a method table, used to resolve dynamic references to the 'type-bound procedures' of Oberon-2.
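One common way to make type tests cheap with an ancestor table is to record each type's depth in the extension hierarchy and index the table by depth. The C sketch below shows that technique under stated assumptions: the struct layout, the field names `depth` and `ancestors`, and the function `is_extension_of` are all hypothetical and are not claimed to match the actual OBC descriptor format. The garbage collector map and method table are omitted.

```c
/* Hypothetical descriptor holding only the ancestor-table part. */
struct descriptor {
    int depth;   /* 0 for a type with no base type */
    /* ancestors[i] is the ancestor at depth i; ancestors[depth] is
       the type itself. */
    const struct descriptor *const *ancestors;
};

/* Type test: is t the same type as s, or an extension of s?
   One comparison and one table lookup, independent of how deep
   the hierarchy is. */
static int is_extension_of(const struct descriptor *t,
                           const struct descriptor *s) {
    return t->depth >= s->depth && t->ancestors[s->depth] == s;
}
```

The depth check keeps the lookup in bounds, so a base type is never mistaken for one of its own extensions.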


A procedure is represented by the address of its constant pool, a table that contains several pieces of information about the procedure, including the native code that is run when the procedure is called. (For bytecode procedures, this native code will be the bytecode interpreter.) The layout of the constant pool is described on the page about the bytecode machine's calling convention.
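The idea of calling every procedure through its constant pool can be sketched in C as follows. This is only a toy model: the two-slot layout, the names `proc_pool`, `prim`, `code`, `interp`, `twice` and `call_proc`, and the stand-in "bytecode" are all invented for illustration, and the real layout is the one given on the calling-convention page.

```c
struct proc_pool;

/* Type of the native code entry found in a constant pool. */
typedef int (*native_fn)(const struct proc_pool *self, int arg);

/* Hypothetical two-slot constant pool. */
struct proc_pool {
    native_fn prim;             /* native code run when the procedure is called */
    const unsigned char *code;  /* bytecode; unused by native procedures */
};

/* Stand-in interpreter: each nonzero byte means "add this constant";
   a zero byte ends the procedure. */
static int interp(const struct proc_pool *self, int arg) {
    for (const unsigned char *pc = self->code; *pc != 0; pc++)
        arg += *pc;
    return arg;
}

/* A procedure implemented directly in native code. */
static int twice(const struct proc_pool *self, int arg) {
    (void)self;
    return 2 * arg;
}

/* The caller does not need to know which kind of procedure it has:
   it always jumps to the native code named in the pool. */
static int call_proc(const struct proc_pool *p, int arg) {
    return p->prim(p, arg);
}
```

A bytecode procedure is then a pool whose `prim` slot is `interp`, while a native procedure puts its own function there, so both kinds share one calling path.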