Glossary (Compilers)

From Spivey's Corner
Jump to: navigation, search
Application Binary Interface (ABI)A set of conventions about the layout of memory and the use of registers that is required for machine code on a specific architecture that interacts with an operating system and existing libraries. For example, library functions may expect to be called using a specific instruction (saving the PC in a known place) and with arguments in specific registers. The ABI for a platform may be determined partly by the hardware (e.g., what the subroutine call instruction does) and partly by agreement among all software suppliers (e.g., which registers are preserved across subroutine calls).
A set of conventions about the layout of memory and the use of registers that is required for machine code on a specific architecture that interacts with an operating system and existing libraries. For example, library functions may expect to be called using a specific instruction (saving the PC in a known place) and with arguments in specific registers. The ABI(Application Binary Interface). A set of conventions about the use of registers and memory in calling subroutines, including which registers are used to pass parameters, and which must be saved and restored if they are used by a subroutine. By adhering to the same conventions, subroutines compiled from different languages can be made to operate together. for a platform may be determined partly by the hardware (e.g., what the subroutine call instruction does) and partly by agreement among all software suppliers (e.g., which registers are preserved across subroutine calls).
Addressing modeMachines commonly have one or more schemes that can be used in an instruction that accesses memory in order to calculate the address of the memory location to be accessed. Examples of such schemes or ''addressing modes'' are: a fixed address, or the sum of a register and a constant, or the sum of two registers. On a ''RISC machine'', there may be only one or two addressing modes, perhaps a register plus a ''small'' offset, or the sum of two registers. Commonly, it is not possible on such machines to fit a whole memory address into an instruction, so that access to ''statically allocated'' variables requires the address to be ''developed'' into a register first.
Machines commonly have one or more schemes that can be used in an instruction that accesses memory in order to calculate the address of the memory location to be accessed. Examples of such schemes or addressing modes are: a fixed address, or the sum of a register and a constant, or the sum of two registers. On a RISC machineA computer designed with a simplified instruction set. Typically, these machines have a large set of uniform registers, a small set of addressing modes, and load/store instructions separate from the instructions that carry out arithmetic operations., there may be only one or two addressing modes, perhaps a register plus a small offset, or the sum of two registers. Commonly, it is not possible on such machines to fit a whole memory address into an instruction, so that access to statically allocated variables requires the address to be developed into a register first.
Basic blockA sequence of operations one entry point at the start and one exit point at the end; if one operation in a basic block is executed, then all of them are executed. An ''extended basic block'' has one entry point at the start, but may have multiple exits, for example via conditional branches in the middle.
A sequence of operations one entry point at the start and one exit point at the end; if one operation in a basic block is executed, then all of them are executed. An extended basic block has one entry point at the start, but may have multiple exits, for example via conditional branches in the middle.
Big switchThe collection of action routines in a bytecode interpreter (usually expressed as a @switch@ statement in C, or something equivalent), with one action for each possible bytecode. The big switch and its enclosing fetch/execute loop make up much of the overhead of executing bytecode -- typically 80% of the execution time -- making it highly beneficial to choose bytecodes that perform several related actions rather than just one.
The collection of action routines in a bytecode interpreter (usually expressed as a switch statement in C, or something equivalent), with one action for each possible bytecode. The big switch and its enclosing fetch/execute loop make up much of the overhead of executing bytecode – typically 80% of the execution time – making it highly beneficial to choose bytecodes that perform several related actions rather than just one.
BytecodeCode for a ''virtual machine'' that is represented in a compact form, with most instructions occupying a single byte.
Code for a virtual machine that is represented in a compact form, with most instructions occupying a single byte.
Common sub-expression eliminationAn optimisation that identifies repeated operations (typically within a ''basic block'') and eliminates them by performing each operation once and saving the result in a ''temp''.
An optimisation that identifies repeated operations (typically within a basic block) and eliminates them by performing each operation once and saving the result in a temp.
Context-free grammarA formal definition of a ''language'', consisting of alphabets @@Σ@@ of tokens or terminal symbols and @@N@@ of variables or non-terminal symbols, a start symbol @@S ∈ N@@, and a set of productions @@A -> α@@ where @@A ∈ N@@ and @@α ∈ (N ∪ Σ)*@@. The language defined by a CFG is the set of strings in @@Σ*@@ that are generated by ''derivations'' in the grammar.
A formal definition of a language, consisting of alphabets Σ of tokens or terminal symbols and N of variables or non-terminal symbols, a start symbol SN, and a set of productions A α where AN and α ∈ (N ∪ Σ). The language defined by a CFG is the set of strings in Σ that are generated by derivations in the grammar.
Context pointerOn Keiko, procedures are represented by a small table that contains the size of the stack frame, a pointer to the ''bytecode'' for the procedure body, and the values of constants referenced by the bytecode, including the addresses of ''statically allocated'' variables and of other procedures that it calls. The context pointer is a register that always points to the table for the currently active procedure, and is used by ''bytecode'' instructions to fetch the values of those constants.
On Keiko, procedures are represented by a small table that contains the size of the stack frame, a pointer to the bytecode for the procedure body, and the values of constants referenced by the bytecode, including the addresses of statically allocated variables and of other procedures that it calls. The context pointer is a register that always points to the table for the currently active procedure, and is used by bytecode instructions to fetch the values of those constants.
Developing a constant into a registerRISC machines are commonly unable to fit an arbitrary word-sized constant into the operand field of an instruction. Programs that use such constants (especially for addressing) must be written to use special instructions to put the value of the constant into a register before it is used. On the ARM, there is a special form of ''PC-relative'' load instruction that is used for this purpose. Other machines (including later models of the ARM) can use a pair of instructions, each providing 16 bits of a 32-bit constant.
RISC machines are commonly unable to fit an arbitrary word-sized constant into the operand field of an instruction. Programs that use such constants (especially for addressing) must be written to use special instructions to put the value of the constant into a register before it is used. On the ARM, there is a special form of PC-relative load instruction that is used for this purpose. Other machines (including later models of the ARM) can use a pair of instructions, each providing 16 bits of a 32-bit constant.
Dynamic allocationStorage that is allocated as needed during execution of a program. Typically, the program will have pointer variables @@p@@, and an operation @@new(p)@@ that allocates a fresh block of storage and puts its address in @@p@@. Dynamically allocated storage can be recycled either by providing an explicit operation to free blocks of storage, or by using a ''garbage collector''.
Storage that is allocated as needed during execution of a program. Typically, the program will have pointer variables p, and an operation new(p) that allocates a fresh block of storage and puts its address in p. Dynamically allocated storage can be recycled either by providing an explicit operation to free blocks of storage, or by using a garbage collector.
Dynamic linkA pointer leading from the stack frame of a procedure to the stack frame of the procedure that called it. The dynamic link is used when the procedure returns to restore the frame pointer to the value it had before the call.
A pointer leading from the stack frame of a procedure to the stack frame of the procedure that called it. The dynamic link is used when the procedure returns to restore the frame pointer to the value it had before the call.
Floating pointA representation of real numbers with a sign, a mantissa, and an exponent that allows the number to be scaled by a power of two. Typical machines have special registers for holding floating point numbers and special instructions for loading, storing and comparing them and performing arithmetic operations on them. In this course, we ignore these machine features for simplicity because they add complexity without presenting any really new problems from the point of view of compiling.
A representation of real numbers with a sign, a mantissa, and an exponent that allows the number to be scaled by a power of two. Typical machines have special registers for holding floating point numbers and special instructions for loading, storing and comparing them and performing arithmetic operations on them. In this course, we ignore these machine features for simplicity because they add complexity without presenting any really new problems from the point of view of compiling.
Frame headthe portion of a ''stack frame'' that contains administrative information. On Keiko, the frame head contains the ''dynamic link'', the ''return address'', a ''context pointer'' for the current procedure, and the ''static link''.
the portion of a stack frame that contains administrative information. On Keiko, the frame head contains the dynamic link, the return address, a context pointer for the current procedure, and the static link.
Frame pointerA register that points to the ''stack frame'' of the currently active procedure.
A register that points to the stack frame of the currently active procedure.
Garbage collectorA software component that is able to analyse the pointers that are stored by a running program, in order to identify blocks of storage that have been ''dynamically allocated'' but are no longer accessible by any pointer. Such blocks can be recycled and used to satisfy future allocation requests. In order for a garbage collector to work safely and efficiently, it must be able to identify all pointers that are stored by the running program. Keiko has provision for pointer maps that provide this information for each ''stack frame'', for the program's ''statically allocated'' storage, and for each kind of dynamically allocated block. In the course, for simplicity we forgo garbage collection and omit these pointer maps from the compiler output.
A software component that is able to analyse the pointers that are stored by a running program, in order to identify blocks of storage that have been dynamically allocated but are no longer accessible by any pointer. Such blocks can be recycled and used to satisfy future allocation requests. In order for a garbage collector to work safely and efficiently, it must be able to identify all pointers that are stored by the running program. Keiko has provision for pointer maps that provide this information for each stack frame, for the program's statically allocated storage, and for each kind of dynamically allocated block. In the course, for simplicity we forgo garbage collection and omit these pointer maps from the compiler output.
InterpreterA program that describes the behaviour of other programs written in a fixed language, without translating those programs into another form. For example, an interpreter for ''bytecode'' will have an explicit loop that fetches instructions one at a time, and invokes one of a fixed list of actions (the ''big switch'') according to the instruction it finds.
A program that describes the behaviour of other programs written in a fixed language, without translating those programs into another form. For example, an interpreter for bytecode will have an explicit loop that fetches instructions one at a time, and invokes one of a fixed list of actions (the big switch) according to the instruction it finds.
JIT translatorA compiler that is embedded in the execution environment for a virtual machine, and translates code for the virtual machine into native machine code just before each procedure is called. After translation, the native code remains in memory and can be used if the same procedure is called again, but the code is thrown away when the program terminates. Either each procedure (or some other fragment of the program) is translated when it is called for the first time, or an interpreter and a translator co-exist, and procedures are translated if they have been called more than a few times.
A compiler that is embedded in the execution environment for a virtual machine, and translates code for the virtual machine into native machine code just before each procedure is called. After translation, the native code remains in memory and can be used if the same procedure is called again, but the code is thrown away when the program terminates. Either each procedure (or some other fragment of the program) is translated when it is called for the first time, or an interpreter and a translator co-exist, and procedures are translated if they have been called more than a few times.
LanguageIn the theory of syntax, a set of strings over a specified alphabet.
In the theory of syntax, a set of strings over a specified alphabet.
Operator tree (optree)A tree whose nodes are labelled with elementary operations, such as forming the address of a local variable, loading from an address, or adding two integers together. In the course, these operations will be expressed as Keiko instructions. Machine code can be generated by finding groups of nodes ('tiles') in an operator tree that can be covered by single instructions of the target machine.
A tree whose nodes are labelled with elementary operations, such as forming the address of a local variable, loading from an address, or adding two integers together. In the course, these operations will be expressed as Keiko instructions. Machine code can be generated by finding groups of nodes ('tiles') in an operator tree that can be covered by single instructions of the target machine.
PC-relative addressingAn addressing mode that involves adding a ''small'' constant to the value of the ''program counter'' in order to form an address. On the ARM, PC-relative addressing is used to access tables of large constants that are located at the end of the code for each procedure, giving access to the values of these constants without having to embed large constants in instructions.An addressing mode that involves adding a fixed offset to the value of the program counter in order to form an address. On ARM, a large constant that does not fit in the immediate field of an instruction can be loaded into a register using a PC-relative load instruction. The assembler generated such instructions, and automatically lays out a table of literal values, when a programmer uses the syntax <code>ldr r''n'', =''const''</code>.
An addressing mode that involves adding a small constant to the value of the program counterA register that contains the address of the next instruction to be executed. Because of pipelining, on ARM Cortex-M machines, reading the program counter yields a value that is 4 bytes greater than the address of the current instruction. in order to form an address. On the ARM, PC-relative addressing is used to access tables of large constants that are located at the end of the code for each procedure, giving access to the values of these constants without having to embed large constants in instructions.
Peephole optimiserA compiler phase that improves a program that is expressed as a sequence of instructions. The optimiser contains a collection of rules, each matching a short sequence of instructions and replacing them with a better sequence.
A compiler phase that improves a program that is expressed as a sequence of instructions. The optimiser contains a collection of rules, each matching a short sequence of instructions and replacing them with a better sequence.
PhaseA compiler module that transforms the program being compiled from one representation into a possibly different representation. In the compilers we design, phases mostly run sequentially, each making a separate pass over the program; but it is possible to combine multiple phases into a single pass, for example generating intermediate code at the same time as parsing an expression, rather than constructing an explicit abstract syntax tree and then walking over it.
A compiler module that transforms the program being compiled from one representation into a possibly different representation. In the compilers we design, phases mostly run sequentially, each making a separate pass over the program; but it is possible to combine multiple phases into a single pass, for example generating intermediate code at the same time as parsing an expression, rather than constructing an explicit abstract syntax tree and then walking over it.
ProductionA rule @@A -> &alpha;@@ in a ''context-free grammar'' that allows a syntactic variable @@A@@ to be replaced by a string @@&alpha;@@, a mixture of variables and tokens.
A rule A α in a context-free grammar that allows a syntactic variable A to be replaced by a string α, a mixture of variables and tokens.
Program counterA register that holds the address of the next instruction to be executed.
A register that holds the address of the next instruction to be executed.
Regular expressionAn expression that denotes a language, built up from literal characters using the operations of concatenation, alternation and Kleene closure.
An expression that denotes a language, built up from literal characters using the operations of concatenation, alternation and Kleene closure.
Return addressThe address of the next instruction after a call of a procedure. When the procedure returns, execution continues from this point.
The address of the next instruction after a call of a procedure. When the procedure returns, execution continues from this point.
RISC machine
A computer designed with a simplified instruction set. Typically, these machines have a large set of uniform registers, a small set of addressing modes, and load/store instructions separate from the instructions that carry out arithmetic operations.
Scratch registerA register that is set aside for temporary use, typically in a short sequence of instructions that translate a single operation. For example, on the ARM we set aside @IP@ as a scratch register, and we can branch to an arbitrary address using the sequence @set ip, #addr; br ip@.
A register that is set aside for temporary use, typically in a short sequence of instructions that translate a single operation. For example, on the ARM we set aside IP as a scratch register, and we can branch to an arbitrary address using the sequence set ip, #addr; br ip.
Semantic actionAn expression attached to a ''production'' in a ''context-free grammar'' that shows how to build an abstract syntax tree for the construct that is generated by the production. More generally, an expression that associates a value with the construct that is computed from the values of its parts.
An expression attached to a production in a context-free grammar that shows how to build an abstract syntax tree for the construct that is generated by the production. More generally, an expression that associates a value with the construct that is computed from the values of its parts.
Small constantA constant that fits into the 'immediate field' of an instruction. ''RISC machines'' are designed with limited sizes for the constants that appear in instructions, big enough to contain the majority of constants that occur in practical programs. A compiler for such a machine should generate instructions with embedded constants where possible, but must also have provision for dealing with constants that do not fit, perhaps by ''developing'' them into registers. On CISC machines, it is often the case that all constants are regarded as small, whatever their value.
A constant that fits into the 'immediate field' of an instruction. RISC machines are designed with limited sizes for the constants that appear in instructions, big enough to contain the majority of constants that occur in practical programs. A compiler for such a machine should generate instructions with embedded constants where possible, but must also have provision for dealing with constants that do not fit, perhaps by developing them into registers. On CISC machines, it is often the case that all constants are regarded as small, whatever their value.
Stack frameA region of storage, allocated from a stack, that contains the parameters and local variables of a procedure activation, together with administrative information (in the ''frame head'') used when the procedure returns, or for access to variables in enclosing procedures.
A region of storage, allocated from a stack, that contains the parameters and local variables of a procedure activation, together with administrative information (in the frame head) used when the procedure returns, or for access to variables in enclosing procedures.
Stack pointerA register that points to the last address on the stack that is in use (or alternatively, the first address that is not in use).
A register that points to the last address on the stack that is in use (or alternatively, the first address that is not in use).
Static allocationStorage reserved at a fixed address, supporting a single copy of a variable, with a lifetime covering the entire execution of the program.
Storage reserved at a fixed address, supporting a single copy of a variable, with a lifetime covering the entire execution of the program.
Static linkA pointer leading from the ''stack frame'' of a procedure to the stack frame of the textually enclosing procedure. Static links are one mechanism for allowing the body of a procedure to access variables in enclosing procedures.
A pointer leading from the stack frame of a procedure to the stack frame of the textually enclosing procedure. Static links are one mechanism for allowing the body of a procedure to access variables in enclosing procedures.
Syntax-directed translationA scheme for generating code that is based on assembling the translation of each construct from translations of its parts. Such a scheme is naturally expressed as a recursive function on ''abstract syntax trees''.
A scheme for generating code that is based on assembling the translation of each construct from translations of its parts. Such a scheme is naturally expressed as a recursive function on abstract syntax trees.
Virtual machineA notional computer that is used as the target of a compiling process but is not implemented as hardware. Code for a virtual machine may be executed either by an ''interpreter'', or by translating it into code for a real machine, perhaps just before it is executed (''JIT translation'').
A notional computer that is used as the target of a compiling process but is not implemented as hardware. Code for a virtual machine may be executed either by an interpreter, or by translating it into code for a real machine, perhaps just before it is executed (JIT translation).
Personal tools

Variants
Actions
Navigation
Tools