
Frequently asked questions


If you have a question, perhaps it is answered below – or maybe you can find help in the growing glossary. Feel free to add headwords to the glossary so Mike can fill in the definitions, or to add questions below.

New questions for 2018

I see that labels in the intermediate code generator are allocated by calling a function label (), and that uses a global, assignable variable to allocate a different label each time. Couldn't we do it a bit more functionally?

Yes, we could. We could follow the Haskell style, and turn the global variable into an instance of the state monad: but that wouldn't really gain us much, just force us to write the code generator in an irritatingly cramped way, with a global variable still hidden behind the scenes. A better solution would be to delay allocation of labels until the code tree is flattened, by introducing a new constructor

type code = ... | GENLAB of (codelab -> code)

Then we can write gen_stmt like this:

let rec gen_stmt = function ...
  | IfStmt (test, thenpt, elsept) ->
      GENLAB (fun lab1 -> GENLAB (fun lab2 -> GENLAB (fun lab3 ->
        SEQ [gen_cond test lab1 lab2; 
          LABEL lab1; gen_stmt thenpt; JUMP lab3;
          LABEL lab2; gen_stmt elsept; LABEL lab3])))

We then need to enhance the function canon that flattens code trees, so that given a tree GENLAB f it generates a new label and passes it to the function f, recursively flattening the tree that results. Because labels are generated in just one place, there's no longer a need for a global variable. To compile case statements and similar things, it's helpful to define a function genlabs : int → (codelab list → code) → code such that

genlabs n f = 
  GENLAB (fun x1 -> 
    GENLAB (fun x2 -> ... 
        GENLAB (fun xn -> f [x1; x2; ...; xn])...))

That is left as an exercise for the reader.
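
Although genlabs itself is the exercise, here is a minimal, self-contained sketch of the flattening idea, with a cut-down code type and a flatten function standing in for the course's canon; the names and types here are assumptions, not the real ones.

type codelab = int

type code =
  | SEQ of code list
  | LABEL of codelab
  | JUMP of codelab
  | INSTR of string                      (* stand-in for other instructions *)
  | GENLAB of (codelab -> code)

(* flatten : code -> codelab -> string list * codelab
   takes the next unused label, returns the flat code and the new counter *)
let rec flatten c next =
  match c with
  | SEQ cs ->
      List.fold_left
        (fun (acc, n) c' -> let (is, n') = flatten c' n in (acc @ is, n'))
        ([], next) cs
  | GENLAB f -> flatten (f next) (next + 1)   (* fresh labels made only here *)
  | LABEL l -> (["LABEL " ^ string_of_int l], next)
  | JUMP l -> (["JUMP " ^ string_of_int l], next)
  | INSTR s -> ([s], next)

Because the label counter is threaded through flatten as an argument and a result, the code generator itself never touches it.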

New questions for 2017

In problem 5.8, what is meant by Figure 14.2?

It ought to say Figure 8.5. Sorry about that. Ars longa, vita brevis, or, We have these treasures in jars of clay, according to your inclination.

Can the whole optree <LOADW, <OFFSET, <GLOBAL _a>, <LSL, reg, <CONST 2>>>> be evaluated in a single instruction on the x86?

Yes it can: the x86 has an addressing mode that adds together a 32-bit constant with a scaled register. The syntax is

movl _a(,%eax,4), %ebx

on Linux; goodness knows what the Intel syntax would be. The machine code is seven bytes, consisting of the opcode byte, a ModR/M byte, a SIB byte and a four-byte constant. A machine grammar for x86 would probably have nonterminals addr for addresses and offset for the offset part, with productions

reg --> <LOADW, addr>                  { mov addr, reg }
addr --> <OFFSET, <GLOBAL x>, offset>  { x(,offset) }
addr --> <OFFSET, reg1, offset>        { (reg1,offset) }
offset --> <LSL, reg1, <CONST n>>      { reg1,2^n }

Better still, because the x86 can combine memory references with arithmetic, we can cover a tree such as

<PLUS, reg1, 
  <LOADW, 
    <OFFSET, <GLOBAL _a>, 
      <LSL, reg, <CONST 2>>>>>

in a single instruction, provided the register allocator can assign the same register to the first input of the addition and the result. This would be covered by rules

rand --> <LOADW, addr>                 { addr }
reg --> <PLUS, reg1, rand>             { ?mov reg1, reg; addl rand, reg } 

Experimenting with GCC on Linux is complicated slightly by the fact that (on some GCC versions) position-independent code is enabled by default. Use the -fno-pie flag to disable this. On amd64, the reference to the global a uses PC-relative addressing and adds an instruction, and there is a separate instruction to sign-extend the index from 32 to 64 bits.

The output of using ppx -d -d a.out contains references to instructions JPROC, SLIDEW and LDKW that don't appear to be explained anywhere. What do they do?

None of these instructions are generated by our compilers, but they are introduced by the Keiko assembler pplink as a low-level representation of the action of other instructions.

The PCALLW n generated by our compilers is actually equivalent to the Keiko instruction CALLW (n+1), because the static link is effectively an extra parameter counted in the n+1. (Oberon uses a different mechanism for static links that passes them in a (virtual) register, and allows them to be elided for global procedures, and there are LINK and SAVELINK instructions – unused by us – to support that.)

CALLW n is in turn an abbreviation for the sequence JPROC; SLIDEW n, with JPROC being the operation that activates the procedure, and SLIDEW n being the operation that removes the stack frame, containing n parameters, and pushes one (W)ord of result. SLIDE is the same, but with no result. The return address of the procedure activation points at the SLIDE or SLIDEW instruction.

CONST n (where n is a large integer) and GLOBAL x (where x is a symbol) assemble into an entry into the constant pool for the current procedure (to which the cp register invariably points), together with an LDKW k instruction, with k being the offset of the constant pool entry. CONST n for small n assembles into a PUSH n instruction that has the constant inline; there are different instruction encodings for n in the range [-1..6], and for one and two byte signed integers. The constant pool is also referenced by the LDGW x and STGW x instructions, where the assembler replaces a symbol x by a constant pool offset.

Where does the definition of the type of tokens come from?

Near the top of the file parser.mly – the input to ocamlyacc – appears a sequence of token definitions, like this:

%token<string> IDENT
%token COMMA
%token BEGIN END

The syntax of these definitions is determined by tradition rather than design. Ocamlyacc uses the definitions to find out what tokens can occur in the grammar, and it also outputs them in the form of an OCaml type definition in the generated file parser.mli, something like

type token =
    IDENT of string
  | ...
  | COMMA
  | ...
  | BEGIN
  | END
  | ...

so that they can be referred to in code for the lexer.
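
For instance, an ocamllex rule can return these tokens from its semantic actions. The fragment below is only a hypothetical illustration – the regular expressions, and the extra EOF token, are not taken from the course's lexer.mll. (The keyword rules are listed before the identifier rule so that they win the tie on length.)

rule token = parse
    "begin"                   { BEGIN }
  | "end"                     { END }
  | ","                       { COMMA }
  | ['A'-'Z' 'a'-'z']+ as s   { IDENT s }
  | [' ' '\t' '\n']+          { token lexbuf }
  | eof                       { EOF }

Because the lexer refers to the constructors declared in parser.mli, a misspelt token name in the lexer shows up as an ordinary OCaml compile-time error rather than a mysterious parse error.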

In the peephole optimiser, is it possible for a code sequence to yield two different results?

It's possible, and if it happens we would add additional optimiser rules to resolve the problem. The example I gave in lectures concerned code that might be generated for an array reference a[3] where a is an integer array local to a subroutine. Initial code for this reference might be the sequence

LOCAL -40
CONST 12
OFFSET
LOADW

where the array is allocated at offset -40 in the stack frame, and the constant 12 represents the distance (three 4-byte words) between the start of the array and the address where a[3] is stored. (We will shortly be adding these instructions to our repertoire in order to compile subroutines with local variables (LOCAL) and access to elements of arrays (OFFSET)). In addition to tidying up jumps and labels, the peephole optimiser can also be used to simplify calculations involving constants, and to introduce abbreviated instructions for common combinations. Thus there might be a rule

LOCAL n; CONST k; OFFSET --> LOCAL n+k

that simplifies a constant offset from a known location in the stack frame, and there might be a rule

LOCAL n; LOADW --> LDLW n

that introduces the LDLW instruction as an abbreviation. With these rules, the given sequence simplifies to the single instruction LDLW -28.

But adding an offset and loading from the resulting address is also a common operation, and Keiko has an instruction LDNW that abbreviates this:

CONST k; OFFSET; LOADW --> LDNW k

Applying this rule simplifies the given sequence to the two instructions LOCAL -40; LDNW 12, and the optimiser might get stuck there if it chose to begin with this third rule rather than the ones stated earlier. The situation can be resolved by explicitly adding the additional rule,

LOCAL n; LDNW k --> LDLW n+k,

even though it is in some sense a consequence of the other rules.
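
To make this concrete, here is a toy sketch (in OCaml, with a made-up inst type rather than the course's representation) of how such rules might be applied to an instruction list; the clause order decides which rule is tried first at each position, and a real optimiser would make repeated passes over the whole sequence until nothing changes.

type inst =
  | LOCAL of int | CONST of int | OFFSET | LOADW
  | LDLW of int | LDNW of int

let rec rewrite = function
  | LOCAL n :: CONST k :: OFFSET :: rest -> rewrite (LOCAL (n + k) :: rest)
  | LOCAL n :: LOADW :: rest             -> rewrite (LDLW n :: rest)
  | CONST k :: OFFSET :: LOADW :: rest   -> rewrite (LDNW k :: rest)
  | LOCAL n :: LDNW k :: rest            -> rewrite (LDLW (n + k) :: rest)
  | i :: rest -> i :: rewrite rest
  | [] -> []

(* rewrite [LOCAL (-40); CONST 12; OFFSET; LOADW] = [LDLW (-28)] *)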

The whole problem is similar to one that arises in automated theorem proving, where the slang term is 'hacking around Knuth–Bendix'. The name refers to a classic paper by Donald Knuth and Peter Bendix entitled 'Simple word problems in universal algebras' that is probably easiest to find in Knuth's volume, Selected Papers on Design of Algorithms.

The syntax summary for Lab 1 in Figure 3.1 of the coursebook contains these productions. What do they mean?

stmts --> stmt { ";" stmt }
stmt --> IF expr THEN stmts [ ELSE stmts ] END

These abbreviations were introduced by Niklaus Wirth in a grammar notation he called Extended BNF.[1] The notation { stuff } stands for zero or more repetitions of stuff, and [ stuff ] stands for an optional occurrence of stuff. So the same syntax could be described by the productions

stmts --> stmt
stmts --> stmt ";" stmts
stmt --> IF expr THEN stmts elsepart END
elsepart --> empty
elsepart --> ELSE stmts

These abbreviations make it a bit more convenient to describe some languages, but don't allow us to describe anything that couldn't (as in this example) also be described by an ordinary context free grammar. Sadly, yacc does not support these notations, partly because it's a bit difficult to work out how to specify in a general way what the corresponding AST would be. [Wirth also introduced fixed conventions for distinguishing between terminals and non-terminals in the grammar; I have tended to use font changes for that purpose, and to omit the commas that Wirth inserts between adjacent symbols.]
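
In ocamlyacc the repetition therefore has to be spelled out with explicit recursion, and the shape of the AST (here a list of statements, and an option for the else part) chosen by hand. The fragment below is only a sketch: the token name SEMI and the semantic actions are assumptions, not the course's parser.mly.

stmts :
    stmt                  { [$1] }
  | stmt SEMI stmts       { $1 :: $3 } ;

elsepart :
    /* empty */           { None }
  | ELSE stmts            { Some $2 } ;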

I notice you use your own Mercurial server and not BitBucket or GitHub. Is there any reason for that?

Well, first, because BitBucket dropped support for Mercurial some time in 2020. But otherwise, ...

Not really, except that using GitHub might make it a bit more tempting for people to publish their own working repositories in the same place. I could disable forking, but that wouldn't stop people uploading their own repos independently, and perhaps making them public. Doing that for the lab exercises would be a shame, because it would threaten to spoil the learning experience of other current and future students taking the course. Doing so for the Christmas exercise would verge on a disciplinary offence, and I want to save people from the possibility of committing a disciplinary offence with a slip of the finger.

The translation from nondeterministic finite automata to deterministic ones may create an exponential blow-up in the number of states. Does this create a problem in practice?

Very rarely, because the regular expressions we need to describe tokens in a programming language tend to be quite simple.

I'm a third year student. Will I have to do the Christmas assignment?

Yes you will.

You said Keiko is implemented by an interpreter. What is the difference between an interpreter and a compiler?

An interpreter is a mechanism that takes a program and obeys it. In the case of machine-level programs, such as programs for Keiko or ARM, an interpreter can be particularly simple: it follows a cycle where it takes the next instruction of the program, carries it out by moving data around, adding, subtracting, and so on, and then moves on to the instruction that follows. Branching is implemented by changing the interpreter's idea of where the next instruction is to be found.

Here's the surprise: a computer is just an interpreter that's implemented in hardware and works at high speed, and that's the interpreter for ARM that we shall use. For Keiko, we use instead an interpreter written in software: there's a program that describes the behaviour of the Keiko machine that we have to compile for the host machine, and that program is used to execute the Keiko program rather than using special-purpose hardware. Because the decoding of each instruction is now done in software, the program runs perhaps 4–5 times slower than if we used machine code. The heart of the Keiko interpreter is a loop that has inside it the 'big switch', with one case for each kind of Keiko instruction; you can find this big switch in the source file interp.c that's generated as part of the build process, though the use of C macros makes it a bit hard to decipher what's going on.
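
Stripped of all detail, the shape of such an interpreter is easy to show. Here is a toy fetch-execute loop in OCaml for a made-up instruction set (nothing here is Keiko's actual code, which is written in C):

type instr = PUSH of int | ADD | JUMP of int | HALT

let run (prog : instr array) =
  let stack = ref [] and pc = ref 0 and running = ref true in
  while !running do
    let i = prog.(!pc) in                (* fetch the next instruction *)
    incr pc;
    (match i with                        (* the 'big switch' *)
     | PUSH n -> stack := n :: !stack
     | ADD ->
         (match !stack with
          | a :: b :: rest -> stack := (a + b) :: rest
          | _ -> failwith "stack underflow")
     | JUMP lab -> pc := lab             (* branching: change where we fetch *)
     | HALT -> running := false)
  done;
  !stack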

Two modifications to the answer. First, we don't have to use an interpreter to execute Keiko code: in fact there's a compiler that can translate Keiko into native code for the x86 or the ARM. The compiler is packaged in the same way as the Keiko interpreter, and translates each procedure in the program just before it is executed for the first time in any run of the program. In short, it is a JIT. I've not provided this implementation as part of the course materials, because it's better for our purposes to use the interpreter with its debugging facilities. But it is the standard way of running programs compiled with the Oxford Oberon Compiler.

Second, if we don't have an ARM to hand, we can instead use the 'emulator' qemu-arm. This acts like an interpreter for ARM machine code, but it too may have a JIT inside it for speed.

In the lecture, you pointed out that the syntactic correctness of the program shown below could not be captured by a context free grammar. If so, how does the compiler check it?

{ int xxxx, yyyyy; xxxx = yyyyy; }

The problem is that a context free grammar can't simultaneously express the requirements that the number of x's should match and that the number of y's should match in the two occurrences of each identifier. In theoretical terms, we are using the fact that the language xᵃyᵇxᵃyᵇ is not context free, a fact that can be proved using the pumping lemma for context free languages (see later in the Models of Computation course). Our solution to the dilemma is not to attempt to settle questions like this in the parser, which works from a CFG, but to deal with them later. The type checker in our compiler will construct a symbol table that contains the identifiers xxxx and yyyyy, and will then check that each identifier that is used in the program appears in the symbol table.
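
The check itself can be very simple. As a toy sketch (hypothetical, not the course's checker): collect the declared names into an environment, then look up each use.

module Env = Map.Make(String)

let check_block decls uses =
  let env = List.fold_left (fun e x -> Env.add x () e) Env.empty decls in
  List.iter
    (fun x ->
       if not (Env.mem x env) then failwith ("undeclared identifier " ^ x))
    uses

(* check_block ["xxxx"; "yyyyy"] ["xxxx"; "yyyyy"]   succeeds
   check_block ["xxxx"; "yyyyy"] ["xxx"]             raises Failure *)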

General

What prerequisites are needed to follow the course?

There are three courses taken by our undergraduates that provide useful background for this one. The first is Functional Programming, which is important because we will be writing compilers not as purely functional programs, but certainly as programs that use recursion over algebraic data types, and sometimes higher-order functions. I will give a brief introduction to the OCaml language we will be using in the course, but you will need to be familiar with the ideas in advance.

The second relevant course is Models of Computation, where our students learn about regular expressions, finite state automata and context-free grammars. I will aim to make the Compilers course self-contained as regards these topics, but will not repeat material from the other course unless it is directly needed in this course. Our emphasis will be on using tools that take input written as a set of regular expressions or a context free grammar and automatically produce a scanner or parser. We will use these tools mostly as black boxes rather than concerning ourselves with how they work inside.

The third course is Digital Systems, where students will already have met simple algorithms for compiling arithmetic expressions to machine code, and also the general organisation of computer hardware and systems software. Though this course is largely self-contained, that material provides useful background. In particular, the target architecture we will use in the last part of the course is the ARM, and that is different from the MIPS that was discussed in Digital Systems, so I will assume no detailed knowledge of its programming model. Those who know the MIPS or another architecture will be able to recognise the similarities and the differences.

Will we be writing compilers in Oberon?

No, we'll use OCaml as our implementation language.

Will we be writing compilers for Oberon?

Well, sort of. We'll work towards a compiler that implements a simple, Pascal-like language that looks a bit like Oberon. It's natural to do so, because Pascal-like languages are easy to implement with machine-level operations, and lack the irregularities that can make a nightmare of compiling, say, C. Many of the techniques we learn can be applied to implementing other languages. And rest assured, the keywords will be in lower case!

The O in OCaml stands for Objective, doesn't it? Will we be using object-oriented ideas in the course?

OCaml includes object-oriented features of a slightly strange kind, but we won't be using them. Generally speaking, the facilities for defining data types in object-oriented languages are heavy and clumsy compared with functional programming; for example, what can be achieved by an algebraic type definition in half a dozen lines in Haskell or OCaml might take several pages in Java. And a function that is defined in a few lines using pattern matching and recursion might become a tedious application of the 'visitor pattern' that involves several classes spread over many pages. Scala of course improves on these things, but the existence of robust lex and yacc implementations for OCaml gives it the edge, in my opinion.

Will we actually be generating native code?

There will be two parts to the course. At first, we will target a stack machine, Keiko, that is like the Java Virtual Machine (JVM) but with instructions at a slightly lower level. Like the JVM, the Keiko machine has instructions that mostly consist of a single byte each, so the code is called bytecode. There are two different implementations of the Keiko machine:

  • an interpreter for the bytecode that has options to print each instruction as it is executed.
  • a JIT translator that compiles the bytecode into native code for the x86 or ARM just before it is executed for the first time.

The bytecode interpreter is sufficient for debugging the compilers we write, but it's nice to have the JIT implementation too.

So what about the second part of the course?

Glad you asked. In the second part of the course, the Keiko machine is used again as an intermediate code, that is, as the interface between the front end of the compiler, where source programs are translated into low-level operations, and the back end, where we will implement these low-level operations using machine instructions. The machine will be (a subset of) the ARM, the same processor that is used in the Raspberry Pi. This makes it possible to run our code on a real piece of silicon.

Why will we generate code for the ARM, and not the x86?

The ARM is attractive because it is a simple, modern architecture with plenty of registers that can all be used interchangeably. The 386/486/Pentium has too few registers, and bizarre restrictions on what they can be used for. For example, the SI and DI registers provide no way to address their bottom eight bits, and shift instructions must take the shift amount from register CX if it is not a constant. We could deal with all of this if we had to, but doing so would still leave us with only six usable registers, and the attendant complications would be a distraction from understanding the process of generating code. The AMD64 architecture removes some of these flaws, but introduces complications of its own.

Why do you use Mercurial for the labs? Why not Git? After all, Linus says that Git is better.

Mercurial and Git share the facility to make convenient copies of an entire repository and to synchronise changes between multiple copies of the same repository; that's what the word distributed means in the term distributed version control. (Actually, none of these products provides a truly distributed system, in that the people using them give explicit instructions to move data around the network; a distributed implementation would provide the abstraction of a single, unified history of development without relying on a central server.) Both products have the same basic data structure: a rooted, directed acyclic graph of revisions with operations to merge any two revisions by reference to their most recent shared ancestor. Though terminology differs a bit between them, Mercurial provides everything that we need for the course, and the same operations can be simulated in Git, but with a few more unexplained switches passed to the various commands, and a bit more of the works showing. There seems to be a consensus that Mercurial is a bit easier to learn, so I've decided to go with it. Everything we do with Mercurial will be applicable to any other version control system, distributed or not.

If you're already familiar with Git, then you might like to use it instead of Mercurial, so I've also provided the lab materials as a Git repository. Begin with

git clone http://spivey.oriel.ox.ac.uk/git/compilers.git

But if you do decide to use Git, then the lab demonstrators may be less able to help you with any problems that arise.

The course page says Compilers (The Farewell Tour). Does that mean you're never going to give the course again?

I have given this course many times, and the material has evolved and (I think) improved over the years. Giving it is always a delight, and I think it's wonderful that the course has made it into the core of our degree, because it's a great forum for seeing that theoretical ideas – Functional Programming itself, but also ideas from Models of Computation and Algorithms – can be applied fruitfully, and also that practical techniques like version control and systematic, automated, testing are vital in any programming project of significant size. On top of that, language implementations make wonderful student projects. In fact, it's such a delight that I think it would be selfish to keep it to myself, when so many of my colleagues could also share in the fun by becoming involved in the future.

Arising from lectures

The prelude and postlude code for the Lab 4 compiler saves and restores all the ARM registers, even if the procedure body doesn't use them. Wouldn't it be better to save and restore only the registers that are actually used?

Yes it would be better, and that would be one of the first things to do in a project to improve the quality of the code. It would not be hard to make a partial improvement by buffering the code, monitoring which registers it uses, and generating a prelude and postlude that save and restore only those registers, provided we keep the same frame layout. That would mean leaving an empty space between the saved registers and the frame pointer, so as to keep the parameters and the static link at the same offsets, as shown in a comment in lab4/mach.ml. (Actually, the code is already buffered and output at the end of the procedure so as to allow the amount of space for outgoing parameters to be determined.)

Better still would be to eliminate the empty space, but that would introduce timing problems between phases, substantially complicating the compiler. This is particularly true with a language that allows nested procedures, with access from inner procedures to the parameters of outer procedures that enclose them. No longer would it be possible for the semantic analyser to determine the parameter offsets, so the parameter addresses would have to be represented symbolically in the optree, likely as references to the definition records for the parameters. Note that, with nested procedures, the optrees for a procedure body may contain references to the parameters of several different procedures, so it's not sufficient to implement a scheme that has the size of the frame head for the current procedure as the only unknown quantity. Instruction selection also would have to deal with unknown offsets in the frame, also very likely by embedding references to definition records in the buffered code. Then – once registers have been assigned – the offsets for parameters could be filled in before the code is output. Nested procedures could be dealt with by translating outer procedures before the inner procedures they contain, so that parameter offsets for the outer procedure have been assigned before they are needed. All this is complexity that we can well do without!

Register variables and temps are very similar in the way they are translated, and yet they are treated differently in the optree representation, with <LOADW, <REGVAR n>> and <STOREW, ..., <REGVAR n>> in one case but <TEMP n> and <DEFTEMP n> in the other. Why is this?

It's purely a matter of minor convenience: the intermediate code generator is a bit simpler if we pretend that a register variable has an 'address' <REGVAR n>, though of course that address is a fake and cannot be computed as a value. Then we can use a single routine gen_addr to treat all variable references, without having to insert special cases for 'loading' or 'storing' a register variable. No such fiction is needed for temps, and omitting it avoids creating the impression that the trees output by CSE are somehow more complicated (and maybe slower to execute) than the input trees. (In determining which variables can live in registers, the semantic analyser carefully avoids variables that must have a genuine address.)

What's the difference between a RISC and a CISC machine?

For us, the vital difference is that on a RISC, transfers to and from memory are done by separate instructions from arithmetic. This means that a statement like x := x+1, where x is a local variable in the frame (say at offset 24), is translated into the sequence

ldr r1, [fp, #24]
add r1, r1, #1
str r1, [fp, #24]

with a load and store separate from the add instruction. RISC architectures tend to have arithmetic instructions that can put their result in a different register from the operands, and they tend to have a large set of uniform registers. They tend to have a single, indexed addressing mode that adds together a register and a constant, with a pure constant and simple indirection through a register as special cases. Instruction selection for a RISC machine largely consists of ensuring that address arithmetic is, where possible, done using the indexed addressing mode rather than by a separate instruction.

On a CISC machine, arithmetic and memory transfers can be combined in the same instruction, so the statement x := x+1 might be translated as

add [fp, #24], 1

a complex instruction that would have to be executed in several steps. Instruction selection for such machines is more difficult, and if expressed via tree grammars typically requires non-trivial dynamic programming, since the selection of tiles for different parts of the tree can interact in complicated ways. Question 6 on problem sheet 5 invites you to consider this by investigating a machine with a memory-to-memory move instruction.

Type metrics consist of a size and an alignment; what is the alignment for?

Typically, byte-addressed machines have restrictions about what addresses can be used for quantities that are larger than a byte. For example, there may be a rule that a 4-byte integer must be aligned so that its address is a multiple of 4. That would mean that an array of bytes could begin at any address, but an array of integers would have to be given an address that was a multiple of 4. So the type 'array 10 of integer' has size 40 and alignment 4, but the type 'array 40 of char' has the same size but alignment 1. When the compiler lays out local storage for a subroutine or lays out a record type, it must leave gaps in order to satisfy these alignment restrictions, so that a character followed by an integer would leave a gap of 3. Sometimes space could be saved by shuffling the fields so that they appear e.g. in decreasing order of alignment, but that doesn't really matter much.
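
As a small illustration, here is a sketch (hypothetical, not the course's layout code) of how offsets might be assigned from a list of (size, alignment) pairs:

(* each field goes at the next offset that is a multiple of its alignment;
   the total size is rounded up to the largest alignment seen *)
let round_up x a = (x + a - 1) / a * a

let layout fields =
  let place (off, maxa) (size, align) =
    (round_up off align + size, max maxa align) in
  let (last, maxa) = List.fold_left place (0, 1) fields in
  (round_up last maxa, maxa)

(* a char followed by an integer: layout [(1, 1); (4, 4)] = (8, 4),
   with a 3-byte gap after the char *)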

Recent x86 machines support unaligned loads and stores (of, say, a 4-byte quantity at an address that is not a multiple of 4) for backwards compatibility, but it's likely that they do so inefficiently, perhaps turning a load into two loads of adjacent words followed by some shifting, masking and bitwise or-ing. Further complications are likely to arise when an unaligned access crosses the boundary between cache lines or memory pages. So it's desirable to obey some alignment restrictions even when the architecture doesn't insist on them.

On the ARM, things are worse still for unaligned memory access, because the machine traps. On the Raspberry Pi, there's a trap handler that simulates the unaligned access and then resumes the offending program, but this takes hundreds of times longer than executing a properly aligned memory access. Rumour has it that the trap handler does this (as opposed to stopping with a segfault) just so that it is possible to run one offending program: Firefox.

I can't find the set instruction in any description of the ARM instruction set. Does it really exist?

I've introduced set as a synonym for a form of the ldr instruction: what we write as set r1, #12345 is normally written ldr r1, =12345. Macro definitions in the file fixup.s, which is included in every assembly program output by the compiler, establish this equivalence. The syntax ldr r1, =12345 is reasonable because the assembler implements the instruction by finding a place in the instruction stream where the constant can be stored, then using a genuine ldr instruction (with a PC-relative address) to move the constant into the specified register. But confusion sometimes results because when the constant loaded is an address and we want to load from the address, this ldr instruction only does half the job, and we need another ldr instruction to finish it. If x is a global variable, then the code we will generate to fetch its value into a register looks like this:

set r1, _x
ldr r2, [r1]

The first instruction sets r1 to the address of x, and the second loads from this address and puts the result in r2.

Apart from helping us to identify the ldr instructions that load from an address that we specify, giving a different name to the operation that develops large constants prepares for giving it a different implementation in variants of our compilers. And indeed, on versions of the ARM from ARMv7 on, used in the RPi2 and later, there is a pair of instructions movw and movt that allow us to fill in the two 16-bit halves of a register, and that is the preferred implementation of set on these architectures. Most RPi software, however, still uses ldr...= in order to maintain compatibility with ARMv6 and the original RPi.

The assembler needs to find a safe place to put the table of constants used in set instructions, and we help it by placing an .ltorg directive just after the end of each procedure, where the processor will never attempt to execute the data as instructions.

In the Lab 4 compiler, there are lots of functions fits_offset, fits_move etc. What is their purpose, and why are there so many different ones?

These functions determine whether a constant appearing in the program is small enough to fit in the immediate field of an instruction: if so, then we can generate compact, quick code, and if not, then the compiler must be prepared to generate longer code that first develops the constant into a register. For example, the expression x+1 can be translated into an add instruction containing 1 as an immediate constant:

add r2, r1, #1

But x+1234 needs the constant to be put in a register first:

set r3, #1234
add r2, r1, r3

In this case, we use the function fits_immed to decide between these alternatives.

The complication arises because different ARM instructions have different sizes of immediate fields, and different rules for interpreting them. We do need some test for constants that fit, and we could use a single test that covered all situations in a conservative way; but it's nice to do things properly, and the details are hidden in a handful of functions. It's just a matter of using the right one in each situation:

  • For load and store instructions, the test is fits_offset. The instructions use a sign and magnitude representation for the offset, so values between -4095 and 4095 (inclusive) are allowed. (Note that these limits apply to the ldr and ldrb instructions that we use; other instructions like ldrsh (load signed halfword) that we don't use have smaller limits.)
  • For general arithmetic operations, the test is fits_immed. The instructions have an eight-bit immediate field, but can rotate it in the 32-bit word by any multiple of two bits; that means the values 0 to 255 are representable directly, but also 256 and 260 (etc.) as 64 << 2 and 65 << 2 – but not 257, 258 or 259. This seems fussy, but note that it does allow any power of two to be expressed, as well as bit masks for any field of up to seven or eight bits, wherever it is positioned in the word. (A sketch of such a test appears after this list.)
  • For move instructions, the test is fits_move. If fits_immed n fails, we can try fits_immed (~n) using the bitwise negation of n; if this succeeds, then the assembler will use an mvn instruction in place of the mov. (We could just generate ldr r0, =const all the time, and leave it up to the assembler to simplify this to a mov or mvn if it can – but then we lose track of the fact that mov is cheaper than ldr=.)
  • For certain add instructions involved in address calculations, the test fits_add can be used. If fits_immed n fails, we can try fits_immed (−n), and the assembler will then generate a sub instruction. (Ordinary uses of addition with a negative constant will already have been transformed into subtractions by the simplifier.)
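
As promised above, here is a sketch of the immediate test for arithmetic instructions; it is only an illustration of the encoding rule, not the fits_immed in the lab materials. A constant fits if rotating it left by some even amount leaves a value below 256.

let fits_immed n =
  let n = n land 0xffffffff in
  let rol32 x k =
    if k = 0 then x else ((x lsl k) lor (x lsr (32 - k))) land 0xffffffff in
  let rec try_rot k = k < 32 && (rol32 n k < 256 || try_rot (k + 2)) in
  try_rot 0

(* fits_immed 255 = true, fits_immed 256 = true (a rotated 64),
   fits_immed 257 = false *)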

While it's important to get details like this right in a compiler implementation, it's also important to avoid letting such details distract us from the overall strategy. Each of these tests has a role to play, but the role is isolated in one or two lines of one component of the compiler. (The architecture-specific details will never be the substance of an exam question.)

Keiko

What is the difference between the PLUS and OFFSET instructions?

PLUS is used to add genuine integers, and OFFSET is used where we are adding a base address and an integer offset. On the targets we use, these two instructions do exactly the same thing: add together two 32-bit integer quantities. But on some targets, they might be different: for example, AMD64 is a machine where addresses are 64 bits, but offsets are 32 bits, and on that machine OFFSET will involve sign extending the offset before adding.

The instruction now called OFFSET was called PLUSA in earlier versions of the course, causing even more confusion. No doubt occurrences of PLUSA persist in a few places: please just read them as equivalent to OFFSET.

What's the difference between the instructions LOCAL n and CONST n?

LOCAL n pushes fp+n, where fp is the frame pointer; this is the address of a variable at offset n in the stack frame for the current procedure. CONST n just pushes the constant n. Actually, we could get rid of LOCAL n if we used a more primitive instruction BASE, equivalent to what we now write as LOCAL 0, that pushed the value of fp; for then we could replace LOCAL n by BASE; CONST n; OFFSET.

There are two reasons for having LOCAL and not replacing it with the simpler BASE operation, even in the last part of the course: first, that using LOCAL makes the intermediate code more compact and easier to understand; and second, that it simplifies the CSE phase, oblivious as it is to the details of instruction selection, if we let it use the heuristic that LOCAL n is a trivial expression not worth sharing.

What are the three numbers that appear in a PROC instruction?

A PROC instruction (or, properly speaking, a PROC directive) has the format

PROC name fsize stkmax ptrmap

where name is the name of the procedure, fsize is the amount of space to allocate for local variables on the stack, stkmax is the maximum number of items on the evaluation stack, and ptrmap is a pointer map for the stack frame. Our compilers will compute fsize by adding up the size of all the local variables of the procedure, taking alignment into account. The current implementation of Keiko does not use the value of stkmax, so we can safely replace it by zero. Keiko has a garbage collector that relies on knowing the locations of pointers in each stack frame, but the version of Keiko used in the course does not have this garbage collector, so it's safe to set ptrmap to zero too.

Why is TCL needed to build Keiko?

Keiko is implemented by an interpreter written in C, but parts of that interpreter and the bytecode assembler that comes with it are generated by a program written in TCL. Doing this provides some useful flexibility about instruction encodings, and ensures that the assembler and interpreter have a consistent idea about which numeric opcode goes with which instruction. Once the interpreter and assembler have been generated and compiled, TCL plays no further role in the running of Keiko programs.

In calling a procedure, the static link is pushed on the stack before the PCALL instruction. Where does the dynamic link come from?

On the Keiko machine, the instructions before the call should push on the stack the following information:

  • The arguments, pushed from right to left so that the first argument is nearest the top of the stack (which grows downwards in memory) and therefore at the smallest address.
  • The static link (SL).
  • The address of the procedure (CP).

The PCALL instruction refers to the procedure address but leaves the arguments and static link on the stack, and adds:

  • The return address (RA): this is the address of the next instruction after the PCALL instruction, and execution will resume there when the procedure returns.
  • The dynamic link (DL): this is the present value of the bp register, which is then reset to the value of the stack pointer.
  • Local storage for the procedure. The size of the local storage is specified in the PROC directive that begins the procedure.

The frame head is made up of SL, CP, RA, DL; two of these are contributed by the caller, and two are added by the action of the PCALL instruction.

When the procedure returns, bp is reset from the dynamic link, pc is reset from the return address, and the stack pointer sp is reset in such a way that the n parameters specified in the PCALL instruction have disappeared from the stack.

Is the procedure address used on Keiko to call a procedure actually the address of the procedure's bytecode?

No: each procedure is described by a little table that contains a pointer to the bytecode as well as the size of the local variables, a pointer map of the frame, and the values of any large constants that are used by the procedure. The procedure address is the address of this table, and the Keiko machine's cp register points to the table throughout execution of the procedure. But all that is transparent to compilers generating code for the machine. The PROC directive in the Keiko assembly language looks after all the details.

Labs

What incompatibilities exist between different OCaml versions?

Raspbian Jessie still has OCaml version 4.01.

The lab machines have 4.02 or a later version, which introduces Bytes and deprecates destructive operations on strings. The lab materials are written to use Bytes, but there is a compatibility library that is automatically selected if the OCaml version is ≤ 4.01.

In 4.03, String.uppercase is deprecated in favour of String.uppercase_ascii. We avoid using either by defining an uppercase function ourselves where it is needed.

My compiler gives the error message "Stack overflow" when it runs. What do I do now?

Notice that this message comes from your compiler when it is translating the test program, not from the object program when you run it on the Keiko machine. Your compiler has entered an infinite recursion, and you can get a stack backtrace using the following unix command:

$ env OCAMLRUNPARAM=-b ./ppc test.p

(where test.p is the test program you were trying to run). This works provided your compiler was built using the -g flag to ocamlc. The most likely culprit is a recursive function you have just introduced into the compiler, and the likeliest of all is an off-by-one error in the recursive loop that generates code to follow the static chain.

My compiler gives an error message "generated code failed sanity check". What do I do now?

Again, this message comes from your compiler, not from the runtime system. The function Keiko.output that's supposed to output the Keiko code first applies a crude sanity check, verifying for example that your code does not try to pop non-existent values off the evaluation stack. The code generated by your compiler fails this check, probably indicating that it would crash the Keiko machine if run. Look in the assembly language file a.k to find which instruction triggered the message.

When I run the object code from my compiler, I get the message "segmentation fault". No other helpful message is printed. What do I do now?

Welcome to the wonderful world of compiler debugging, where it is more productive to read the code than to run it. The object code from your compiler is broken somehow, in a way that was not detected by the sanity checks mentioned above. You can try simplifying the test case until you find which part of it is provoking the error, but in the end the most productive thing is to read the code (found in the file a.k) and see where it fails to make sense.

How can you get the Keiko runtime system to trace instructions?

It turns out that you need to give the -d flag twice. From the Lab 2 directory:

$ ../keiko/ppx -d -d ./a.out

Each -d increments the debugging level, and the execution trace appears at level 2 or above. The output from ppx shows first a disassembly of the binary program, then a trace with two lines for each cycle, one showing the contents of the evaluation stack and the other showing the instruction that's about to be executed. There are too many values printed in hexadecimal, but at least you can follow the stream of instructions.

In Lab 2, my compiler generates a program from case.p that prints the message illegal instruction 0. Why, why, why?

Your program probably contains a sequence like this:

CASEJUMP 3
CASEARM 1 6
CASEARM 3 6
CASEARM 5 6
CASEARM 2 7
CASEARM 6 7
CASEARM 8 8

The problem here is that the number of cases (3) in the CASEJUMP instruction does not agree with the number (6) of following CASEARM instructions, and that leads to the bytecode interpreter trying to execute part of the jump table as if it were code. I'll resist the temptation to give a blow-by-blow account of the reason why this leads to the message you saw; suffice it to say that I think you're lucky to get a message at all. (This error ought now to be caught by the sanity checks – see above.)

In Lab 4, why does this program (or a related one) segfault?

var p;

proc compose(f, g);
  proc fg(x); begin return f(g(x)) end;
begin
  return fg
end;

proc add2(x); begin return x+2 end;
proc square(x); begin return x * x end;

begin
  p := compose(square, add2);
  print p(2); newline
end.

The program forms a closure for the function fg that contains a pointer to the stack frame for compose; this closure is then returned as the result of compose, and continues to exist after the stack frame for compose has been destroyed. A subsequent attempt to call this closure follows the static link into oblivion. There is no check in the runtime system that closures contain 'sensible' addresses, so the result is a segmentation fault.

There's a similar program compose.p in the lab materials that preserves the stack frame for compose by a trick: by calling compose from a function that itself has a big enough stack frame, it arranges that in the subsequent execution of the program, the stack pointer never climbs high enough to overwrite the frame for compose, and so the static link continues to work. This is only a dirty trick, however.

How can I set up my own machine to work on the Lab exercises away from the lab?

There's a page all about that.

Trying 'make test2' in Lab 4 gives the message, 'Please get the correct guest_rsa file from Mike'. What should I do?

You need a copy of the file /users/mike/pi/guest_rsa from the lab machines: it is a cryptographic key that will enable you to connect to the server using ssh. The file is not included with the lab materials because they are publicly accessible. Put the file in the tools subdirectory after cloning the repository.

The JCASE operation is implemented by a code sequence that includes an ldrlo instruction. What is that?

The optree <JCASE ([lab0; lab1; lab2], lab3), e> is implemented by the ARM code

cmp r0, #3
ldrlo pc, [pc, r0, LSL #2]
b lab3
.word lab0
.word lab1
.word lab2

The ldrlo instruction is (i) a conditional load instruction, (ii) based on an unsigned comparison, that (iii) puts its result in the program counter pc and (iv) uses pc-relative addressing. Let's consider those four aspects one at a time:

(i) as a conditional load instruction, ldrlo does something if a certain condition is satisfied, and does nothing otherwise. If the ldrlo instruction does nothing, then the next instruction, b lab3, branches to the default label lab3.

(ii) the condition under which we want to branch to a case label (and not the default) is that 0 ≤ e < 3. This condition is evaluated by using an unsigned less-than comparison between e and 3. If e is 3 or more, then the comparison is false; if e as a signed quantity is negative, then as an unsigned quantity it appears large and positive, so again the comparison is false; and if e is between 0 and 2 (inclusive) then the comparison is true.

(iii) making the ldrlo instruction put its result in the pc is equivalent to branching to whatever label from the jump table is selected. Execution proceeds from that label if the load happens.

(iv) the pc-relative addressing enables the jump table to be put just after the instructions. Because of pipelining, reading the pc on ARM yields a value that is 8 bytes after the current instruction, and that is the address of the branch table, beginning .word lab0. To this is added 4 times the value in r0 to obtain the address in the table where the relevant label value is found.


My compiler's code for some of the test cases in Lab 4 is dying with a segfault. Is it possible to use QEMU to single-step the program and find out where the problem lies?

It is possible, as I explain on another page. But generally speaking, blindly single-stepping a program of any size is likely to result in confusion rather than enlightenment. If it's an existing test case, it's probably more fruitful to look at how the code from your compiler differs from the working code stored with the test case, and make test0 exists as a convenient way of doing that. If looking at the code doesn't reveal the cause of the problem, then reduction techniques – like those I showed in the last lecture of the course – are a good prelude to using the debugger, and may even allow you to narrow down the problem and find the solution before you start single-stepping.

Mostly Mercurial

What should I do with Mercurial when I've finished a lab exercise?

It's quite a good idea – while in a subdirectory like compilers/lab2 – to use hg st . to find out what files you've changed; the dot at the end of this command restricts its effect to the current directory, ignoring any changes made elsewhere in the directory tree. The listing you see will look something like this:

$ hg st .
M kgen.ml
M Makefile
? mytest.p
? kgen.ml~

meaning that files kgen.ml and Makefile have been modified, and mytest.p and the backup file kgen.ml~ are unknown to Mercurial. Your next actions might be to add the file mytest.p to Mercurial's list, then to check in your changes, giving a brief log message:

$ hg add mytest.p
$ hg ci -m "Finished Lab 2" .

Include the dot again so as to check in just the modified files in the current directory.

How can I keep my copy of the lab materials up to date?

When you've finished a lab and checked in your changes, we may ask you to update your copy of the lab materials, using the following sequence of commands:

$ hg pull
$ hg merge
$ hg ci -m 'Merged upstream changes'

The first of these commands contacts the Mercurial server and copies the updates to your clone of the repository; and the second merges the changes into your working copy of the files. The third command adds the result of the merge to your copy of the repository by making a revision with two parents: one being the latest version of your work, the other the newly-added version of the lab materials. It's always safe to pull and merge in this way, and it makes sure you have the latest version of everything.

The bookmark basis is automatically updated to point to the latest version of the lab materials, so that you can make a diff with the command

$ hg diff -r basis .

at any time.

I tried to check in my work, but I got a message saying abort: no username supplied (see "hg help config"). What should I do?

You need to create a file called .hgrc in your home directory containing the following two lines, with the name and e-mail address replaced by your own details.

[ui]
username = Fred Bloggs <fred@bloggs.com>

When you check in revisions, this name and address are attached to each check-in, and are subsequently shown in Mercurial's logs.

How can I use gedit to edit the commit message when I check in my work?

It's complicated. You can specify the editor you'd like to use by setting one of the environment variables HGEDITOR or VISUAL, and Mercurial will invoke that editor if you don't specify a commit message on the command line with the -m flag. But the convention is that this editor should not exit until the commit message is saved, and gedit doesn't respect this convention; if (as is likely) there is already a gedit session running, then invoking gedit as a shell command opens a new tab in the existing session and exits immediately, leading to an abort with "Commit message was empty". You could try to hack around this, but take it for all in all, the game is not worth the candle.

Vim users can set either environment variable to vim; emacs users can set it to emacsclient, a special wrapper program that pops up a new buffer inside an existing emacs session and waits for you to invoke the command C-x #.

A simpler solution is just to specify the commit message on the command line: a short phrase is enough for our purposes.

The Christmas assignment

Can we use the software lab machines, instead of setting up our own environment?

As far as I'm concerned, there's no reason why not, and no reason why you cannot access the machines remotely over SSH, provided you are happy with a text-based editor. But there will be no access to the building during the Christmas shutdown period, and the machines and various other pieces of infrastructure may not be reliably available for remote access. If you can't schedule a solid week's time in Oxford during the vac, I would suggest setting up your own environment, and perhaps thinking of remote access as a Plan B.

Will we have to submit our code?

No: the plan is for you to submit a report including the detail of the changes you have made and an explanation of why they satisfy the requirements. That makes the assessment process rather like a code review.

Do we have to use Mercurial?

Yes and no: just like during term, you have the option of using Git instead. But whichever you use, part of the report you submit should consist of neatly formatted and tidy difference listings showing the changes you made; you'll want a version control system to help with that.

How do I make my own test case in Lab 4?

First prepare a file test/mytest.p containing the compiler input, the expected output, and an empty marker for the assembly language:

begin print_num(43); newline() end.

(*<<
43
>>*)

(*[[
]]*)

Then say make promote-mytest. This will run the compiler and paste the object code into the test case in the right place, so that subsequent runs of make test0-mytest will show any change in the code, and so that you can print the test case for inclusion in your assignment report. (You have to put in the expected output by hand, to remove the temptation to create test cases without checking the output is actually right.)

How did you format the diff listings in the sample report?

I used hg diff to make the initial diff, then enscript -Ediffu to format it with the actual changes in bold. The output of enscript is in PostScript, so I used ps2pdf to convert it to PDF, and pdfunite to concatenate the parts of the report into one file. It's best to stick with enscript's default fixed-width font for the listings, or the lines will be indented inconsistently, nauseating the reader.

Trivia

Why is the abstract machine named Keiko?

Keiko was the name of the orca (killer whale) featured in the film Free Willy, which was housed at the Oregon Coast Aquarium when I first visited it; I was looking for a name for the machine at the time. I believe Keiko is a girl's name in Japanese, but the whale was a male. A subsequent attempt to release him into the wild ended sadly.

What font do you use in the printed materials for the course?

The fonts mostly come from the Lucida family, which is well set up for use with TeX. The body font is Lucida Bright. I couldn't resist the temptation to set programs that form the input to our compilers using an alphabet drawn by Adrian Frutiger in the early 1960s for a French book on Algol, and lately revived by Michael Sharpe as an Opentype font.

  • Some questions from the ancient past can be found in the FAQ archive.



  1. Niklaus Wirth: What can we do about the unnecessary diversity of notation for syntactic definitions? CACM, Vol. 20, Issue 11, November 1977, pp. 822–823.