Frequently asked questions (Compilers)

Copyright © 2024 J. M. Spivey
Jump to navigation Jump to search

If you have a question, perhaps it is answered below – or maybe you can find help in the growing glossary. Feel free to add headwords to the glossary so Mike can fill in the definitions, or to add questions below.

New questions

On the ARM, we pass the static link in register r10. How is the dynamic link passed?

The dynamic link has only one use: when the subroutine returns by loading the return address in to pc, at the same moment the dynamic link is loaded into fp, so that the calling subroutine is resumed in the right frame. The dynamic link is therefore the value in fp at the start of the subroutine. Our boilerplate prelude code saves the incoming fp value as part of the frame head, before resetting fp to point to the base of the new frame. Then, on exit, the postlude code pops the saved fp value back into the fp register.

How can I persuade the front end for picoPascal to generate a <BINOP Lsl, ...> optree?

In the compiler for Lab 4, the language has a predefined function lsl(x, n) that shifts x left by n bits. It's defined in check.ml as an abbreviation for BINOP Lsl. I must confess, I added this function after spotting the problem with shift operations that's explored in problem sheet 5, just in order to be able to generate shift instructions for testing.

Could our compiler do a better job of register allocation if it didn't do allocation during code selection?

Yes, it could. The principal weakness of the current scheme is that the register allocator "doesn't know where it's going", in that it will often put a value into a randomly-chosen register, only to find that it must be moved later. If it could delay choosing an actual register until the final destination was known, then the move could be eliminated.

A quick example of this is the procedure p, defined by

proc p(x: integer): integer;
  var y: integer;
begin
  y := x;
  print_num(x); newline();
  return y
end;

Our compiler will store x into the stack frame (unfortunate, but done for simplicity), but makes y into a register variable. It also makes a common subexpression out of the two occurrences of x in y := x and in print_num(x). The compiler output (with optrees interspersed) looks like this:

_p:
	mov ip, sp
	stmfd sp!, {r0-r1}
	stmfd sp!, {r4-r10, fp, ip, lr}
	mov fp, sp
@ y := x;
@ <DEFTEMP 1, <LOADW, <LOCAL 40>>>
	ldr r5, [fp, #40]
@ <STOREW, <TEMP 1>, <REGVAR 0>>
	mov r4, r5
@ print_num(x); newline();
@ <ARG 0, <TEMP 1>>
	mov r0, r5
@ <CALL 1, <GLOBAL print_num>>
	bl print_num
@ <CALL 0, <GLOBAL newline>>
	bl newline
@ return y
@ <RESULTW, <LOADW, <REGVAR 0>>>
	mov r0, r4
	ldmfd fp, {r4-r10, fp, sp, pc}
	.pool

Observe that on fetching the value of x, the compiler puts it arbitrarily in r5, a free callee-saved register. But this is a silly choice, because the two uses of the temp both require the value in other registers: assigning to y needs it in the register r4 where the register variable lives, and passing it to print_num requires it to be put in the argument register r0. With this insight, we could immediately improve the code by putting the temp in r0 or r4 instead and saving one or other of the two moves. (We must put a copy of the value in a callee-saved register across the call to print_num in order to return it later.)

What are the rules for passing arguments in the ARM ABI? If arguments are passed in registers, isn't it inefficient to save them in the stack frame? And if a subroutine takes only one argument, why do you save two registers?

Let's concentrate on subroutines that take at most four arguments, because that's sufficient for most examples. I'll mention at the end what to do if there are more arguments than that.

The ARM ABI requires the arguments to be passed in registers r0, r1, r2, r3, as many registers as there are words of argument information. A subroutine must expect its arguments in these registers, but it's up to the subroutine where it puts them subsequently, and there are several possibilities.

  • If the subroutine is a leaf routine (one that calls no others) then often it's possible just to leave the arguments where they arrived.
  • If the subroutine calls others but needs few registers to do the work in its body, then it might work to move the arguments from the registers r0--r3 where they arrived into other registers, chosen from r4--r10, that will be preserved across the calls to other routines.
  • But in general, it becomes necessary to save the incoming arguments in the stack frame of the subroutine. Because I want to avoid implementing multiple strategies, and also the work of deciding which strategy is the best feasible one, I've decided to make all subroutines save their arguments in the stack frame.

A small mitigation is that a few local variables of the subroutine can be implemented as 'register variables' and live in registers throughout the subroutine body. A simple but dramatic improvement in performance would be gained by allowing some of the parameters to live in registers too, and move them to the right place as part of the preamble, making the preamble code compare for each parameter the place it arrives with the place it should live, and move it if necessary. Another gain would be to treat leaf routines specially, using the registers in a different way. And then, of course, there's the possibility of saving not all of the callee-save registers, but just the ones that are used. All these things are steps that are simple enough in themselves, but contribute incrementally to the complication of the compiler.

As for the issue of pushing two registers when there's just one argument (or four registers when there are three), that's a simple way of ensuring that the sp value is kept a multiple of 8, which is required by the ABI. The extra load or store costs nothing if it is part of a push instruction (or stmfd as we write it). If there are more than four arguments, then the caller will save arguments 5 and above in the stack space just above where my preamble code saves arguments 1 to 4, and the net result is that after the preamble all arguments appear contiguously in the stack frame, starting at fp+40.

The subroutine in the lecture, as well as taking one argument (and therefore saving two registers for reasons just explained) also accessed the global array a using the operation <GLOBAL a>; this access to a has nothing to do with the parameter mechanism.

The digits program seems very obscure when written in a Pascal-like language. Can you use a functional language to make it clearer?

By writing functions that deliver functions as their results, we can use functional values to implement an abstract data type of digit sets with named operations: all, member and delete. In terms of these operations, the search function can be written like this:

let rec search k n avail =
  (* The first k digits have been chosen as n, and avail is the set of digits remaining.  Complete the
     solution in all possible ways and print the results. *)
  if k = 9 then
    print n
  else begin
    for d = 1 to 9 do
      let n1 = 10 * n  + d in
      if n1 mod (k+1) = 0 && member d avail then
        search (k+1) n1 (delete avail d)
    done
  end

The solution then begins with the call search 0 0 all.

We could use the preceding program with any implementation of sets of digits, and bitmaps would be a very sensible choice. For the sake of a test case, we'll choose less wisely and make each set a function from digits to booleans. The membership test then applies the function to a digit and delivers the boolean result.

let member x s = s x

The set of all digits is simply a function that always returns true.

let all = (fun x -> true)

Just slightly tricky is a function that deletes a specified digit from a set, giving a smaller set.

let delete s x =
  (fun y -> y <> x && s y)

It's here that we rely on functions with a function as their result. The curried style of OCaml would let us hide this by writing

let delete s x y = (y <> x && s y),

but the way delete is used in search depends – curried or not – on applying delete to only some of its arguments, leaving a function whose body will run when the remaining arguments are supplied. Even if we uncurried the rest of the program, the need for a curried delete function would remain. Such things are not supported well on a wholly stack-based implementation, and need something like heap-allocated closures – unless we substitute the definition of delete into search and simplify it.

What's the difference between the RETURN and RETURNW instructions?

None at all. There used to be a difference, in that the RETURNW instruction would put a return value in a special interpreter register, from which the SLIDEW instruction (the second half of PCALLW) would retrieve it. But now the Keiko bytecode interpreter implements SLIDEW in a different way, and the return value needs to be moved only once, so RETURN and RETURNW have become the same. I have adjusted the notes and lab instructions so that they refer only to RETURN, but I've left RETURNW as an equivalent in the Keiko assembler, in case any stray occurrences remain. [The sanity check that is made when our compilers output code assumes that RETURN will find a result on the stack; that's OK, because in Lab 3 that is always the case. The check should be fixed, though, to avoid possible future surprises.]

Will we learn to write lexers and parsers by hand, in addition to learning how to use lex and yacc?

I don't want to spend too much time on these very first phases of the compiler in lectures, because that would reduce the time available for other things. But if you are interested, then I've provided an alternative implementation of Lab 1, in the directory extra/lab1q of the lab materials, that contains a hand-written lexer and a parser written using the technique of recursive descent, which I think is the most practical way of building a parser by hand. In addition (and independent of the preceding choices) I've arranged that the parser doesn't build an abstract syntax tree, but translates statements as it goes. The parser does still build simplified trees for expressions, so that they can be translated either into code to evaluate them on a stack, or short-circuit code to branch on their values. Just clone the lab materials, or update your copy, to get the code.

Ocamlyacc reports a shift-reduce conflict for my grammar, but still produces a parser. Is it safe to ignore the message and use the parser anyway?

Ocamlyacc's convention in the presence of a shift-reduce conflict is to resolve the conflict in favour of shifting, because this sometimes results in a working parser that resolves any ambiguity one way rather than the other. For example, a grammar containing the productions

expr :
    IDENT                                   { Variable $1 }
  | expr PLUS expr                          { Plus ($1, $3) } ;

will result in a shift-reduce conflict, because with expr PLUS expr on the stack and PLUS as the look-ahead, the parser doesn't know whether to shift the PLUS, so that an expression such as x + y + z is parsed with y + z as a sub-expression, or to reduce immediately, so that x + y becomes a sub-expression. Yacc reports the conflict and resolves it in favour of shifting, and the resulting parser is safe to use provided either we postively want the corresponding interpretation, or we don't care which interpretation we get, as might be true here given that addition is associative.

Notice that in this case, the grammar is actually ambiguous. It's a common practice in using yacc to write ambiguous grammars, and then confirm the interpretation that's desired by writing precedence annotations as part of the script. In this case, we could write (in the top part of the script before the %%) the annotation

%right PLUS

to confirm that we want to shift the PLUS and make the operator right-associative, or

%left PLUS

to make the parser reduce instead. We've avoided such annotations in the course by always writing unambiguous grammars, but allowing ambiguity and resolving it with precedence rules was standard practice in former days, when parser size and speed were thought of as crucially important: the ambiguous grammar often has fewer non-terminals, and requires fewer reductions when parsing.

Another situation where benign conflicts arise is in the 'dangling else' problem, where the simple, natural grammar for if--then--else statements is ambiguous, and we can either write a more elaborate grammar that is unambiguous, or annotate the grammar to say, roughly speaking, that the ELSE token is right-associative, and resolve the ambiguity that way. It's always been my practice, when yacc reports a conflict, to think the situation through, and add explicit precedence annotations to the grammar so that the warning message goes away, and I think that is a wise practice to adopt. It's certainly not safe, as we shall see, to ignore the warning and use the resulting parser without further care.

An example of a harmful conflict is provided by a Pascal-like language that includes the notation a[i..j) for taking a 'slice' of an array. Let's assume the language also contains the notation r.x for selecting a field x from a record r, so that the grammar contains productions

expr : 
    variable                                { $1 } 
  | expr PLUS variable                      { Plus ($1, $3) } ;

variable : 
    IDENT                                   { Variable $1 }
  | variable DOT IDENT                      { Select ($1, $3) }
  | variable SUB expr BUS                   { Subscript ($1, $3) }
  | variable SUB expr DOT DOT expr RPAR     { Slice ($1, $3, $6} } ;

The difficulty here is not the mismatched brackets, because by the time the closing round bracket appears, the parser knows very well what to expect. Instead, the problem is that after, say IDENT SUB IDENT with lookahead DOT, the parser doesn't know whether to reduce the second IDENT to variable in preparation for reducing variable DOT ident, or to reduce it to expr in preparation for a second DOT leading to a slice. Yacc issues a warning, and selects its default action of shifting, making the parser commit to expecting the name of a record field. The consequence is that the expression a[i..j) is rejected with a syntax error when the first dot turns out to be followed by a second dot and not an identifier, a fact that should be caught by almost any test of slices that do not have a constant lower bound. Note that the grammar is actually unambiguous, just not one that can be parsed with a single token of lookahead.

The pragmatic solution in this case is to treat the double dot in a slice as a different token DOTDOT, modifying the lexical analyser appropriately. Otherwise, the problem is quite hard to solve. Theory tells us that any LR(2) language also has an LR(1) grammar, but that grammar may be very different and very cumbersome, especially if we want to build an abstract syntax tree with the same shape. It won't work to try to delay the decision between slicing and selection a bit because, in an expression like a[i+r.x], at the dot we must decide between taking i+r as the lower bound of a slice, or expecting r.x to be a field selection that forms the right operand of plus. The possibilities quickly multiply, and the easy way out – introducing a new token – seems by far the best option.

General

What prerequisites are needed to follow the course?

There are three courses taken by our undergraduates that provide useful background for this one. The first is Functional Programming, which is important because we will be writing compilers not as purely functional programs, but certainly as programs that use recursion over algebraic data types, and sometimes higher-order functions. I will give a brief introduction to the OCaml language we will be using in the course, but you will need to be familiar with the ideas in advance.

The second relevant course is Models of Computation, where our students learn about regular expressions, finite state automata and context-free grammars. I will aim to make the Compilers course self-contained as regards these topics, but will not repeat material from the other course unless it is directly needed in this course. Our emphasis will be on using tools that take input written as a set of regular expressions or a context free grammar and automatically produce a scanner or parser. We will use these tools mostly as black boxes rather than concerning ourselves with how they work inside.

The third course is Digital Systems, where students will already have met simple algorithms for compiling arithmetic expressions to machine code, and also the general organisation of computer hardware and systems software. Though this course is largely self-contained, that material provides useful background. In particular, the target architecture we will use in the last part of the course is native ARM code, and that is different in some details from the Thumb code that was discussed in Digital Systems, so I will assume no detailed knowledge of its programming model. Those who know Thumb code or another architecture will be able to recognise the similarities and the differences – where Thumb code is concerned, it is mostly the lifting of irksome restrictions.

Will we be writing compilers in Oberon?

No, we'll use OCaml as our implementation language.

Will we be writing compilers for Oberon?

Well, sort of. We'll work towards a compiler that implements a simple, Pascal-like language that looks a bit like Oberon. It's natural to do so, because Pascal-like languages are easy to implement with machine-level operations, and lack the irregularities that can make a nightmare of compiling, say, C. Many of the techniques we learn can be applied to implementing other languages. And rest assured, the keywords will be in lower case!

The O in OCaml stands for Objective, doesn't it? Will we be using object-oriented ideas in the course?

OCaml includes object-oriented features of a slightly strange kind, but we won't be using them. Generally speaking, the facilities for defining data types in object-oriented languages are heavy and clumsy compared with functional programming; for example, what can be achieved by an algebraic type definition in half a dozen lines in Haskell or OCaml might take several pages in Java. And a function that is defined in a few lines using pattern matching and recursion might become a tedious application of the 'visitor pattern' that involves several classes spread over many pages. Scala of course improves on these things, but the existence of robust lex and yacc implementations for OCaml gives it the edge, in my opinion.

Will we actually be generating native code?

There will be two parts to the course. At first, we will target a stack machine, Keiko, that is like the Java Virtual Machine (JVM) but with instructions at a slightly lower level. Like the JVM, the Keiko machine has instructions that mostly consist of a single byte each, so the code is called bytecode. There are two different implementations of the Keiko machine:

  • an interpreter for the bytecode that has options to print each instruction as it is executed.
  • a JIT translator that compiles the bytecode into native code for the x86 or ARM just before it is executed for the first time.

The bytecode interpreter is sufficient for debugging the compilers we write, but it's nice to have the JIT implementation too.

So what about the second part of the course?

Glad you asked. In the second part of the course, the Keiko machine is used again as an intermediate code, that is, as the interface between the front end of the compiler, where source programs are translated into low-level operations, and the back end, where we will implement these low-level operations using machine instructions. The machine will be (a subset of) the ARM, the same processor that is used in the Raspberry Pi. This makes it possible to run our code on a real piece of silicon.

Why will we generate code for the ARM, and not the x86?

The ARM is attractive because it is a simple, modern architecture with plenty of registers that can all be used interchangeably. The 386/486/Pentium has too few registers, and bizarre restrictions on what they can be used for. For example, the SI and DI registers provide no way to address their bottom eight bits, and shift instructions must take the shift amount from register CX if it is not a constant. We could deal with all of this if we had to, but doing so would still leave us with only six usable registers, and the attendant complications would be a distraction from understanding the process of generating code. The AMD64 architecture removes some of these flaws, but introduces complications of its own.

Why will we generate code for the 32-bit version of the ARM? Isn't 64 bits more common now?

From our point of view, there is little difference between the two, except that 32-bit code is a bit simpler to deal with, and the addresses are shorter if we ever need to compare them by eye for debugging. I haven't much experience with 64-bit code for the ARM, but on x86, compilers that emit 64-bit code tend to use more complicated addressing modes to avoid specifiying too many large constants. On both platforms, the doubling of the number of registers is the biggest factor in making 64-bit code more efficient, but our simple compilers will rarely create enough register pressure for that to be significant for us. Finally, though the Raspberry Pi now has a 64-bit core, the standard software sticks to 32 bits for compatibility across the range, so it's easiest for us to stick to 32 bits too, and have an easy way to run our code on real silicon.

Why do you use Mercurial for the labs? Why not Git? After all, Linus says that Git is better.

Mercurial and Git share the facility to make convenient copies of an entire repository and to synchronise changes between multiple copies of the same repository; that's what the word distributed means in the term distributed version control. (Actually, none of these products provides a truly distributed system, in that the people using them give explicit instructions to move data around the network; a distributed implementation would provide the abstraction of a single, unified history of development without relying on a central server.) Both products have the same basic data structure: a rooted, directed acyclic graph of revisions with operations to merge any two revisions by reference to their most recent shared ancestor. Though terminology differs a bit between them, Mercurial provides everything that we need for the course, and the same operations can be simulated in Git, but with a few more unexplained switches passed to the various commands, and a bit more of the works showing. There seems to be a consensus that Mercurial is a bit easier to learn, so I've decided to go with it. Everything we do with Mercurial will be applicable to any other version control system, distributed or not.

If you're already familiar with Git, then you might like to use it instead of Mercurial, so I've also provided the lab materials as a Git repostory. Begin with

git clone http://spivey.oriel.ox.ac.uk/git/compilers.git

But if you do decide to use Git, then the lab demonstrators may be less able to help you with any problems that arise.

Why do you stick to make when there's a build tool called dune specific to OCaml?

Language-specific build tools are nice, sometimes making the system discription much shorter and more flexible. On the other hand, Makefiles have been around forever, and they handle multi-language projects with no friction. There's an element of inertia too where I'm concerned.

Why stick to OCamlyacc when it's been superseded by Menhir?

Inertia must be part of the answer, but my experience has been that the extra features of Menhir are not a big help, and could become a hindrance on larger projects.

  • Menhir adds the ability to parse LR(1) grammars, rather than the slightly more restrictive class LALR(1). This makes little difference in practice.
  • Menhir has a library of macros for things like lists with delimiters between the items, whereas Yacc boringly makes us write the whole context-free grammar explicitly. This seems like an advantage for Menhir, but in truth writing out the grammar rules takes little time, and it becomes necessary anyway if we want to modify details of the grammar. For example, we can often improve the quality of error messages produced by the parser by initially allowing inputs that contain an error (such as a missing delimiter), then attaching a semantic action that prints a helpful message.
  • Menhir may have clearer error messages than OCamlyacc. I admit that might be helpful at times, but I get along quite well reading the file parser.output that gives details of Yacc's calculations, and there's sometimes no way of avoiding that level of detail.
  • Menhir gets over the irritating problem of counting the symbols on an RHS so you can write $1, $2, etc. in the semantic action. That is a good improvement, but I have to confess that I have my own version of Yacc (written in OCaml) that has its own (also non-standard) way of avoiding this age-old irritation.

Does the OCaml compiler ocamlc translate programs into native code?

OCaml comes with two compilers: ocamlopt generates native code, and ocamlc generates an OCaml-specific bytecode, with a runtime interpreter called ocamlrun. We'll stick to ocamlc for our compilers, because it will build them more quickly than ocamlopt, and the slower running speed of the resulting bytecode will be more than sufficient for our purposes. The ocamlc compiler itself is written in OCaml and built using ocamlopt, though it can be used to compile itself.

The course page says Compilers (The Farewell Tour). Does that mean you're never going to give the course again?

I have given this course many times, and the material has evolved and (I think) improved over the years. Giving it is always a delight, and I think it's wonderful that the course has made it into the core of our degree, because it's a great forum for seeing that theoretical ideas – Functional Programming itself, but also ideas from Models of Computation and Algorithms – can be applied fruitfully, and also that practical techniques like version control and systematic, automated, testing are vital in any programming project of significant size. On top of that, language implementations make wonderful student projects. In fact, it's such a delight that I think it would be selfish to keep it to myself, when so many of my colleagues could also share in the fun by becoming involved in the future.

For 2023, I've been given the opportunity to present the course again, after a gap of several years. This may (who knows?) be the last lecture course I give, so the label 'The Farewell Tour' seems apt again.

Arising from lectures – Part one: Syntax

Aren't the 'sausage machine' diagrams in Lecture 1 a bit misleading? Not all the functions shown run one after another.

You're right: there are two or three places where a white lie is told. One is that the lexer won't for us be a separate pass from the parser: it's true that the lexer produces a stream of tokens from the stream of characters it reads from the input file, but it does this by providing a subroutine that the parser can call to get the next token, not by computing the whole list of tokens in advance. Lazy evaluation, if we used it, would blur the distinction here, because a lazy list is essentially a list represented by having a subroutine to call to get the next element.

A second inaccuracy is that, once we reach intermediate code generation, using Keiko or using operator trees, the rest of the compiler operates on one procedure at a time, rather than on the whole program at once. In the Lab 4 compiler, there's a function Tgen.do_proc that takes the intermediate code for a procedure body and passes it first to the optimiser passes, then to the instruction selector, and finally outputs the assembly language. That's a minor difference, but one that saves significant space, in that intermediate code for the entire program doesn't have to be stored all at once, and also one that fixes the restriction that procedure bodies are translated largely independently of each other.

The final inaccuracy is that, in our compilers, register allocation is done during instruction selection, not after it. A bad consequence of this is that the register allocator cannot see future uses of the values that are put in registers, so sometimes chooses to put a value in the 'wrong' register, leading to a register-to-register move that could have been avoided. The advantage in this scheme comes when values have to moved out of the way before a subroutine call: that can require extra instructions to be inserted into the program in a way that's easy to arrange in this scheme, but more complicated if instruction selection is already finished before register allocation starts.

You said Keiko is implemented by an interpreter. What is the difference between an interpreter and a compiler?

An interpreter is a mechanism that takes a program and obeys it. In the case of machine-level programs, such as programs for Keiko or ARM, an interpreter can be particularly simple: it follows a cycle where it takes the next instruction of the program, carries it out by moving data around, adding, subtracting, and so on, and then moves onto the instruction that follows. Branching is implemented by changing the interpreter's idea of where the next instruction is to be found.

A compiler, on the other hand, is a program that inputs the text of another program written in one language, and outputs an equivalent program written in another language. That's different from an interpreter because the actions described by the program being translated are not carried out as part of the compiling process. It's often said that an interpreter 'translates' each line of the program it is executing each time it reaches that line, but that's not necessarily true. The interpreter does not need to translate the program at all, but just to analyse it and find out what actions it describes. In the same way, an English person who knows French need not translate each French sentence into English in order to understand it. Or perhaps more relevantly, in Searle's Chinese Room paradox, the occupants of the room follow English instructions in order to deal with the Chinese questions that come in, but need not produce an English translation of each Chinese question before answering it.

Here's a surprise: a computer is just an interpreter that's implemented in hardware and works at high speed, and that's the interpreter for ARM that we shall use. For Keiko, we use instead an interpreter written in software: there's a program that describes the behaviour of the Keiko machine that we have to compile for the host machine, and that program is used to execute the Keiko program rather than using special-purpose hardware. Because the decoding of each instruction is now done in software, the program runs perhaps 4–5 times slower than if we used machine code. The heart of the Keiko interpreter is a loop that has inside it the 'big switch', with one case for each kind of Keiko instruction; you can find this big switch in the source file interp.c that's generated as part of the build process, though the use of C macros makes it a bit hard to decipher what's going on.

Two modifications to the answer: we don't have to use an interpreter to execute Keiko code, and in fact there's a compiler that can translate Keiko into native code for the x86 or the ARM. The compiler is packaged in the same way as the Keiko interpreter, and translates each procedure in the program just before it is executed for the first time in any run of the program. In short, it is a JIT. I've not provided this implementation as part of the course materials, because it's better for our purposes to use the interpreter with its debugging facilities. But it is the standard way of running programs compiled with the Oxford Oberon Compiler.

Second, if we don't have an ARM to hand, we can instead use the 'emulator' qemu-arm. This acts like an interpreter for ARM machine code, but it too may have a JIT inside it for speed.

The translation from nondeterministic finite automata to deterministic ones may create an exponential blow-up in the number of states. Does this create a problem in practice?

Very rarely, because the regular expressions we need to describe tokens in a programming language tend to be quite simple.

In the lecture, you pointed out that the syntactic correctness of the program shown below could not be captured by a context free grammar. If so, how does the compiler check it?

var xxxx, yyyyy: integer; begin xxxx := yyyyy end.

The problem is that a context free grammar can't simultaneously express the requirements that the number of x's should match and that the number of y's should match in the two occurrences of each identifier. In theoretical terms, we are using the fact that the language xaybxayb is not context free, a fact that can be proved using the pumping lemma for context free languages (see later in the Models of Comp course). Our solution to the dilemma is not to attempt to settle questions like this in the parser, which works from a CFG, but to deal with them later. The type checker in our compiler will construct a symbol table that contains the identifiers xxxx and yyyyy, and will then check that each identifier that is used in the program appears in the symbol table.

Your description of LR parsing refers to a stack of symbols and a finite-state machine that runs over it, but books refer to a stack that contains states instead. Which is true?

To understand LR parsing, it's convenient to think as the parser's stack as containing a viable prefix, a sequence of tokens and non-terminal symbols that can be completed into a successful parse by at least one string of future tokens. We can design a finite-state machine that recognises viable prefixes, and imagine running that machine over the stack in order to decide what the parser should do next. For example, if the next input token can be added and still give a viable prefix, then shifting is a possible action.

Implementing things this way would work, but it means running the finite-state machine after each parser action, even if symbols near the bottom of the stack have not changed. What we can do instead is to record, alongside the stack of symbols, the state of the (deterministic) machine after reading each symbol on the stack. When a token is shifted, we then need to compute only one state transition for the machine; and when a right-hand side is reduced, we already have the machine state after reading the part of the stack that is unchanged, and we likewise need to compute only one state transition when the left-hand side of the production is pushed (a goto entry in the parsing table), making the process much more efficient. Now, if we're keeping the list of states, then we can deduce the symbol that is on top of the stack from the state of the machine, because each state has the property that all transitions leading into it are labelled with the same symbol. This observation applies not just at the top of the stack, but at every intermediate point, so that we can deduce the entire contents of the symbol stack from the stack of states. That means that in an implemetation there's no need to store the symbols explicitly, and just a stack of states is needed.

Is the "bottom-up parsing machine" you used in the lecture the same as the "pushdown automata" that are mentioned in Models of Computation?

They are both stack machines, though formalised slightly differently, in a way that no doubt could be hacked around. But they are used differently, in that the stack of our bottom-up machine grows to the right and holds items that have been read but not yet reduced to the start symbol; wheras the PDAs in the Model of Comp course are top-down machines with stacks that grow to the left, containing items that are expected in the input but not yet seen. Our bottom-up machine reads its input by shifting, transferring the leftmost unconsumed item of input so that it becomes the rightmost item of the stack, whereas the top-down machine reads its input by matching the top item on the stack with the next item of input and popping it. Productions play their part in the bottom-up machine when we find the RHS of a production on the stack and reduce to the LHS; in the top-down machine, when a non-terminal comes to the top of the stack, we guess which production to use and expand it, replacing the LHS with the RHS on the stack.

When made deterministic, by having a table that uses the next input token to decide which production to expand, the top-down parsing machine is able to deal with a class of grammars call LL(1) that is a proper subset of the class LR(1) that can be handled by the bottom-up machine. Because it must decide in advance what production to expand for a non-terminal A, having seen only the first token of A, these LL(1) grammars must have the property that in two productions A → α and A → β, the possible first tokens of α and β must be disjoint. This is OK for recognising statements in programming languages, which each tend to start with a specific keyword. It works less well for recognising expressions, where a binary operator (for example) only appears after its first operand has been parsed. It's usually necessary to massage the grammar for expressions to make it LL(1), then do extra work to build a tree with the desired shape. The same class of grammars is handled by the method of recursive descent, whch effectively represents the stack of the top-down machine using the subroutine stack of the host language. The kinds of transformation that are needed are illustrated by the hand-written parser in the extra/lab1q directory of the lab materials.

Arising from lectures – Part two: Expressions and statements

In the peephole optimiser, is it possible for a code sequence to yield two different results?

It's possible, and if it happens we would add additional optimiser rules to resolve the problem. The example I gave in lectures concerned code that might be generated for an array reference a[3] where a is an integer array local to a subroutine. Initial code for this reference might be the sequence

LOCAL -40
CONST 12
OFFSET
LOADW

where the array is allocated at offset -40 in the stack frame, and the constant 12 represents the distance (three 4-byte words) between the start of the array and the address where a[3] is stored. (We will shortly be adding these instructions to our repertoire in order to compile subroutines with local variables (LOCAL) and access to elements of arrays (OFFSET)). In addition to tidying up jumps and labels, the peephole optimiser can also be used to simplify calculations involving constants, and to introduce abbreviated instructions for common combinations. Thus there might be a rule

LOCAL n; CONST k; OFFSET --> LOCAL n+k

that simplifies a constant offset from a known location in the stack frame, and there might be a rule

LOCAL n; LOADW --> LDLW n

that introduces the LDLW instruction as an abbreviation. With these rules, the given sequence simplifies to the single instruction LDLW -28.

But adding an offset and loading from the resulting address is also a common operation, and Keiko has an instruction LDNW that abbreviates this:

CONST k; OFFSET; LOADW --> LDNW k

Applying this rule simplifies the given sequence to the two instructions LOCAL -40; LDNW 12, and the optimiser might get stuck there if it chose to begin with this third rule rather that the ones stated earlier. The situation can be resolved by explicitly adding the additional rule,

LOCAL n; LDNW k --> LDLW n+k,

even though it is in some sense a consequence of the other rules.

The whole problem is similar to one that arises in automated theorem proving, where the slang term is 'hacking around Knuth--Bendix'. The name refers to a classic paper by Donald Knuth and Peter Bendix entitled 'Simple word problems in universal algebras' that is probably easiest to find in Knuth's volume, Selected Papers on Design of Algorithms.

Arising from lectures – Part three: Data structures

Type metrics consist of a size and an alignment; what is the alignment for?

Typically, byte-addressed machines have restrictions about what addresses can be used for quantities that are larger than a byte. For example, there may be a rule that a 4-byte integer must be aligned so that its address is a multiple of 4. That would mean that an array of bytes could begin at any address, but an array of integers would have to be given an address that was a multiple of 4. So the type 'array 10 of integer' has size 40 and alignment 4, but the type 'array 40 of char' has the same size but alignment 1. When the compiler lays out local storage for a subroutine or lays out a record type, it must leave gaps in order to satisfy these alignment restrictions, so that a character followed by an integer would leave a gap of 3. Sometimes space could be saved by shuffling the fields so that they appear e.g. in decreasing order of alignment, but that doesn't really matter much.

Recent x86 machines support unaligned loads and stores (of, say, a 4-byte quantity at an address that is not a multiple of 4) for backwards compatibility, but it's likely that they do so inefficiently, perhaps turning a load into two loads of adjacent words followed by some shifting, masking and bitwise or-ing. Further complications are likely to arise when an unaligned access crosses the boundary between cache lines or memory pages. So it's desirable to obey some alignment restrictions even when the architecture doesn't insist on them.

On the ARM, things are worse still for unaligned memory access, because the machine traps. On the Raspberry Pi, there's a trap handler that simulates the unaligned access and then resumes the offending program, but this takes hundreds of times longer than executing a properly aligned memory access. Rumour has it that the trap handler does this (as opposed to stopping with a segfault) just so that it is possible to run one offending program: Firefox.

Arising from lectures – Part four: Procedures

Static chains seem inefficient. Are they used in practice?

Nested subroutines have had a chequered history: they were present in Algol 60 but not Fortran, inherited by Pascal but not BCPL or C, and initially absent from Java and C++. But they are natural in functional programming, and because of that have made something of a comeback in mainstream languages. There are two main approaches to providing inner subroutines with access to the parameters and locals of the outer subroutines that enclose them. One is to give them a pointer to the stack frame of the outer subroutine – a static link – and the other is to allocate a block of storage for the inner function into which the values of outer variables and parameters are copied. This second approach is often called closure conversion; it works well for functional languages, where the immutable nature of values means that the copies cannot be distinguished from the originals. It is particularly useful where functions can return their local functions as a result, for example in the standard function compose:

let compose f g = (let h x = f (g x) in h)

This has a local function h that refers to the parameters f and g of compose, and which compose returns as its result. With closure conversion, the closure built for h can embed copies of f and g, whereas with static links, the object representing h has to contain a pointer to the stack frame of compose, and that means keeping that stack frame in existence for as long as the h object survives. In short, stack frames can no longer be allocated on a stack.

For languages that do not have local functions as results, static chains are not as inefficient as they might seem. The nesting of subroutines is usually quite shallow, with perhaps only one intermediate layer between an inner subroutine and the global context, so that it is only necessary to follow at most one static link to access any variable. Added to that, on a register machine, common sub-expression elimination is likely to lead to the address of the enclosing frame for an inner subroutine being kept in a register, so that variables and parameters of the outer subroutine are accessible at known offsets from it, leading to efficient addressing. The CSE scheme in our Lab 4 compiler works well for this, at least within basic blocks.

Arising from lectures – Part five: Machine code

Where can I find out more about the 'regular tree grammars' you mentioned?

The Wikipedia page has a helpful summary with links and references. Briefly, we are concerned with a variation on Context Free Grammars where the right-hand side of each production is a well-formed tree fragment, written in our notation as something like <OFFSET, reg, <CONST n>>. There's a set of tokens (like OFFSET and CONST n here – with the argument n playing the same role as a token value in a conventional parser), and a set of variables like reg. Each production takes a variable to a tree fragment, which can either be read as a string or as the kind of tree that is described by the data type

type tree = Var of variable | Node of (token * tree list).

(Because the reading of the angle-brackety form as a tree is unambiguous, the two approaches are equivalent.) The grammar has a start symbol, which for us will generally be the variable stmt.

A regular tree grammar describes a set of trees, the ones that can be generated from the start symbol by repeatedly applying productions, in each step replacing an occurrence of the LHS of a production with the RHS of the same production, until no more variables remain. Alternatively, we can describe the same set of trees as the 'paste-ups' of derivation trees, where a derivation tree is a 'tree of trees' where each node is labelled with a production, with the children of each node corresponding in an obvious way with the variables used in the RHS of its production.

There's a normal-form theorem that says we can restrict productions to having only one constructor on the right hand side, because an RHS like <OFFSET, reg, <CONST n>> can be factored into <OFFSET, reg, foo> and a production foo → <CONST n>, where foo is a newly-invented variable. This theorem is helpful, because it helps us design a linear-time, bottom up algorithm for deciding whether a tree is generated by the grammar. We associate with each node in the tree a vector of booleans (of fixed length) saying whether the subtree rooted at the node can be generated from each of the variables in the grammar. The vector of each internal node in the tree can then be computed from the vectors for its children in a bounded time; doing this bottom-up gives the vector for every node in time linear in the number of nodes, and then we can ask whether the start symbol is marked as true in the vector for the root. This result contrasts with context-free grammars, where general parsing algorithms take cubic time, which may be reduced to \(O(N^{2.81})\) or less with some effort [Valiant, 1974]. The bottom-up algorithm can be viewed as a kind of 'bottom-up deterministic finite tree automaton'. In the case of ambiguous tree grammars (ones where some trees have more than one derivation tree), we can label each production with a cost, and use a similar bottom up algorithm to find a cheapest derivation in linear time.

We shall avoid these bottom-up techniques (for which, like conventional bottom-up parsing, a parser generator is practically indespensible) and instead use top-down greedy matching, which is demonstrably less powerful but sufficient for the situations we will face.

The code selection functions eval_reg and friends seem to be doing a lot of things at once. What is happening?

Yes, several aspects of code generation are happening simultaneously, but they are reflected in different aspects of the way these functions are defined.

  • A covering with tiles is being chosen for the operator trees that represent a procedure body. This is done in a greedy way, placing the largest possible tile at the root of each tree, and calling the code selection functions eval_reg, eval_rand, etc., recursively to reflect the structure of the tree of tiles that is chosen. When a single tile covers more than one node of the operator tree, this tree of tiles has fewer nodes than the original tree.
  • The code selection functions return fragments as their results. These fragments represent the text of (parts of) an assembly language instruction, and also allow the register allocator to discover what registers they mention, so that those registers can be freed when their values have been used and re-allocated to other purposes. There's an important function frag that looks a bit like a call to printf: it allows multiple fragments to be combined into one, with surrounding punctuation.
  • The accumulated fragments are output as assembly language instructions (or, more precisely, added to a queue waiting to be output), by the functions gen_reg and gen, one used for instructions that deliver a result in a register, and the other for instructions that have no result value. As well as queuing up an instruction for output, these functions look after the details of allocating registers. Any registers used as inputs are freed, a register is allocated for the result if needed, and the result returned by gen_reg is a fragment naming the result register. This fragment will be used again, closer to the root of the tree of tiles, when the value computed by the instruction becomes an input to another instruction. The function gen is used at the root of the operator tree, and consumes values computed by other instructions without yielding a fresh value of its own.

These functions have the structure of an almost-pure functional program, making a pattern of recursive calls that implicitly forms a tree of tiles, and passing fragments as results back to their callers. The non-functional aspect is that register allocation is happening simultenously behind the scenes, updating the state of the register allocator in an imperative way.

The prelude and postlude code for the Lab 4 compiler saves and restores all the ARM registers, even if the procedure body doesn't use them. Wouldn't it be better to save and restore only the registers that are actually used?

Yes it would be better, and that would be one of the first things to do in a project to improve the quality of the code. It would not be hard to make a partial improvement by buffering the code, monitoring which registers it uses, and generating a prelude and postlude that save and restore only those registers, provided we keep the same frame layout. That might mean leaving an empty space between the saved registers and the frame pointer, so as to keep the parameters and the static link at the same offsets, as shown in a comment in lab4/mach.ml. (Actually, the code is already buffered and output at the end of the procedure so as to allow the amount of space for outgoing parameters to be determined.)

Better still would be to eliminate the empty space, but that would introduce timing problems between phases, substantially complicating the compiler. This is particularly true with a language that allows nested procedures, with access from inner procedures to the parameters of outer procedures that enclose them. No longer would it be possible for the semantic analyser to determine the parameter offsets, so the parameter addresses would have to be represented symbolically in the optree, likely as references to the definition records for the parameters. Note that, with nested procedures, the optrees for a procedure body may contain references to the parameters of several different procedures, so it's not sufficient to implement a scheme that has the size of the frame head for the current procedure as the only unknown quantity. Instruction selection also would have to deal with unknown offsets in the frame, also very likely by embedding references to definition records in the buffered code. Then – once registers have been assigned – the offsets for parameters could be filled in before the code is output. Nested procedures could be dealt with by translating outer procedures before the inner procedures they contain, so that parameter offsets for the outer procedure have been assigned before they are needed. All this is complexity that we can well do without!

Register variables and temps are very similar in the way they are translated, and yet they are treated differently in the optree representation, with ⟨LOADW, ⟨REGVAR n⟩⟩ and ⟨STOREW, ..., ⟨REGVAR n⟩⟩ in one case but ⟨TEMP n⟩ and ⟨DEFTEMP n⟩ in the other. Why is this?

It's purely a matter of minor convenience: the intermediate code generator is a bit simpler if we pretend that a register variable has an 'address' ⟨REGVAR n⟩, though of course that address is a fake and cannot be computed as a value. Then we can use a single routine gen_addr to treat all variable references, without having to insert special cases for 'loading' or 'storing' a register variable. No such fiction is needed for temps, and omitting it avoids creating the impression that the trees output by CSE are somehow more complicated (and maybe slower to execute) than the input trees. (In determining which variables can live in registers, the semantic analyser carefully avoids variables that must have a genuine address.)

What's the difference between a RISC and a CISC machine?

For us, the vital difference is that on a RISC, transfers to and from memory are done by separate instructions from arithmetic. This means that a statement like x := x+1, where x is a local variable in the frame (say at offset 24), is translated into the sequence

ldr r1, [fp, #24]
add r1, r1, #1
str r1, [fp, #24]

with a load and store separate from the add instruction. RISC architectures tend to have arithmetic instructions that can put their result in a different register from the operands, and they tend to have a large set of uniform registers. They tend to have a single, indexed addressing mode that adds together a register and a constant, with a pure constant and simple indirection through a register as special cases. Instruction selection for a RISC machine largely consists of ensuring that address arithmetic is, where possible, done using the indexed addressing mode rather than by a separate instruction.

On a CISC machine, arithmetic and memory transfers can be combined in the same instruction, so the statement x := x+1 might be translated as

add [fp, #24], 1

a complex instruction that would have to be executed in several steps. Instruction selection for such machines is more difficult, and if expressed via tree grammars typically requires non-trivial dynamic programming, since the selection of tiles for different parts of the tree can interact in complicated ways. Question 6 on problem sheet 5 invites you to consider this by investigating a machine with a memory-to-memory move instruction.

I can't find the set instruction in any description of the ARM instruction set. Does it really exist?

In previous editions of the course, I used set r1, #12345 as an abbreviation for the PC-relative load instuction ldr r1, =12345, and provided a macro (defined in the file fixup.s) to expand the abbreviation. This has the advantage that (for example) loading the value of a global variable appears as a set instruction to establish its address, then an ldr instruction to do the actual load, which can be less confusing that two ldr instructions with different purposes.

I've now changed my mind and decided to let the ARM code look more like what appears in other courses and other books. It may be that a few occurrences of the set macro will persist in the materials, expecially in places like model answers to previous exam questions: if so, let me know and I will nuke them.

Architectures other than the ARM (and even later editions of the ARM) have other means to develop arbitrary 32-bit constants into a register, such as a pair of instructions that can load 16 constant bits into one half of a register or the other, so the use of a PC-relative load does not have any deep cosmic significance. Perhaps for that reason a set macro would be a good idea, because it could have different implementation on different variants of the ARM.

In the Lab 4 compiler, there are lots of functions fits_offset, fits_move etc. What is their purpose, and why are there so many different ones?

These functions determine whether a constant appearing in the program is small enough to fit in the immediate field of an instruction: if so, then we can generate compact, quick code, and if not, then the compiler must be prepared to generate longer code that first develops the constant into a register. For example, the expression x+1 can be translated into an add instruction containing 1 as an immediate constant:

add r2, r1, #1

But x+1234 needs the constant to be put in a register first:

set r3, #1234
add r2, r1, r3

In this case, we use the function fits_immed to decide between these alternatives.

The complication arises because different ARM instructions have different sizes of immediate fields, and different rules for interpreting them. We do need some test for constants that fit, and we could use a single test that covered all situations in a conservative way; but it's nice to do things properly, and the details are hidden in a handful of functions. It's just a matter of using the right one in each situation:

  • For load and store instructions, the test is fits_offset. The instructions use a sign and magnitude representation for the offset, so values between -4095 and 4095 (inclusive) are allowed. (Note that these limits apply to the ldr and ldrb instructions that we use; other instructions like ldsh (load signed halfword) that we don't use have smaller limits.)
  • For general arithmetic operations, the test if fits_immed. The instructions have an eight-bit immediate field, but can rotate it in the 32-bit word by any multiple of two bits; that means the values 0 to 127 are representable directly, but also 128 and 132 (etc.) as 32 << 2 and 33 << 2 – but not 129, 130 or 131. This seems fussy, but note that it does allow any power of two to be expressed, as well as bit masks for any field of up to seven or eight bits, wherever it is positioned in the word.
  • For move instructions, the test is fits_move. If fits_immed n fails, we can try fits_immed (~n) using the bitwise negation of n; if this succeeds, then the assembler will use an mvn instruction in place of the mov. (We could just generate ldr r0, =const all the time, and leave it up to the assembler to simplify this to a mov or mvn if it can – but then we lose track of the fact that mov is cheaper than ldr=.)
  • For certain add instructions involved in address calculations, the test fits_add can be used. It fits_immed n fails, we can try fits_immed (−n), and the assembler will then generate a sub instruction. (Ordinary uses of addition with a negative constant will already have been transformed into subtractions by the simplifier.)

While it's important to get details like this right in a compiler implementation, it's also important to avoid letting such details distract us from the overall strategy. Each of these tests has a role to play, but the role is isolated in one or two lines of one component of the compiler. (The architecture-specific details will never be the substance of an exam question.)

Can the whole optree <LOADW, <OFFSET, <GLOBAL _a>, <LSL, reg, <CONST 2>>>> be evaluated in a single instruction on the x86?

Yes it can: the x86 has an addressing mode that adds together a 32-bit constant with a scaled register. The syntax is

movl _a(,%eax,4), %ebx

on Linux, or mov ebx, [_a+eax*4] in Intel syntax. The machine code is seven bytes, consisting of the opcode byte, a ModR/M byte, a SIB byte and a four-byte constant. A machine grammar for x86 would probably have nonterminals addr for addresses and offset for the offset part, with productions

reg    --> <LOADW, addr>                 { mov addr, reg }
addr   --> <OFFSET, <GLOBAL x>, offset>  { x(,offset) }
addr   --> <OFFSET, reg1, offset>        { (reg1,offset) }
offset --> <LSL, reg1, <CONST n>>        { reg1,2^n } 

Better still, because the x86 can combine memory references with arithmetic, we can cover a tree such as

<PLUS, reg1, 
  <LOADW, 
    <OFFSET, <GLOBAL _a>, 
      <LSL, reg, <CONST 2>>>>>

in a single instruction, provided the register allocator can assign the same register to the first input of the addition and the result. This would be covered by rules

rand   --> <LOADW, addr>                 { addr }
reg    --> <PLUS, reg1, rand>            { ?mov reg1, reg; addl rand, reg } 

Experimenting with GCC on Linux is complicated slightly by the fact that (on some GCC versions) position-independent code is enabled by default. Use the -fno-pie flag to disable this. On amd64, the reference to the global a uses PC-relative addressing and adds an instruction, and there is a separate instruction to sign-extend the index from 32 to 64 bits.

Keiko

What is the difference between the PLUS and OFFSET instructions?

PLUS is used to add genuine integers, and OFFSET is used where we are adding a base address and an integer offset. On the targets we use, these two instructions do exactly the same thing: add together two 32-bit integer quantities. But on some targets, they might be different: for example, AMD64 is a machine where addresses are 64 bits, but offsets are 32 bits, and on that machine OFFSET will involve sign extending the offset before adding.

The instruction now called OFFSET was called PLUSA in earlier versions of the course, causing even more confusion. No doubt occurrences of PLUSA persist in a few places: please just read them as equivalent to OFFSET.

What's the difference between the instructions LOCAL n and CONST n?

LOCAL n pushes fp+n, where fp is the frame pointer; this is the address of a variable at offset n in the stack frame for the current procedure. CONST n just pushes the constant n. Actually, we could get rid of LOCAL n if we used a more primitive instruction BASE, equivalent to what we now write as LOCAL 0, that pushed the value of fp; for then we could replace LOCAL n by BASE; CONST n; OFFSET.

There are two reasons for having LOCAL and not replacing it with the simpler BASE operation, even in the last part of the course: first, that using LOCAL makes the intermediate code more compact and easier to understand; and second, that it simplifies the CSE phase, oblivious as it is to the details of instruction selection, if we let it use the heuristic that LOCAL n is a trivial expression not worth sharing.

Is the instruction for multiplying two numbers TIMES or BINOP Times?

In the textual form of Keiko instructions output by the compilers (in Labs 1–3), the binary arithmetic instructions are PLUS, MINUS, TIMES and DIV, each a single word that the Keiko assembler can look up in a table of instructions it knows about. In our compilers, we represent these four instructions as BINOP Plus, BINOP Minus, etc., using a single constructor BINOP for the type code, and giving the specific operation as an argument. The advantage is that the compiler can in suitable places treat all binary operations uniformly: for example, the peephole optimiser can have a single rule

CONST a, CONST b, BINOP wCONST (do_binop w a b)

that performs operations at compile time if their arguments are both constants.

In the Lab 4 compiler, the forms BINOP Plus, etc., do not form part of the code output by the compiler, because it's the business of the back end to translate these operations into instructions like add on the target machine. In diagnostic output, such as when the trees of intermediate code are printed, the compiler uses the form BINOP Plus.

Why does STOREW expect the address on top of the stack and the value to store underneath it? Wouldn't the opposite arrangement give a more appealing left-to-right translation of an assignment statement?

Putting the address second in the code places it next to the STOREW instruction, and allows us to write rules for the peephole optimiser that work on the address and instriction together. For example, we can write a rule that combines LOCAL n with a following STOREW to form an STLW n instruction. With a similar rule that introduced LDLW, we can then optimise code for an assignment x := y from LOCAL y; LOADW; LOCAL x; STOREW to LDLW x; STLW y. Such optimisations would be much more difficult to express if the rules has to 'reach over' intervening instructions. When we later consider the alternative intermediate representation of operator trees, we will keep the arguments of STOREW in the same order and use the tree <STOREW, <LOADW, <LOCAL y>>, <LOCAL x>>, though the tree stucture now makes it easy to marry up the STOREW operation with either of its operands, so the order of operands is not so important.

What is the number that appear in a FUNC instruction?

A FUNC instruction (or, properly speaking, a FUNC directive) has the format

FUNC name fsize

where name is the name of the procedure and fsize is the amount of space to allocate for local variables on the stack. We shall always use it to introduce each subroutine in a Keiko program. Our compilers will compute fsize by adding up the size of all the local variables of the procedure, taking alignment into account.

FUNC directives abbreviate the more general PROC directives, which as well as the frame size specify two further quantities: the maximum number of items on the evaluation stack, and a pointer map for the stack frame. Both these quantities are taken as zero in a FUNC directive, with no bad consequences because (a) our implementation of Keiko doesn't use the stack limit, and (b) it doesn't include any garbage collector that might rely on the pointer map.

You said instructions like LDLW 16 occupy a single byte: but surely there are more than 256 different instructions, so they can't all be encoded in one byte?

That's right, and the 256 bytecodes have to be doled out carefully to get the best value out of them. One key factor is that there can be multiple encodings for a single instruction. So, taking the example of LDLW, some of the 256 opcodes are allocated to common instructions like LDLW 16 and LDLW 20. Then there's an opcode that performs an LDLW with an offset that's specified in a byte that follows the opcode, and another opcode that's the same but followed by a two-byte offset, should that ever be needed. The Keiko bytecode assembler selects the most compact encoding for each instruction that appears in the program. The allocation of opcodes is specified in a file called keiko.iset that is used to generate both the assembler/linker and the bytecode interpreter. In principle, an optimal encoding could be chosen by a statistical analysis of the object code for a corpus of programs, but in practice I have chosen how many opcodes to give to each instruction by a seat-of-pants approach.

Why is TCL needed to build Keiko?

Keiko is implemented by an interpreter written in C, but parts of that interpreter and the bytecode assembler that comes with it are generated by a program written in TCL. Doing this provides some useful flexibility about instruction encodings, and ensures that the assembler and interpreter have a consistent idea about which numeric opcode goes with which instruction. Once the interpreter and assembler have been generated and compiled, TCL plays no further role in the running of Keiko programs.

Why is Keiko restricted to 32 bits?

One answer has to do with portablility of programs written in Keiko machine code: the machine code makes it apparent (e.g. in the behaviour of arithmetic on addresses) that pointers are 32 bits or 4 bytes long, and compilers that output Keiko code must exploit this fact to generate correct code. The Java Virtual Machine does not suffer from this problem, because JVM programs are expressed at a level where array indexing and similar operations are explicit, and not expressed in terms of address arithmetic. The result is that by reading a JVM program we simply cannot tell what is the size of a pointer in the underlying implementation.

Another reason why Keiko is restricted to 32-bit addresses has to do with the garbage-collected memory system that Keiko provides (though we make no use of it in the course). In order to keep track of the layout of memory, the garbage collector keeps a map of memory laid out by pages, and that 'software page table' can be implemented more easily and more efficiently if pointers are restricted to a 32-bit range. It's possible to imagine a version of Keiko with 64-bit addresses; it would be very similar to the existing Keiko, but with different characteristics that would mean that a compiler that generated code for one version could not without change be used to generate code for the other.

A number of different implementations of the Keiko machine exist, some running on 64-bit hosts, including a JIT translator for amd64 / x86_64. Most of these use 32-bit addresses for objects within the Keiko machine, by using (sadly OS-dependent) system calls to allocate memory in the lowest 2GB or so of the address space. Linux can do this by means of the MAP_32BIT flag to the mmap system call. There is another Keiko implementation, actually the one we use in the course, that uses a bytecode interpreter and interposes a software-based segmented memory scheme between the addresses used in the Keiko code and physical addresses on the host. That implementation was developed in order to make compiled Oberon programs totally portable, by using malloc as the only means of allocating memory dynamically. A music enthusiast had written an Oberon program XML2PMX that translates the XML-based file format of the music editing software Sibelius into the markup language PMX that is used with MusiXTeX, and I contributed the Keiko implementation needed to make it portable.

In calling a procedure, the static link is pushed on the stack before the PCALL instruction. Where does the dynamic link come from?

On the Keiko machine, the instructions before the call should push on the stack the following information:

  • The arguments, pushed from right to left so that the first argument is nearest the top of the stack (which grows downwards in memory) and therefore at the smallest address.
  • The static link (SL).
  • The address of the procedure (CP).

The CALL instruction refers to the procedure address but leaves the arguments and static link on the stack, and adds:

  • The return address (RA): this is the address of the next instruction after the PCALL instruction, and execution will resume there when the procedure returns.
  • The dynamic link (DL): this is the present value of the bp register, which is then reset to the value of the stack pointer.
  • Local storage for the procedure. The size of the local storage is specified in the PROC directive that begins the procedure.

The frame head is make up of SL, CP, RA, DL; two of these are contributed by the caller, and two are added by the action of the PCALL instruction.

When the procedure returns, bp is reset from the dynamic link, pc is reset from the return address, and the stack pointer sp is reset in such a way that the n parameters specified in the PCALL instruction have disappeared from the stack.

Is the procedure address used on Keiko to call a procedure actually the address of the procedure's bytecode?

No: each procedure is described by a little table that contains a pointer to the bytecode as well as the size of the local variables, a pointer map of the frame, and the values of any large constants that are used by the procedure. The procedure address is the address of this table, and the Keiko machine's cp register points to the table throughout execution of the procedure. But all that is transparent to compilers generating code for the machine. The PROC directive in the Keiko assembly language looks after all the details.

I see that labels in the intermediate code generator are allocated by calling a function label ( ), and that uses a global, assignable variable to allocate a different label each time. Couldn't we do it a bit more functionally?

Yes, we could. We could follow the Haskell style, and turn the global variable into an instance of the state monad: but that wouldn't really gain us much, just force us to write the code generator in an irritatingly cramped way, with a global variable still hidden behind the scenes. A better solution would be to delay allocation of labels until the code tree is flattened, by introducing a new constructor

type code = ... | GENLAB of (codelab -> code)

This is an idea that is sometimes called higher-order abstract syntax. Then we can write gen_stmt like this:

let rec gen_stmt = function ...
  | IfStmt (test, thenpt, elsept) ->
      GENLAB (fun lab1 -> GENLAB (fun lab2 -> GENLAB (fun lab3 ->
        SEQ [gen_cond test lab1 lab2; 
          LABEL lab1; gen_stmt thenpt; JUMP lab3;
          LABEL lab2; gen_stmt elsept; LABEL lab3])))

We then need to enhance the function canon that flattens code trees, so that given a tree GENLAB f it generates a new label and passes it to the function f, recursively flattening the tree that results. Because labels are generated in just one place, there's no longer a need for a global variable. To compile case statements and similar things, it's helpful to define a function genlabs : int → (codelab listcode) → code such that

genlabs n f = 
  GENLAB (fun x1 -> 
    GENLAB (fun x2 -> ... 
        GENLAB (fun xn -> f [x1; x2; ...; xn])...))

That is left as an exercise for the reader.

The output of using ppx -d -d a.out contains references to instructions JPROC, SLIDEW and LDKW that don't appear to be explained anywhere. What do they do?

None of these instructions are generated by our compilers, but they are introduced by the Keiko assembler pplink as a low-level representation of the action of other instructions.

The PCALLW n generated by our compilers is actually equivalent to the Keiko instruction CALLW (n+1), because the static link is effectively an extra parameter counted in the n+1. (Oberon uses a different mechanism for static links that passes them in a (virtual) register, and allows them to be elided for global procedures, and there are LINK and SAVELINK instuctions – unused by us – to support that.)

CALLW n is in turn an abbreviation for the sequence JPROC; SLIDEW n, with JPROC being the operation that activates the procedure, and SLIDEW n being the operation that removes the stack frame, containing n parameters, and pushes one (W)ord of result. SLIDE is the same, but with no result. The return address of the procedure activation points at the SLIDE or SLIDEW instruction.

CONST n (where n is a large integer) and GLOBAL x (where x is a symbol) assemble into an entry into the constant pool for the current procedure (to which the cp register invariably points), together with an LDKW k instruction, with k being the offset of the constant pool entry. CONST n for small n assembles into a PUSH n instruction that has the constant inline; there are different instruction encodings for n in the range [-1..6], and for one and two byte signed integers. The constant pool is also referenced by the LDGW x and STGW x instructions, where the assembler replaces a symbol x by a constant pool offset.

Why won't Keiko build on my Mac?

The Keiko interpreter generally needs to interact with the virtual memory system of the underlying operating system in order to allocate storage efficiently. Previously, it did this in a way that was specific to each platform, and was sometimes complicated by the clash between a 64-bit host and the 32-bit memory model that Keiko uses internally.

This year, we are using a new, portable version of the Keiko interpreter that has a layer to isolate Keiko from the virtual memory system. This isolation layer makes the bytecode run more slowly, but it is still fast enough for use in the course. This version should work on all platforms, whether 32-bit or 64-bit; its only interface to memory allocation is malloc. If it won't work for you, then please let us know and it should be simple to fix.

Labs

My compiler gives the error message "Stack overflow" when it runs. What do I do now?

Notice that this message comes from your compiler when it is translating the test program, not from the object program when you run it on the Keiko machine. Your compiler has entered an infinite recursion, and you can get a stack backtrace using the following unix command:

$ env OCAMLRUNPARAM=-b ./ppc test.p

(where test.p is the test program you were trying to run). This works provided your compiler was built using the -g flag to ocamlc. The most likely culprit is a recursive function you have just introduced into the compiler, and the likeliest of all is an off-by-one error in the recursive loop that generates code to follow the static chain.

My compiler gives an error message "generated code failed sanity check". What do I do now?

Again, this message comes from your compiler, not from the runtime system. The function Keiko.output that's supposed to output the Keiko code first applies a crude sanity check, verifying for example that your code does not try to pop non-existent values off the evaluation stack. The code generated by your compiler fails this check, probably indicating that it would crash the Keiko machine if run. Look in the assembly language file a.k to find which instruction triggered the message.

When I run the object code from my compiler, I get the message "segmentation fault". No other helpful message is printed. What do I do now?

Welcome to the wonderful world of compiler debugging, where it is more productive to read the code than to run it. The object code from your compiler is broken somehow, in a way that was not detected by the sanity checks mentioned above. You can try simplifying the test case until you find which part of it is provoking the error, but in the end the most productive thing is to read the code (found in the file a.k) and see where it fails to make sense.

How can you get the Keiko runtime system to trace instructions?

It turns out that you need to give the -d flag twice. From the Lab 2 directory:

$ ../keiko/ppx -d -d ./a.out

Each -d increments the debugging level, and the execution trace appears at level 2 or above. The output from ppx shows first a disassembly of the binary program, then a trace with two lines for each cycle, one showing the contents of the evaluation stack and the other showing the instruction that's about to be executed. There are too many values printed in hexadecimal, but at least you can follow the stream of instructions.

Where does the definition of the type of tokens come from?

Near the top of the file parser.mly – the input to ocamlyacc – appears a sequence of token definitions, like this:

%token<string> IDENT
%token COMMA
%token BEGIN END

The syntax of these definitions is determined by tradition rather than design. Ocamlyacc uses the definitons to find out what tokens can occur in the grammar, and it also outputs them in the form of an OCaml type definition in the generated file parser.mli, something like

type token =
    IDENT of string
  | ...
  | COMMA
  | ...
  | BEGIN
  | END
  | ...

so that they can be referred to in code for the lexer.

The syntax summary for Lab 1 in Figure 3.1 of the coursebook contains these productions. What do they mean?

stmts --> stmt { ";" stmt }
stmt --> IF expr THEN stmts [ ELSE stmts ] END

These abbreviations were introduced by Niklaus Wirth in a grammar notation he called Extended BNF.[1] The notation { stuff } stands for zero or more repetitions of stuff, and [ stuff ] stands for an optional occurrence of stuff. So the same syntax could be described by the productions

stmts --> stmt
stmts --> stmt ";" stmts
stmt --> IF expr THEN stmts elsepart END
elsepart --> empty
elsepart --> ELSE stmts

These abbreviations make it a bit more convenient to describe some languages, but don't allow us to describe anything that couldn't (as in this example) also be described by an ordinary context free grammar. Sadly, yacc does not support these notations, partly because it's a bit difficult to work out how to specify in a general way what the corresponding AST would be. [Wirth also introduced fixed conventions for distinguishing between terminals and non-terminals in the grammar; I have tended to use font changes for that purpose, and to omit the commas that Wirth inserts between adjacent symbols.]

In Lab 2, my compiler generates a program from case.p that prints the message illegal instruction 0. Why, why, why?

Your program probably contains a sequence like this:

CASEJUMP 3
CASEARM 1 6
CASEARM 3 6
CASEARM 5 6
CASEARM 2 7
CASEARM 6 7
CASEARM 8 8

The problem here is that the number of cases (3) in the CASEJUMP instruction does not agree with the number (6) of following CASEARM instructions, and that leads to the bytecode interpreter trying to execute part of the jump table as if it were code. I'll resist the temptation to give a blow-by-blow account of the reason why this leads to the message you saw; suffice it to say that I think you're lucky to get a message at all. (This error ought now to be caught by the sanity checks – see above.)

In Lab 4, why does this program (or a related one) segfault?

var p;

proc compose(f, g);
  proc fg(x); begin return f(g(x)) end;
begin
  return fg
end;

proc add2(x); begin return x+2 end;
proc square(x); begin return x * x end;

begin
  p := compose(square, add2);
  print p(2); newline
end.

The program forms a closure for the function fg that contains a pointer to the stack frame for compose; this closure is then returned as the result of compose, and continues to exist after the stack frame for compose has been destroyed. A subsequent attempt to call this closure follows the static link into oblivion. There is no check in the runtime system that closures contain 'sensible' addresses, so the result is a segmentation fault.

There's a similar program compose.p in the lab materials that preserves the stack frame for compose by a trick: by calling compose from a function that itself has a big enough stack frame, it arranges that in the subsequent execution of the program, the stack pointer never climbs high enough to overwrite the frame for compose, and so the static link continues to work. This is only a dirty trick, however.

How can I set up my own machine to work on the Lab exercises away from the lab?

There's a page all about that.

Trying make test2 in Lab 4 gives the message, 'Please get the correct guest_rsa file from Mike'. What should I do?

You need a copy of the file /users/mike/pi/guest_rsa from the lab machines: it is a cryptographic key that will enable you to connect to the server using ssh. The file is not included with the lab materials because they are publicly accessible. Put the file in the tools subdirectory under lab4 after cloning the repository.

$ cd lab4; cp /users/mike/pi/guest_rsa tools

If you are using your own machine, you will want to copy across this file from the Lab machines and install it in the same place; you can use scp for that:

$ cd lab4; scp myuid@ecs.ox.ac.uk:/users/mike/pi/guest_rsa tools

(replacing myuid with your used ID).

Running make test1 in Lab 4 results in a file b.out (left over from the last test) that has execute permission. Even on an Intel machine, the shell command ./b.out seems to run the program. What is happening?

It's possible to configure Linux via a mechnism called binfmt so that binaries compiled for other architectures can run via emulation. So trying to execute the ARM binary b.out results in qemu-arm being invoked behind the scenes. For present purposes, I think it's preferable to invoke qemu-arm explicitly, so we can see what's going on.

The JCASE operation in Lab 4 is implemented by a code sequence that includes an ldrlo instruction. What is that?

The optree ⟨JCASE ([lab0; lab1; lab2], lab3), e⟩ is implemented by the ARM code

cmp r0, #3
ldrlo pc, [pc, r0, LSL #2]
b lab3
.word lab0
.word lab1
.word lab2

The ldrlo instruction is (i) a conditional load instruction, (ii) based on an unsigned comparison, that (iii) puts its result in the program counter pc and (iv) uses pc-relative addressing. Let's consider those four aspects one at a time:

(i) as a conditional load instruction ldr-lo, it does something if a certain condition is satisfied, and does nothing otherwise. If the ldrlo instruction does nothing, then the next instruction, b lab3 branches to the default label lab3.

(ii) the condition under which we want to branch to a case label (and not the default) is if 0 ≤ e < 3. This condition is evaluated by using an unsigned less-than comparison between e and 3. If e is 3 or more, then the comparison is false; if e as a signed quantity is negative, then as an unsigned quantity it appears large and positive, so again the comparison is false; and it e is between 0 and 2 (inclusive) then the comparison is true.

(iii) making the ldrlo instruction put its result in the pc is equivalent to branching to whatever label from the jump table is selected. Execution proceeds from that label if the load happens.

(iv) the pc-relative addressing enables the jump table to be put just after the instructions. Because of pipelining, reading the pc on ARM yields a value that is 8 bytes after the current instruction, and that is the address of the branch table, beginning .word lab0. To this is added 4 times the value in r0 to obtain the address in the table where the relevant label value is found.

My compiler's code for some of the test cases in Lab 4 is dying with a segfault. Is it possible to use QEMU to single-step the program and find out where the problem lies?

It is possible, as I explain on another page. But generally speaking, blindly single-stepping a program of any size is likely to result in confusion rather than enlightenment. If it's an existing test case, it's probably more fruitful to look at how the code from your compiler differs from the working code stored with the test case, and make test0 exists as a convenient way of doing that. If looking at the code doesn't reveal cause of the problem, then reduction techniques – like those I showed in the last lecture of the course – are a good prelude to using the debugger, and maybe will even allow you to narrow down the problem and find the solution before you even start single-stepping.

How are strings treated in the language of Lab 4 and ppc4?

I've more-or-less followed the conventions used in Oberon. A string constant like "mike" with (say) 4 characters has type array 5 of char to allow for a terminating null character. Library functions that deal with strings make allowance for this: print_string prints up to the first null character, but will not print more characters than are in the string even if it is not null-terminated. The procedure argv that fetches a command-line argument into a character array truncates the argument if necesary so that the result is null-terminated. The procedure open_in assumes that its filename argument is null-terminated.

What is the curious directive ".section .note.GNU-stack" that appears at the end of the assembly-language output of the Lab 4 compiler?

Recent versions of the GNU linker ld give a warning if the program requires the stack segment to be executable, because that is a security risk. Each file of assembly language can say whether it needs an executable stack by including an empty section with the special name .note.GNU-stack, making it executable if the stack segment should be executable. In the absence of this indicator, the linker assumes that the program does need the stack to be executable, and issues its warning. So the directive is there to make the warning go away.

Why do some format strings have $ as the marker to substitute an argument, and others have $0, $1, etc.?

For most formatting, we use the printf function and its relatives sprintf and fprintf that are provided by the library module Print, with source code in lib/print.mli and lib/print.ml. These functions use strings as templates for the output, with arguments substituted wherever a $ sign appears. Thus, for example, if a = 2 and b = 3, then we can print the string 2 + 3 = 5 by calling printf like this:

printf "$ + $ = $" [fNum a; fNum b; fNum (a+b)].

The printf function is declared as

val printf : stringarg listunit,

and format functions like fNum : intarg specify how each value is to be converted when it is output.

What's especially interesting about this version of printf is that it's extensible. There's a function

val fMeta : stringarg listarg

defined so passing the argument fMeta fmt args to printf substitutes for a $ sign the whole string that would be printed by calling printf fmt args. This means, for example, that we can define a format function for pairs of numbers like this:

let fPair (x, y) = fMeta "($, $)" [fNum x; fNum y]

then write

printf "Treasure at $" [fPair (3, 4)]

which prints Treasure at (3, 4). This idea can be extended to print recursive structures like lists or trees by making a recursive metaformat.

These concepts are further extended in the way fragments of assembly language are represented in Lab 4. The abstract data type of fragments has two key features: one is that each fragment stores internally the list of registers it uses, and that makes it possible to build up an instruction from several fragments and allocate registers to its arguments in the proper way. Also, the main constuctor for fragments, a function frag, generalises fMeta by allowing arguments to be substituted repeatedly and in an arbitrary order: that's why the template strings contain markers $1, $2, etc., rather than plain dollar signs. For example, is v1 and v2 are fragments that denote a register (say r3) and an integer constant (say 12), then

frag "[$1, #$2]" [v1; v2]

is a fragment that denotes a compound address such as [r3, #12]. The special marker $0 denotes the result register of an instruction, and is used in calling the function gen_reg that outputs an instruction that delivers its result in a register. If v1 denotes a register containing one argument of an addition, and v2 denotes a register or small constant, then we can write

gen_reg "add $0, $1, $2" r [v1; v2]

to generate an add instruction that adds these two values and puts the result in another register r. The side-effect of calling gen_inst with these arguments is to add that instruction to a queue of instructions waiting to be output, and the result returned by gen_inst is a representation of the result register r as a fragment.

The templates passed to gen_reg can take a couple of different forms, one more specific and the other more general. In the common case where the template ends $0, $1, $2 or something similar, that suffix can be omitted, so that "add $0, $1, $2" in the example above could have been written just "add". At the other end of the scale of complexity, it's sometimes necessary to implement a single operation with multiple instructions, and in that case the instructions can be written in one template, separated by /. For example, a comparison and conditional branch might be generated by

gen "cmp $1, $2 / blt $3" [v1; v2; codelab lab]

The function gen is used here in place of gen_reg because there is no result to put in a register. The fragment codelab lab substiutes a label lab for the marked $3 in the template. A still more elaborate example sets a register to 1 or 0 depending on whether one value is less than another. This is achieved by the command

gen_reg "cmp $1, $2 / mov $0, #0 / movlt $0, #1" r [v1; v2]

This results in three instructions: first compare the two input values and set the condition codes; then unconditionally set the result register to 0; and finally set the result register conditionally to 1 if the two value compared one less than the other. Notice that the register name $0 is substituted twice, and out of order with the other arguments $1 and $2. It's important in code sequences like this that we are careful to read the input registers before writing to the output register, in case the register allocator puts the result in the same place as one of the arguments. That's the reason for putting the mov instruction to initialise the output register after the cmp instruction. The condition codes established by cmp are passed on unchanged so that the conditional move movlt can act on them.

Mostly Mercurial

What should I do with Mercurial when I've finished a lab exercise?

It's quite a good idea – while in a subdirectory like compilers/lab2 – to use hg st . to find out what files you've changed; the dot at the end of this command restricts its effect to the current directory, ignoring any changes made elsewhere in the directory tree. The listing you see will look something like this:

$ hg st .
M kgen.ml
M Makefile
? mytest.p
? kgen.ml~

meaning that files kgen.ml and Makefile have been modified, and mytest.p and the backup file kgen.ml~ are unknown to Mercurial. Your next actions might me to add the file mytest.p to Mercurial's list, then to check in your changes, giving a brief log message:

$ hg add mytest.p
$ hg ci -m "Finished Lab 2" .

Include the dot again so as to check in just the modified files in the current directory.

How can I keep my copy of the lab materials up to date?

When you've finished a lab and checked in your changes, we may ask you to update your copy of the lab materials, using the following sequence of commands:

$ hg pull
$ hg merge
$ hg ci -m 'Merged upstream changes'

The first of these commands contacts the Mercurial server and copies the updates to your clone of the repository; and the second merges the changes into your working copy of the files. The third command adds the result of the merge to your copy of the repository by making a revision with two parents: one being the latest version of your work, the other the newly-added version of the lab materials. It's always safe to pull and merge in this way, and it makes sure you have the latest version of everything.

The bookmark basis is automatically updated so that it points to the latest version of the lab materials, so that you can make a diff with the command

$ hg diff -r basis .

at any time.

I tried to check in my work, but I got a message saying abort: no username supplied (see "hg help config"). What should I do?

You need to create a file called .hgrc in your home directory containing the following two lines, with the name and e-mail address replaced by your own details.

[ui]
username = Fred Bloggs <fred@bloggs.com>

When you check in revisions, this name and address are attached to each check-in, and are subsequently shown in Mercurial's logs.

How can I use gedit to edit the commit message when I check in my work?

It's complicated. You can specify the editor you'd like to use by setting one of the environment variables HGEDITOR or VISUAL, and Mercurial will invoke that editor if you don't specify a commit message on the command line with the -m flag. But the convention is that this editor should not exit until the commit message is saved, and gedit doesn't respect this convention; if (as is likely) there is already a gedit session running, then invoking gedit as a shell command opens a new tab in the existing session and exits immediately, leading to an abort with "Commit message was empty". You could try to hack around this, but take it for all in all, the game is not worth the candle.

Vim users can set either environment variable to vim; emacs users can set it to emacsclient, a special wrapper program that pops up a new buffer inside an existing emacs session and waits for you to invoke the command C-x #.

A simpler solution is just to specify the commit message on the command line: a short phrase is enough for our purposes.

I notice you use your own Mercurial server and not BitBucketGitHub. Is there any reason for that?

Well, first, because BitBucket dropped support for Mercurial some time in 2020. But otherwise, ...

Not really, except that using GitHub might make it a bit more tempting for people to publish their own working repositories in the same place. I could disable forking, but that wouldn't stop people uploading their own repos independently, and perhaps making them public. Doing that for the lab exercises would be a shame, because it would threaten to spoil the learning experience of other current and future students taking the course. Doing so for the Christmas exercise would verge on a disciplinary offence, and I want to save people from the possibility of committing a disciplinary offence with a slip of the finger.

For the comfort of those who, even at their young age, have become Gitaholics, I've provided the same content on my own Git server at Spivey's Corner too. Synchronisation between the two repositories ought to be automatic.

The Christmas assignment

I'm a third year student. Will I have to do the Christmas assignment?

Yes you will.

Can we use the software lab machines, instead of setting up our own environment?

As far as I'm concerned, there's no reason why not, and no reason why you cannot access the machines remotely over SSH, provided you are happy with a text-based editor. But there will be no access to the building during the Christmas shutdown period, and the machines and various other pieces of infrastructure may not be reliably available for remote access. If you can't schedule a solid week's time in Oxford during the vac, I would suggest setting up your own environment, and perhaps thinking of remote access as a Plan B.

Will we have to submit our code?

No: the plan is for you to submit a report including the detail of the changes you have made and an explanation of why they satisfy the requirements. That makes the assessment process rather like a code review.

Do we have to use Mercurial?

Yes and no: just like during term, you have the option of using Git instead. But whichever you use, part of the report you submit should consist of neatly formatted and tidy difference listings showing the changes you made; you'll want a version control system to help with that.

How do I make my own test case in Lab 4?

First prepare a file test/mytest.p containing the compiler input, the expected output, and an empty marker for the assembly language:

begin print_num(42); newline() end.

(*<<
42
>>*)

(*[[
]]*)

Then say make promote-mytest. This will run the compiler and paste the object code into the test case in the right place, so that subsequent runs of make test0-mytest will show any change in the code, and so that you can print the test case for inclusion in your assignment report. (You have to put in the expected output by hand, to remove the temptation to create test cases without checking the output is actually right.)

How did you format the diff listings in the sample report?

I used hg diff to make the initial diff, then enscript -Ediffu to format it with the actual changes in bold. The output of enscript is in Postscript, so I used ps2pdf to convert it to PDF, and pdfunite to concatenate the parts of the report into one file. It's best to stick with enscript's default fixed-width font for the listings, or the lines will be indented inconsistently, nauseating the reader.

Trivia

Why is the abstract machine named Keiko?

Keiko was the name of the orca (killer whale) featured in the film Free Willy, which was housed at the Oregon Coast Aquarium when I first visited it; I was looking for a name for the machine at the time. I believe Keiko is a girl's name in Japanese, but the whale was a male. A subsequent attempt to release him into the wild ended sadly.

What font do you use in the printed materials for the course?

The fonts mostly come from the Lucida family, which is well set up for use with TeX. The body font is Lucida Bright. I couldn't resist the temptation to set programs that form the input to our compilers using an alphabet drawn by Adrian Frutiger in the early 1960s for a French book on Algol, and lately revived by Michael Sharpe as an Opentype font.

  • Some questions from the ancient past can be found in the FAQ archive.



  1. Niklaus Wirth: What can we do about the unnecessary diversity of notation for syntactic definitions? CACM, Vol. 20, Issue 11, November 1977, pp. 822–823.