FAQ archive (Compilers)

From Spivey's Corner
Jump to: navigation, search

How would one go about implementing pass by value for structures (records). Also, how would it be possible to return by value a structure?

Like arrays passed by value, records can be passed by value by having the caller pretend they are being passed by reference, then making the subroutine prolog copy them into the stack frame. The Keiko machine has a FIXCOPY instruction designed especially for this purpose. Records, like arrays, cannot on typical machines be the direct result of a procedure, so compilers for languages that allow records as results typically transform the program so that the caller allocates space for the result, then passes the address of this space as an extra, hidden, parameter. The alternative, used in Java-like languages, is to say that values of array or record type are implicitly pointers to heap-allocated data, so that passing a record or array by value or returning it as a result simply passes the address of some storage with indefinite lifetime.

Why does the compiler generate non-short-circuit code for boolean expressions in statements like b := (i < max) & (a[i] = x)?

There's no deep reason, just that the implementation of gen_expr does nothing special with ANDs and ORs, so that the translation of the statement shown is

LDGW _i / LDGW _max / LT / (load a[i]) / LDGW _x / EQ / AND

where the LT, EQ, AND instructions, like PLUS, pop two values from the stack and push the result of a binary operation. The whole expression is evaluated whatever the truth value of the first comparison. More care is taken in the code generator of my Oberon compiler, which you can find on another page.

How long does the peephole optimiser take to terminate?

A preliminary question is, 'Does the peephole optimiser always terminate anyway?' Provided each rule replaces a group of instructions by a shorter group, the program shrinks at each step, and it cannot go on shrinking forever. In effect, the length of the program is a bound function for the optimiser loop.

The optimisation process is divided into scans, with each scan working forward through the program, applying rules as often as possible at each instruction before moving on to the next instruction. The only way a scan can leave possible rewrites that can be done in the next scan is if a rewrite at one point in the program enables another overlapping rewrite at a slightly earlier place in the program. For a given set of rules, we could find the length of the longest pattern, and make the peephole optimiser terminate in only one scan by moving back that far after each rule application. But it's simpler just to run another scan after any scan that makes changes, and in practice we see that the process terminates after two or three scans at most.

The story is complicated a bit by jumps, because something like

JUMP L3 / LABEL L4 ... LABEL L3 / JUMP L4

can be optimised by deleting the instruction LABEL L3 and making labels L3 and L4 equivalent. After that, it becomes apparent that the JUMP L3 instruction leads to the next instruction and can be deleted; but this would only become apparent in the next scan.

What happens if two rules match in the peephole optimiser but yield different results?

It's possible that two left-hand sides match the same instruction sequence, in which case the conventions of OCaml determine that the first rule listed is the one that's applied. But it's also possible that two rules match non-overlapping portions of the instruction sequence, and the rule that applies depends on the wrapper that systematically tries all rules at all locations in the sequence of instructions. In this circumstance, we'd rather not have a situation where the result of the optimiser depends on the detailed scheduling of the rules.

This kind of thing can happen with the instruction sequence

LOCAL 24 / CONST 12 / OFFSET / LOADW

which loads a word from an address calculated by adding the offset 12 to address 24 in the current stack frame. This instruction sequence can be made more compact by using optimiser rules to substitute other instructions. The rule

LOCAL a / CONST n / OFFSET --> LOCAL (a+n)

folds the addition of the offset into the LOCAL instruction, giving

LOCAL 36 / LOADW

Then the rule

LOCAL a / LOADW --> LDLW a

can introduce an LDLW instruction as an abbreviation for the LOCAL / LOADW pair, giving the single instruction LDLW 36.

The same instruction sequence can be rewritten in another way, because there's a rule

CONST n / OFFSET / LOADW --> LDNW n

that introduces the compact LDNW instruction, often used to extract a field from a record, given the address of the record. This yields the sequence

LOCAL 24 / LDNW 12

that cannot be simplified further, unless we add the additional rule

LOCAL a / LDNW n --> LDLW (a+n)

to deal with exactly this situation.

The whole problem is similar to one that arises in automated theorem proving, where the slang term is 'hacking around Knuth--Bendix'. The name refers to a classic paper by Donald Knuth and Peter Bendix entitled 'Simple word problems in universal algebras' that is probably easiest to find in Knuth's volume, Selected Papers on Design of Algorithms.

How does the peephole optimizer work on Keiko tree as opposed to the old way of sequences of Keiko?

For optrees, the peephole optimiser naturally splits into two parts: a jump optimiser that looks at the jumps and labels appearing at the roots of the trees, and a separate simplifier that walks over each tree separately. The jump optimiser can look for the same patterns of jumps and labels as before, and implement rules like

JNEQ a / JUMP b / LABEL a --> JEQ b / LABEL a.

The simplifier can match patterns like

<BINOP Plus, t, <CONST 0>> --> t

(Look in modules Jumpopt and Simp in lab 4.)

What is a scratch register?

A register that is used only within the translation of a small fragment of code, perhaps a single operation.

What is the base pointer (and what's the purpose of having it), if the stack pointera register @sp@ that holds the address of the most recent occupied word of the subroutine stack. On ARM, as on most recent processors, the subroutine stack grows downwards, so that the @sp@ holds the lowest address of any occupied work on the stack. points to the bottom of the stack, and the frame pointer points to the current frame?

The base pointer is the same as the frame pointer – different machines use different names. The ARM has a register fp that is used for this purpose. I don't believe I used the term 'base pointer' in this year's materials.

ldrb loads a byte into a register. But ARM registers are 32 bit and need 4 bytes? Is there an extra implicit command to splice the word into a byte? Could we optimize an array of characters by fetching 4 characters at a time?

The ldrb instruction puts the loaded byte in the least significant end of the destination register, and fills the rest with three zero bytes. Yes, you can fetch four characters at once, but then you need to shift and mask to access them individually. Several ldrb instructions will not be much slower, provided the word stays in the cache as it likely will. For well-aligned strings, you can copy them quickly by doing it a word at a time, but looking for the terminating zero byte in C style strings becomes a challenge.

Instructions are presumably stored in the RAM. But fetching from RAM takes potentially hundreds of cycles. How does the fetcher keep up with the processor's voracious appetite for instructions?

By using a cache for instructions as well as data: sometimes two separate caches, sometimes one cache that can support twice the data rate.

When we compile Keiko to ARM, is the only use of the evaluation stack (the space between fp and sp) for when we run out of registers?

No, space for local variables is allocated below fp and adressed at negative offsets from fp. Also, outgoing arguments beyond the first four are put in storage addressed at positive offsets from sp.

How on earth can a JIT compiler be faster than the Keiko virtual machine in the labs. Considering that a compiler takes a while to do the parsing, making the abstract tree, optimizing, converting to machine code, eventually run the machine code. But JIT compilers seem to be so much faster?

The JIT starts from the bytecode, which is effectively an IR, and the time to decode it for translation (once) can be compared with the time needed to decode it for execution (possibly many times). Different JITs have different degrees of ambition as regards the quality of native code they output. Mine is very simple, practically translating each bytecode individually, but the Java JITs can tune the amount of optimisation they do, sometimes retranslating code more carefully if it turns out to be heavily used.

I often get told that the stack is magically faster than the heap. Is stuff stored in the stack inherently faster to load/store from than the heap? Or does it just happen to allocated to registers more than the heap?

Access to heap storage may involve an extra indirection compared with storage in the local stack frame; but more importantly, heap-allocated storage must, in the general case, be recycled (if it is recycled at all) by garbage collection, not by the natural contraction of the subroutine stack when a subroutine returns.

How are interfaces implemented in a compiler? In classes, the methods are always going to be at a fixed offset in the vtable. But for interfaces, how do we know the correct address of the method?

Do you mean interfaces in the Java sense? If so, then there are various methods of implementing dynamic dispatch. Some of them involve hash tables, with collisions often resolved by generating code on the fly. Interface method call will always be potentially a bit more expensive than calling a method in the ordinary way.

Could you recommend a document which summarises the ARM instructions we commonly use? (Akin to the "Specification Of Keiko" document)

This set of slides is not bad: http://simplemachines.it/doc/arm_inst.pdf – provided you realise that an ordinary program runs entirely in 'user mode' and the other modes are only interesting to those implementing an operating system. So as far as we are concerned, the ARM has 16 registers plus the status flags, not 37 or whatever. Also, we won't be exploiting conditional instructions like addeq – which preforms an addition conditional on two values being equal – but only conditional branches like beq. So it's all a bit simpler than suggested in those slides.

A hastily put together guide to the subset we use can be found on another page here.

Are some of the diagrams of optrees in the notes mislabelled?

Yes, they are. In particular, the registers used in the code that is derived from a tiling may not always agree with the registers shown on the diagram. Too much done by hand to get the details right! Sorry.

How could we go about changing our compiler to instead be an interpreter?

There's a whole range of possibilities, since nearly every language implementation does some preprocessing of the program text, followed by run-time interpretation of the results of that preprocessing, so in a sense every implementation involves an interpreter to some extent.

  • For Keiko, we translate the program text into [Keiko assembly languageA symbolic representation of the machine code for a program., then that gets assembled into] bytecode, and there's a run-time interpreter for the bytecode. AFAIK, the standard implementation of Python (for example) is rather like this: the program text is internally translated into code for some kind of abstract machine that is then interpreted. The abstract machine code can be saved in a .pyc file to speed up subsequent runs.
  • For a machine language compiler, the program text is translated into [assembly language which is assembled into] binary machine code, and then there's an interpreter for machine code that's implemented in hardware.

Perhaps what's most obviously missing in this course is an example at the other end of the spectrum, where relatively little preprocessing is done, and most things are worked out at run-time. A good place to settle is with an interpreter where the run-time representation of the program is an abstract syntax tree. Happily, many examples of such interpreters are to hand, because the third year course on Principles of Programming Languages is based around a sequence of such 'source level interpreters' for different languages.

Which procedures can be called at each point in a (non-higher-order) program?

The procedures that are visible in the body of each procedure P are those that are in scope, i.e., those that are declared locally to P, those that are declared in one of the nest of procedures that enclose P, and those that are global. In a block-structured language, the scopes form a tree, where the parent of each scope is the scope that encloses it. The procedures callable from P are the children of its ancestors, including P's own children and the siblings of P.

What code can be used to 'follow static links n-m+1 times'?

Given a pointer to the base of a frame, we can fetch the static link from the frame with the sequence

CONST 12 / OFFSET / LOADW

(which can be abbreviated LDNW 12). To call a procedure that is local to the current procedure (i.e., when m = n+1, we pass as the static link the value of our own base pointer, which can be obtained with the instruction LOCAL 0. So the general picture is

LOCAL 0 / (n-m+1) times LDNW 12

As a slight optimisation, LOCAL 0 / LDNW 12 is equivalent to LDLW 12.

General

Why not generate code for the x86?

In previous iterations of this course, we used almost a sanitized subset of the x86 to good effect; and you may still find some references to this in the course materials. It was a 'subset' because there were many features of the x86 that we ignored, as most compilers do. I say 'almost' a subset because the x86 has a number of annoying restriction that we just ignored. And I say 'sanitized' because we used a cleaned-up syntax for the assembly language. This year, as an experiment, I'm using the ARM instead. It's attractive because it's a simple, modern architecture with a bigger set of registers, all uniform, and unlike our invented and sanitized subset of x86 code, there is plenty of tutorial material to be found online.

What's the difference between the instructions CALL and PCALL?

Some other languages implemented on Keiko don't have nested procedures, or don't pass the static link in the same way that we do, and they use the CALL n instruction to call procedures; it expects the procedure address and n arguments on the stack. We want to pass the static link as if it were an extra argument, so to avoid confusion I introduced the PCALL instruction, defined so that PCALL n is a synonym for CALL n+1; similarly, PCALLW n is a synonym for CALLW n+1, an instruction that calls a procedure that expects n arguments and a static link and returns a one-word result. I hope I've managed to eradicate all mention of the CALL instruction from what you read, but some mentions might leak through.

The Keiko machine, by the way, decomposes these instructions into smaller pieces – an operation JPROC that jumps to the procedure, and an operation SLIDE that tidies up the stack after the procedure has returned.

Trivia

What font do you use in the printed materials for the course?

The fonts all come from the Lucida family, which is well set up for use with TeX. The body font is Lucida Bright.

2014

New questions

Add new questions here, surrounding them with triple-single quotes

Could you recommend a document which summarises the ARM instructions we commonly use? (Akin to the "Specification Of Keiko" document)

This set of slides is not bad: http://simplemachines.it/doc/arm_inst.pdf – provided you realise that an ordinary program runs entirely in 'user mode' and the other modes are only interesting to those implementing an operating system. So as far as we are concerned, the ARM has 16 registers plus the status flags, not 37 or whatever. Also, we won't be exploiting conditional instructions like addeq – which preforms an addition conditional on two values being equal – but only conditional branches like beq. So it's all a bit simpler than suggested in those slides.

A hastily put together guide to the subset we use can be found on another page here.

Are some of the diagrams of optrees in the notes mislabelled?

Yes, they are. In particular, the registers used in the code that is derived from a tiling may not always agree with the registers shown on the diagram. Too much done by hand to get the details right! Sorry.

How could we go about changing our compiler to instead be an interpreter?

There's a whole range of possibilities, since nearly every language implementation does some preprocessing of the program text, followed by run-time interpretation of the results of that preprocessing, so in a sense every implementation involves an interpreter to some extent.

  • For Keiko, we translate the program text into [Keiko assembly language, then that gets assembled into] bytecode, and there's a run-time interpreter for the bytecode. AFAIK, the standard implementation of Python (for example) is rather like this: the program text is internally translated into code for some kind of abstract machine that is then interpreted. The abstract machine code can be saved in a .pyc file to speed up subsequent runs.
  • For a machine language compiler, the program text is translated into [assembly language which is assembled into] binary machine code, and then there's an interpreter for machine code that's implemented in hardware.

Perhaps what's most obviously missing in this course is an example at the other end of the spectrum, where relatively little preprocessing is done, and most things are worked out at run-time. A good place to settle is with an interpreter where the run-time representation of the program is an abstract syntax tree. Happily, many examples of such interpreters are to hand, because the third year course on Principles of Programming Languages is based around a sequence of such 'source level interpreters' for different languages.

What's the difference between the instructions LOCAL n and CONST n?

LOCAL n pushes bp+n, where bp is the base pointer; this is the address of a variable at offset n in the stack frame for the current procedure. CONST n just pushes the constant n. Actually, we could get rid of LOCAL n if we used a more primitive instruction BASE, equivalent to what we now write as LOCAL 0, that pushed the value of bp; for then we could replace LOCAL n by BASE / CONST n / PLUSA.

Which procedures can be called at each point in a (non-higher-order) program?

The procedures that are visible in the body of each procedure P are those that are in scope, i.e., those that are declared locally to P, those that are declared in one of the nest of procedures that enclose P, and those that are global. In a block-structured language, the scopes form a tree, where the parent of each scope is the scope that encloses it. The procedures callable from P are the children of its ancestors, including P's own children and the siblings of P.

What code can be used to 'follow static links n-m+1 times'?

Given a pointer to the base of a frame, we can fetch the static link from the frame with the sequence

CONST 12 / PLUSA / LOADW

(which can be abbreviated LDNW 12). To call a procedure that is local to the current procedure (i.e., when m = n+1, we pass as the static link the value of our own base pointer, which can be obtained with the instruction LOCAL 0. So the general picture is

LOCAL 0 / (n-m+1) times LDNW 12

As a slight optimisation, LOCAL 0 / LDNW 12 is equivalent to LDLW 12.

General

What prerequisites are needed to follow the course?

There are three courses taken by our undergraduates that provide useful background for this one. The first is Functional Programming, which is important because we will be writing compilers not as purely functional programs, but certainly as programs that use recursion over algebraic data types, and sometimes higher-order functions. I will give a brief introduction to the OCaml language we will be using in the course, but you will need to be familiar with the ideas in advance.

The second relevant course is Models of Computation, where our students learn about regular expressions, finite state automata and context-free grammars. I will aim to make the Compilers course self-contained as regards these topics, but will not repeat material from the other course unless it is directly needed in this course. Our emphasis will be on using tools that take input written as a set of regular expressions or a context free grammar and automatically produce a scanner or parser. We will use these tools mostly as black boxes rather than concerning ourselves with how they work inside.

The third course is Digital Systems, where students will already have met simple algorithms for compiling arithmetic expressions to machine code, and also the general organisation of computer hardware and systems software. Though this course is largely self-contained, that material provides useful background. In particular, the target architecture we will use in the last part of the course is the ARM, and that is different from the MIPS that was discussed in Digital Systems, so I will assume no detailed knowledge of its programming model. Those who know the MIPS or another architecture will be able to recognise the similarities and the differences.

Will we be writing compilers in Oberon?

No, we'll use OCaml as our implementation language.

Will we be writing compilers for Oberon?

Well, sort of. We'll work towards a compiler that implements a simple, Pascal-like language that looks a bit like Oberon. It's natural to do so, because Pascal-like languages are easy to implement with machine-level operations, and lack the irregularities that can make a nightmare of compiling, say, C. Many of the techniques we learn can be applied to implementing other languages. And rest assured, the keywords will be in lower case!

The O in OCaml stands for Objective, doesn't it? Will we be using object-oriented ideas in the course? And why won't we be using Java?

Yes, OCaml includes object-oriented features of a slightly strange kind, but we won't be using them. Generally speaking, the facilities for defining data types in object-oriented languages are heavy and clumsy compared with functional programming; for example, what can be achieved by an algebraic type definition in half a dozen lines in Haskell or OCaml might take several pages in Java. And a function that is defined in a few lines using pattern matching and recursion might become a tedious application of the 'visitor pattern' that involves several classes spread over many pages.

Will we actually be generating native code?

There will be two parts to the course. At first, we will target a stack machine, Keiko, that is like the Java Virtual Machine (JVM) but with instructions at a slightly lower level. Like the JVM, the Keiko machine has instructions that mostly consist of a single byte each, so the code is called bytecode. There are two different implementations of the Keiko machine:

  • an interpreter for the bytecode that has options to print each instruction as it is executed.
  • a JIT translatorA compiler that is embedded in the execution environment for a virtual machine, and translates code for the virtual machine into native machine code just before each procedure is called. After translation, the native code remains in memory and can be used if the same procedure is called again, but the code is thrown away when the program terminates. Either each procedure (or some other fragment of the program) is translated when it is called for the first time, or an interpreter and a translator co-exist, and procedures are translated if they have been called more than a few times. that compiles the bytecode into native code for the x86 just before it is executed for the first time.

The bytecode interpreter is sufficient for debugging the compilers we write, but it's nice to have the JIT implementation too.

So what about the second part of the course?

Glad you asked. In the second part of the course, the Keiko machine is used again as an intermediate code, that is, as the interface between the front end of the compiler, where source programs are translated into low-level operations, and the back end, where we will implement these low-level operations using machine instructions. The machine will be (a subset of) the ARM, the same processor that is used in the Raspberry Pi. This makes it possible to run our code on a real piece of silicon.

Why not generate code for the x86?

In previous iterations of this course, we used almost a sanitized subset of the x86 to good effect; and you may still find some references to this in the course materials. It was a 'subset' because there were many features of the x86 that we ignored, as most compilers do. I say 'almost' a subset because the x86 has a number of annoying restriction that we just ignored. And I say 'sanitized' because we used a cleaned-up syntax for the assembly language. This year, as an experiment, I'm using the ARM instead. It's attractive because it's a simple, modern architecture with a bigger set of registers, all uniform, and unlike our invented and sanitized subset of x86 code, there is plenty of tutorial material to be found online.

Why do you use Mercurial for the labs? Why not Git? After all, Linus says that Git is better.

Mercurial and Git share the facility to make convenient copies of an entire repository and to synchronise changes between multiple copies of the same repository; that's what the word distributed means in the term distributed version control. (Actually, none of these products provides a truly distributed system, in that the people using them give explicit instructions to move data around the network; a distributed implementation would provide the abstraction of a single, unified history of development without relying on a central server.) Both products have the same basic data structure: a rooted, directed acyclic graph of revisions with operations to merge any two revisions by reference to their most recent shared ancestor. Though terminology differs a bit between them, Mercurial provides everything that we need for the course, and the same operations can be simulated in Git, but with a few more unexplained switches passed to the various commands, and a bit more of the works showing. There seems to be a consensus that Mercurial is a bit easier to learn, so I've decided to go with it. Everything we do with Mercurial will be applicable to any other version control system, distributed or not. I have used SCCS, RCS, CVS and Subversion in my time, and am quite happy with Subversion for much my own work; ironically, as the internet has improved, the need for a distributed system has become less pressing, and I can reach my central server from most places I visit; since the central server is regularly and reliably backed up, it gives added protection from losing work.

Arising from lectures

What's the difference between a RISC and a CISC machine?

For us, the vital difference is that on a RISC(Reduced Instruction Set Computer) A style of computer design where there are multiple, identical registers, arithmetic instructions that operate between registers, and separate load and store instructions with a limited set of addressing modes., transfers to and from memory are done by separate instructions from arithmetic. This means that a statement like x := x+1, where x is a local variable in the frame (say at offset 24), is translated into the sequence

ldr r1, [fp, #24]
add r1, r1, #1
str r1, [fp, #24]

with a load and store separate from the add instruction. RISC architectures tend to have arithmetic instructions that can put their result in a different resgister from the operands, and they tend to have a large set of uniform registers. They tend to have a single, indexed addressing mode that adds together a register and a constant, with a pure constant and simple indirection through a register as special cases. Instruction selection for a RISC machineA computer designed with a simplified instruction set. Typically, these machines have a large set of uniform registers, a small set of addressing modes, and load/store instructions separate from the instructions that carry out arithmetic operations. largely consists of ensuring that address arithmetic is, where possible, done using the indexed addressing mode rather than by a separate instruction.

On a CISC machine, arithmetic and memory transfers can be combined in the same instruction, so the statement x := x+1 might be translated as

add [fp, #24], 1

a complex instruction that would have to be executed in several steps. Instruction selection for such machines is more difficult, and if expressed via tree grammars typically requires non-trivial dynamic programming, since the selection of tiles for different parts of the tree can interact in complicated ways. Question 6 on problem sheet 4 invites you to consider this by investigating a machine with a memory-to-memory move instruction.

What's the difference between the PLUS and PLUSA instructions on Keiko?

The PLUS instruction pops two integers on the stack, adds them together, and pushes the result. The PLUSA instruction expects first an address and then an integer offsest to be in the stack (with the integer at the top) and also adds them together. For the Keiko bytecode interpreter, there's no difference in behaviour, although the sanity checks on Keiko code that are incorporated in the lab materials do enforce the distinction. It's a distinction that does become important when we translate the Keiko code into native code for a register machine, because occurrences of PLUSA signal opportunities to use the indexed addressing mode of the target machine for address calculations. Depending on the target, we might not wish to use the addressing hardware for other additions: for example, it might not behave the same way with respect to arithmetic overflow.

What's the difference between the instructions CALL and PCALL?

Some other languages implemented on Keiko don't have nested provedures, or don't pass the static link in the same way that we do, and they use the CALL n instruction to call procedures; it expects the procedure address and n arguments on the stack. We want to pass the static link as if it were an extra argument, so to avoid confusion I introduced the PCALL instruction, defined so that PCALL n is a synonym for CALL n+1; similarly, PCALLW n is a synonym for CALLW n+1, an instruction that calls a procedure that expects n arguments and a static link and returns a one-word result. I hope I've managed to eradicate all mention of the CALL instruction from what you read, but some mentions might leak through.

The Keiko machine, by the way, decomposes these instructions into smaller pieces – an operation JPROC that jumps to the procedure, and an operation SLIDE that tidies up the stack after the procedure has returned.

Labs

My compiler gives the error message Stack overflow when it runs. What do I do now?

Notice that this message comes from your compiler when it is translating the test program, not from the object program when you run it on the Keiko machine. Your compiler has entered an infinite recursion, and you can get a stack backtrace using the following unix command:

$ env OCAMLRUNPARAM=-b ./ppc test.p

(where test.p is the test program you were trying to run). This works provided your compiler was built using the -g flag to ocamlc, introduced in the 2013 edition of the lab materials. The most likely culprit is a recursive function you have just introduced into the compiler, and the likeliest of all is an off-by-one error in the recursive loop that generates code to follow the static chain.

My compiler gives an error message generated code failed sanity check. What do I do now?

Again, this message comes from your compiler, not from the runtime system. The function Keiko.output that's supposed to output the Keiko code first applies a crude sanity check, verifying for example that your code does not try to pop non-existent values off the evaluation stack. The code generated by your compiler fails this check, probably indicating that it would crash the Keiko machine if run. Look in the assembly language file a.k to find which instruction triggered the message.

When I run the object code from my compiler, I get the message segmentation fault. No other helpful message is printed. What do I do now?

Welcome to the wonderful world of compiler debugging, where it is more productive to read the code than to run it. The object code from your compiler is broken somehow, in a way that was not detected by the sanity checks mentioned above. You can try simplifying the test case until you find which part of it is provoking the error, but in the end the most productive thing is to read the code (found in the file a.k) and see where it fails to make sense.

How can you get the Keiko runtime system to trace instructions?

It turns out that you need to give the -d flag twice. From the Lab 2 directory:

$ ../keiko/ppx -d -d ./a.out

Each -d increments the debugging level, and the execution trace appears at level 2 or above. The output from ppx shows first a disassembly of the binary program, then a trace with two lines for each cycle, one showing the contents of the evaluation stack and the other showing the instruction that's about to be executed. There are too many values printed in hexadecimal, but at least you can follow the stream of instructions.

In Lab 2, my compiler generates a program from case.p that prints the message illegal instruction 0. Why, why, why?

Your program probably contains a sequence like this:

CASEJUMP 3
CASEARM 1 6
CASEARM 3 6
CASEARM 5 6
CASEARM 2 7
CASEARM 6 7
CASEARM 8 8

The problem here is that the number of cases (3) in the CASEJUMP instruction does not agree with the number (6) of following CASEARM instructions, and that leads to the bytecode interpreter trying is execute part of the jump table as if it were code. I'll resist the temptation to give a blow-by-blow account of the reason why this leads to the message you saw; suffice it to say that I think you're lucky to get a message at all. (This error ought now to be caught by the sanity checks – see above.)

In Lab 4, why does this program (or a related one) segfault?

var p;

proc compose(f, g);
  proc fg(x); begin return f(g(x)) end;
begin
  return fg
end;

proc add2(x); begin return x+2 end;
proc square(x); begin return x * x end;

begin
  p := compose(square, add2);
  print p(2); newline
end.

The program forms a closure for the function fg that contains a pointer to the stack frame for compose; this closure is then returned as the result of compose, and continues to exist after the stack frame for compose has been destroyed. A subsequent attempt to call this closure follows the static link into oblivion. There is no check in the runtime system that closures contain 'sensible' addresses, so the result is a segmentation fault.

There's a similar program compose.p in the lab materials that preserves the stack frame for compose by a trick: by calling compose from a function that itself has a big enough stack frame, it arranges that in the subsequent execution of the program, the stack pointer never climbs high enough to overwrite the frame for compose, and so the static link continues to work. This is only a dirty trick, however.

Mostly Mercurial

What should I do with Mercurial when I've finished a lab exercise?

It's quite a good idea – while in a subdirectory like compilers/lab2 – to use hg st . to find out what files you've changed; the dot at the end of this command restricts its effect to the current directory, ignoring any changes made elsewhere in the directory tree. The listing you see will look something like this:

$ hg st .
M kgen.ml
M Makefile
? mytest.p
? kgen.ml~

meaning that files kgen.ml and Makefile have been modified, and mytest.p and the backup file kgen.ml~ are unknown to Mercurial. Your next actions might me to add the file mytest.p to Mercurial's list, then to check in your changes, giving a brief log message:

$ hg add mytest.p
$ hg ci -m "Finished Lab 2" .

Include the dot again so as to check in just the modified files in the current directory.

How can I keep my copy of the lab materials up to date?

When you've finished a lab and checked in your changes, we may ask you to update your copy of the lab materials, using the following sequence of commands:

$ hg pull
$ hg merge
$ hg ci -m 'Merged upstream changes'

The first of these commands contacts the Mercurial server and copies the updates to your clone of the repository; and the second merges the changes into your working copy of the files. The third command adds the result of the merge to your copy of the repository by making a revision with two parents: one being the latest version of your work, the other the newly-added version of the lab materials.

It's always safe to pull and merge in this way, and it makes sure you have the latest version of everything. In particular, I'm about to add to the file .hgignore listing files that are ignored by Mercurial so that backup files like kgen.ml~ are no longer shown. I'll try to minimise changes in the upstream repository that may interfere with your work, but I am in the process of putting in more comments in the example compilers ppc and ppc2 that are included in the lab materials

The bookmark basis is automatically updated so that it points to the latest version of the lab materials, so that you can make a diff with the command

$ hg diff -r basis .

at any time.

I tried to check in my work, but I got a message saying abort: no username supplied (see "hg help config"). What should I do?

You need to create a file called .hgrc in your home directory containing the following two lines, with the name and e-mail address replaced by your own details.

[ui]
username = Fred Bloggs <fred@bloggs.com>

When you check in revisions, this name and address are attached to each check-in, and are subsequently in Mercurial's logs.

How can I use gedit to edit the commit message when I check in my work?

It's complicated. You can specify the editor you'd like to use by setting one of the environment variables HGEDITOR or VISUAL, and Mercurial will invoke that editor if you don't specify a commit message on the command line with the -m flag. But the convention is that this editor should not exit until the commit message is saved, and gedit doesn't respect this convention; if (as is likely) there is already a gedit session running, then invoking gedit as a shell command opens a new tab in the existing session and exits immediately, leading to an abort with "Commit message was empty". You could try to hack around this, but take it for all in all, the game is not worth the candle.

Vim users can set either environment variable to vim; emacs users can set it to emacsclient, a special wrapper program that pops up a new buffer inside an existing emacs session and waits for you to invoke the command C-x #.

A simpler solution is just to specify the commit message on the command line: a short phrase is enough for our purposes.

Trivia

What font do you use in the printed materials for the course?

The fonts all come from the Lucida family, which is well set up for use with TeX. The body font is Lucida Bright.


2013

General

How did the cleaned-up version of the x86 differ from the real machine?

A good example of the cleaning up we did is that it's possible to access bits 0 to 7 of the EAX register of the x86 as an 8-bit quantity, as in the instruction mov al, x that stores these 8 bits in the location x. But it's not possible to do the same with the low-order 8 bits of some other registers, such as ESI, that are otherwise equivalent to EAX. We just ignored that problem.

If you ignore restrictions like that, doesn't it make it pure luck whether your object code will work?

If we simply ignored them, then yes. But what we did was to use a special assembler that plasters over the cracks. In the language of this assembler, you can write,

stb %0, x

to store the low-order byte of register %0 (equivalent to EAX), and you can write

stb %5, x

to do the same thing taking the value from register %5 (equivalent to ESI) instead. The assembler inserts some extra code to get round the restrictions in the processor, but we won't need to be concerned with that. Actually, the special assembler is made up of a translation pass in front of the ordinary assembler for the x86, and that made it possible to see the actual x86 code if you really want.

Arising from lectures

How can we prove whether a grammar is LR(0) or SLR or LALR or LR(1)?

A grammar is LR(0) if each state of the LR(0) automaton that permits a reduction (i.e. has an item with the dot at the right) contains only that reduction – so that neither a shift nor any other reduction is possible.

A grammar is SLR if, in each state that permits one or more reductions, the decision whether to shift or reduce (or the decision which reduction to perform) can be made on the basis of FOLLOW sets. That is, if a state contains the item [S α .] then we compare FOLLOW(S) with the set of tokens that can be shifted in the state (and also with the FOLLOW set of other possible reductions). If these sets are disjoint, then the grammar is SLR.

A grammar is LALR if a similar condition holds, based not on FOLLOW sets but on lookahead sets, which are always subsets of the FOLLOW sets. I'm not telling you how to compute lookahead sets, and I won't be asking you to classify grammars as LALR or not, unless they are either SLR or not LR(1).

A grammar is LR(1) if it has a correct LR(1) parsing table. A grammar is not LR(1) if it is amibguous. And a grammar also fails to be LR(1) if there is any viable prefix and lookahead token where different completions of the sentence require different immediate actions. To prove that a grammar is not LR(1), find two sentences that share a common prefix, but require different actions at the end of that prefix, despite the stack contents and the lookahead token being the same. For example, if you can say, "When you see an identifier, you can't tell whether it's a continuation of this expression or the start of a new one," then that's a sign the grammar is not LR(1).

Are conditions evaluated with the short-circuit method even if they appear in parentheses?

Yes: we can be sure of that, because parentheses are used by the parser to determine the shape of the abstract syntax tree for an expression, but don't appear explicitly in the tree itself. So adding superfluous parentheses to an expression does not affect the abstract syntax tree that is built, and thus cannot affect the code our compiler generates for it.


Type metrics consist of a size and an alignment; what is the alignment for?

Typically, byte-addressed machines have restrictions about what addresses can be used for quantities that are larger than a byte. For example, there may be a rule that a 4-byte integer must be aligned so that its address is a multiple of 4. That would mean that an array of bytes could begin at any address, but an array of integers would have to be given an address that was a multiple of 4. So the type 'array 10 of integer' has size 40 and alignment 4, but the type 'array 40 of char' has the same size but alignment 1. When the compiler lays out local storage for a subroutine or lays out a record type, it must leave gaps in order to satisfy these alignment restrictions, so that a character followed by an integer would leave a gap of 3. Sometimes space could be saved by shuffling the fields so that they appear e.g. in decreasing order of alignment, but that doesn't really matter much.

Recent x86 machines support unaligned loads and stores (of, say, a 4-byte quantity at an address that is not a multiple of 4) for backwards compatibility, but it's likely that they do so inefficiently, perhaps turning a load into two loads of adjacent words followed by some shifting and bitwise or-ing. So it's desirable to obey some alignment restrictions even when the architecture doesn't insist on them.

What's all that stuff about minus signs in integer constants?

If you don't make the minus sign in -2147483648 part of a single token with the digits, then the token 2147483648 is an integer constant whose value doesn't fit in 32 bits. In practice, this need not be a problem, because the result of - 2147483648 will be correct modulo 232 and therefore equal to -2147483648. But if in theory your language doesn't allow constants that don't fit in 32 bits, then you will be led to make a complicated exception for this case. Such an exception appears in Section 3.10.1 of the Java LanguageIn the theory of syntax, a set of strings over a specified alphabet. Specification. None of this is important, except as an example of the 1001 details that need to be got right in language and compiler design in order that ordinary programmers don't need to worry about them.

And what was that about typedef names in C?

In a C program, usually x * y will be interpreted by the compiler as multiplying two variables x and y. An expression can appear on its own as a statement: in this case writing

x * y;

as a statement all by itself is useless, but the rule is there to allow you to write a subroutine call as a statement, for example. So it's a fact that x * y; is a syntactically valid statement. But after the type definition

typedef int x;

what was syntactically a statement becomes a declaration: x * y; now declares y as having type pointer-to-integer. The upshot is that the parser must not treat all identifiers as interchangeable, but must rely on the lexer to classify each identifier as an ordinary identifier or as a typedef name that introduces a declaration. Because the set of typedef names can change, this requires an interaction between the parser and the lexer that is not envisaged in our simple data flow. The problem can be solved, but it's a bit messy; there's not really any reason to design a language to require this kind of messy trick in the compiler, and it's a symptom of the fact C evolved over a lengthy period.

Labs

Why does this program in the language of Lab 2 loop forever?

repeat x := x+1 until true

Congratulations on winning the prize for Bug of the month!

Perhaps the word true appears in magenta in your editor, as if it were a language keyword. But that's just because the editor is assuming your program, stored in a file with extension .p, is written in Pascal. In our language, true is not a keyword but an ordinary variable name. In Lab 2's language, variables don't have to be declared, and they have an initial value of 0, which is interpeted as Boolean false. Thus your program loops until the variable true has a non-zero value, which it never will.

(Actually, true is not a reserved word in Pascal but a predefined identifier. That makes it legal for a Pascal program to contain the declaration const true = false if you want.)

2012

Arising from lectures

How does a semantic analyser deal with programming languages that have generic types but ones that promise type-safety at compile time?

Sadly, there is no time in the bijou lecture courses we teach in Oxford to cover such things. The archetype for languages with generics is ML with its parametric polymorphism, or (if you want pararmetized modules) Standard ML. Luca Cardelli's tutorial paper 'Basic Polymorphic Typechecking' (google it) is very helpful. – Mike

When implementing vtables, why do private methods need to be in the vtable? As far as I'm aware, they can't be overridden or called by subclasses and so it would be perfectly fine to just call them directly without looking them up.

Depending on the language, if you can prove that a method won't be overridden in a subclass, then calls to it can be compiled statically. That's the purpose of the 'final' label in Java and the way that methods are only optionally 'virtual' in C++. I guess in Java that private methods may as well be treated as final by the compiler, but I haven't thought about it. You could try compiling and disassembling some sample programs to find out what the Java compiler does; but I suspect it leaves the decision to the JIT, which knows more than it can about the whole program. – Mike

Was the cost-vector method of instruction selection based on a technique from a book somewhere that I could look up? It bears some resemblance to the method presented in the dragon book, except that uses an array of length N, and works out costs for all i from 0 to N based on using i of the N registers, your cost-vector version seems to use costs based on registers, addresses and constants.

I don't have the book with me at the moment, but I believe you can see the same method used in Andrew Appel's book Modern Compiler Implementation. Actually, we covered both the Dragon Book algorithm for optimal evaluation of simple expressions with a limited number of registers, and also the tree grammar based instruction selection algorithm, where we assumed a plentiful supply of registers. Both use dynamic programming, but to solve different problems.

Labs

In your lectures, you say that LDGW _x is equivalent to CONST _x / LOADW, but the code for lab 3 seems to use GLOBAL _x / LOADW instead. What's going on?

I decided at some stage that it was simpler to have separate instructions, CONST for numeric constants and GLOBAL for pushing an absolute address, but I didn't maintain concentration for long enough to make the change consistently. There's no deep issue here, because both instructions have the effect of pushing a global address onto the stack. – Mike

Why do I get this error message when I try to build Keiko?

gcc -g -O2 -m32 -fno-strict-aliasing -Wall  -c gc0.c -o gc0.o
gc0.c:35:36: error: unknown type name ‘bool’
make: *** [gc0.o] Error 1

For a short time, there was an error in the source for gc0, with bool appearing in place of boolean. I've corrected that now in the Subversion database, so you can download a better version by giving the command svn up:

$ svn up
U keiko/Makefile
U keiko/gc0.c
$ cd keiko
$ make

And everything should be fine. You may see more files being updated than just Makefile and gc0.c, because I installed the latest version of Keiko, modified so that it is written in the common subset of C and C++.

2011

Arising from lectures

Wouldn't a stack of hash tables be a reasonable implementation of a compiler's symbol table?

Yes, it would. Hash tables, though, are an inherently imperative data structure, because adding an element to a hash table changes the table, and the only feasible way round this is to copy the whole table first, losing whatever gains in efficiency were made by using a hash table in the first place. Having a stack of hash tables helps to make things a bit less imperative, because at least then there's a separate representation of each scope. But the operation of adding a definition remains one that works by side-effect.

I prefer to use an interface for the dictionary or symbol table (call it what you like) that is purely applicative, i.e., doesn't have operations that change a hidden state of an environment, because that allows the symbol table module to satisfy a simpler kind of specification, in which environments are more like values than like (updatable) objects. Another confusing thing in typical compiler books is the confusion that's introduced between a definition of a name and its existence in the symbol table. It helps immensely with clarity to say that each declaration creates a definition, and the semantic analyser annotates the tree with these definitions, and they continue to exist as annotations even after the semantic analysis is finished. If the compiler is written (as all modern compilers should be) in a language with dynamic storage allocation, there's no need to worry about allocating and managing storage for these definitions. (I almost wrote 'garbage collection' rather than 'dynamic allocation' there, but the truth is that many things in compilers work perfectly well without garbage collection: the compiler runs, it uses the storage it needs, and then all the storage is reclaimed at the end when compilation is over.)

This isn't a data structures course, so we don't need to discuss what would be the best data structure for representing these 'persistent' environments, except perhaps to satisfy ourselves that a reasonably efficient implementation exists. So it's enough to say 'use a binary search tree' and possibly 'use some kind of balanced tree'.

There's nothing wrong with using hash tables; it's just that doing so places a constraint on the design of the compiler, and makes us view semantic analysis more as a process and less as a calculation. That said, I'm hugely keen on using one big hash table (TM) to represent all identifiers by distinct integers inside the compiler, so that the only string comparisons are done in the lexer. That doesn't affect the structure of the rest of the compiler.

Is it true that in C you can't have more than two levels of pointers?

I can't find any trace of this rule. GCC with the -ansi flag happily compiles the program,

typedef int **********ptr;

and the grammar given in the back of K&R seems to allow it. (Though to be fair it allows many things that don't make any sense as C programs.)

On the other hand, recursive types are possible in C only via the use of structure tags. The magic spell I always use looks like

typedef struct node *ptr;

struct node {
     int data;
     ptr next;
};

and it's plain that C uses name equivalence for structures, inventing structure tags (like node here) if they are not written in the source code. So the answer to the name vs structural equivalence question is clear. – Mike

See also this StackOverflow answer – Callum Rogers

Labs

Should the program sumpow.p in Lab 4 print 1300 or 979? I'm confused!

So was I, and the version delivered with the lab materials is inconsistent with what I wrote in the lab manual: one prints 1300 = 1^5 + 2^5 + 3^5 + 4^5, and the other prints 979 = 1^4 + 2^4 + ... + 5^4. I'm about to correct this by changing the lab materials, and afterwards svn up will get you the corrected progam; but whatever version you have, the correct answer is given in the comment embedded in the file sumpow.p, and that is the data used by make test.

According to the oracle,

Sumk4.gif

and

Sumk5.gif

So putting n = 5 in one we get 5 * 6 * 11 * 89 / 30 = 979, and putting n = 4 in the other we get 16 * 25 * 39 / 12 = 1300.

If, despite the fact that the above error was corrected long ago, you're still getting 1300 as the answer, it's because you are pushing the parameters of sumpow in the wrong order.

I'm trying to build the Keiko bytecode interpreter for Windows using MinGW, and it doesn't know about the function posix_memalign that's used in gc0.c. Is it important, or would malloc do instead?

For the bytecode interpreter (no JIT), it doesn't matter. In the JIT, the function scratch_alloc is used to allocate pages of memory that will subsequently be made executable, and that requires the memory to be aligned on a page boundary.

In OBC itself, the semi-portable function posix_memalign isn't used; for unix machines, I use mmap to allocate memory, and on Windows I use VirtualAlloc. See the file gc.c in the OBC sources.

OCaml

What's going on here? Defining the function g seems to change the type of r.

mike@staysail:~$ ocaml
        Objective Caml version 3.12.0

# let r = ref [];;
val r : '_a list ref = {contents = []}
# let g x = r := [x+1];;
val g : int -> unit = <fun>
# r;;
- : int list ref = {contents = []}

The type '_a is not a polymorphic type variable; it's an unknown type, and all uses of r must treat it consistently. Such types are necessary when type inference meets imperative constructs, otherwise it becomes possible to set a variable like r to a list of integers and then use it as if it held a list of booleans. Defining the function g imposes the constraint that r must hold lists of integers, and that's reflected in the type that's later displayed for r.

Trivia

I like to use spaces in the names of directories, but when I do this, the compile script in the labs breaks: it refuses to run the linker. And even after I've fixed that problem, the compile script creates an a.out that won't run. Isn't this a serious bug?

It certainly is a show-stopper, especially for people who are used to My Documents, My Music (for mp3's downloaded from file-sharing sites), and My Pictures (don't ask), and perhaps want to add My Little Pony too. You can fix the problem by replacing the compile script with this one:

#!/bin/sh

KEIKO=`cd ../keiko; pwd`

set -x

./ppc $1 >a.k \
    && "$KEIKO/pplink" -custom -nostdlib -i "/usr/bin/env $KEIKO/ppx" \
        "$KEIKO/lib.k" a.k -o a.out >/dev/null \
    && chmod +x a.out

I won't update the lab materials, because this script depends on an icky feature of #! lines on Linux – that the tail of the line is passed as a single argument – and I'd rather keep things clear and simple when I can. – Mike 18:42, 1 February 2011 (UTC)

Personal tools

Variants
Actions
Navigation
Tools