Laboratory exercises (Compilers)

Copyright © 2024 J. M. Spivey

There are four lab exercises for the course:

  1. Implement control structures in a flowchart language.
  2. Add array access to a typed language.
  3. Implement procedure calls, nested procedures and higher-order functions.
  4. Extend a code generator for the ARM to better exploit addressing modes.

Some of the exercises have optional parts that you may like to complete. It's more important, however, that you do at least the non-optional parts of all the labs, including the last one.

The exercises

The lab exercises are described in Chapters 3, 5, 7 and 9 of the coursebook. Listings of the chief modules you will need to change are provided in Appendix E of the book. The lab materials are divided into several subdirectories:

keiko    Assembler and bytecode interpreter for the Keiko abstract machine.
lab1     Source code for Lab 1: control structures.
lab2     Source code for Lab 2: arrays.
lab3     Source code for Lab 3: procedures.
lab4     Source code for Lab 4: machine code.
lib      Extra library modules used in all our compilers:
          – growvect   Extensible arrays.
          – print      Formatted output à la Mike.
          – source     Tracking and printing lines from the source code.
          – util       Useful functions missing from the standard library.
ppc4     A complete compiler that translates the language of Lab 4 into Keiko bytecode.
tools    The nodexp tool used to implement optree syntax, plus scripts for running ARM code on a remote server and for improving error messages from the OCaml compiler.

The materials are delivered using an anonymous Mercurial server in a way described in the instructions for Lab 1. You can also browse the materials using the URL

https://spivey.oriel.ox.ac.uk/hg/compilers

and there's a cheat sheet listing the commands you'll need to use.

Please don't post on GitHub. The lab materials are protected by copyright, and, more importantly, it does no favours to future students if the answers become publicly available. What's more, a public clone is unlikely (once you've lost interest in the course) to pick up any bugfixes made to the materials. It's for these reasons that I don't host the lab materials on GitHub, where the natural way of working begins with making a public clone of the repository. It's not necessary to publish cloned repositories in order to do the lab exercises, and I ask you, please, not to do so. – Mike.

Quick start

Open a shell window and use it to clone the Mercurial repository containing the materials with the command:

$ hg clone http://spivey.oriel.ox.ac.uk/hg/compilers

This will make a directory called compilers containing all the materials.

First, change to the subdirectory compilers/keiko and build the interpreter for the Keiko machine:

$ (cd compilers/keiko; make)

(The parentheses here make the change of directory local to the command.)

Second, change to the subdirectory compilers/lib and build some utility modules that are common to all our compilers:

$ (cd compilers/lib; make)

Next, you can change to the subdirectory for one of the labs and build the initial version of it:

$ cd compilers/lab1
$ make

Finally, you can run regression tests on the resulting compiler:

$ make test

You'll find that the first test, gcd.p, already passes. The second test, however, uses features that the lab asks you to implement, so it will fail until you've completed the exercise. When your work is done, all the tests will pass.

You are free to use whatever software you prefer for editing and for building and testing programs, but it's best to use Make with the Makefiles provided, and to choose an editing platform that is able to interpret error messages in the standard format and take you to the relevant line of the source code. I've provided project files for Geany, a lightweight IDE, that you can use if you like. Another page gives instructions for setting up the project files and starting to use Geany.

Naturally enough, the computers in the Software Lab have been set up with all the software that is needed to do these exercises. If you want to use your own machine, that is perfectly possible, and another page gives suggestions for setting up the software you need. You will probably wish to do this well before the end of term so that you are set up for the Christmas assignment.

Problems and remedies

Problems marked "Fixed in rev n:1a2b34" can be resolved by pulling from the software repository then updating your working copy. Instructions for doing this are given on the Mercurial cheat sheet.

001 The ocamlc compiler reports garbled source text in error messages for Lab 4.
There's a bug in ocamlc that affects programs built using a text-based preprocessor like our nodexp program: after a compile-time error, the fragment of the source file that is displayed becomes garbled. I've tried to work round this by providing a wrapper, tools/ocamlwrap, that invokes ocamlc in a way that disables the compiler's own display of source fragments and instead inserts its own snippets. This wrapper is used in the Makefile for each lab exercise.
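The real wrapper is a TCL script and is not reproduced here. As a rough illustration of the idea only, here is a hypothetical OCaml sketch: it parses an error header of the form File "tran.ml", line 42, characters 10-15: and then fetches the offending line from the source file itself. The names parse_header, source_line and show_error are invented for this example, and the sketch assumes the single-line header format; anything else, including multi-line headers ("lines 12-14"), is simply echoed unchanged.

```ocaml
(* Hypothetical sketch of what a wrapper like ocamlwrap does; the real
   wrapper is a TCL script and is not reproduced here. *)

(* Extract (file, line) from a header such as
     File "tran.ml", line 42, characters 10-15:
   Returns None for anything else, including multi-line
   headers of the form "lines 12-14". *)
let parse_header (header : string) : (string * int) option =
  try Some (Scanf.sscanf header "File %S, line %d" (fun f n -> (f, n)))
  with Scanf.Scan_failure _ | Failure _ | End_of_file -> None

(* Return line [n] (counting from 1) of [file], or None if the file is
   shorter than that. The channel is closed even if reading fails. *)
let source_line (file : string) (n : int) : string option =
  let ic = open_in file in
  Fun.protect ~finally:(fun () -> close_in ic) (fun () ->
    let rec scan k =
      match input_line ic with
      | line -> if k = n then Some line else scan (k + 1)
      | exception End_of_file -> None
    in
    scan 1)

(* Echo the header, then the source line it points at, if we can find it. *)
let show_error (header : string) : unit =
  print_endline header;
  match parse_header header with
  | Some (file, n) when Sys.file_exists file ->
      (match source_line file n with
       | Some text -> Printf.printf "%5d | %s\n" n text
       | None -> ())
  | _ -> ()
```

Because the wrapper reads the source file directly rather than relying on the compiler's own snippet, the displayed line is whatever is actually in the file on disk, so it is not affected by the preprocessor's rewriting.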

Some little tools

  • Lab 4 makes use of a preprocessor called nodexp that allows us to write operator trees with the concise syntax
      <LOADW, <LOCAL 8>>
    instead of
      Node (LOADW, [Node (LOCAL 8, [])])
    The preprocessor is implemented using a lexer and parser written with ocamllex and ocamlyacc.
  • The diagrams of optrees in the lecture notes and on this site are generated with another tool, opdraw, which accepts an input syntax similar to that of nodexp and compiles it into the MetaPost graphics language.
  • Also in Lab 4, there is a script that sends programs in ARM assembly language to a remote Raspberry Pi, where they are assembled, linked and executed.
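To make the nodexp correspondence concrete, here is a small OCaml sketch. The optree type, Node of inst * optree list, matches the expansion of <LOADW, <LOCAL 8>> given in the first item. The cut-down inst type (with only LOADW, LOCAL, CONST and a string-labelled BINOP) and the helpers to_angle and size are hypothetical, written for this illustration rather than taken from the lab code. The printer to_angle turns a tree back into nodexp-style notation, which shows that the two forms describe the same tree.

```ocaml
(* Hypothetical, cut-down instruction type for this illustration only. *)
type inst =
  | LOADW            (* load a word from the address given by the subtree *)
  | LOCAL of int     (* address at an offset in the current frame *)
  | CONST of int     (* integer constant *)
  | BINOP of string  (* binary operator, named by a string here *)

(* An operator tree: an operator applied to a list of subtrees. *)
type optree = Node of inst * optree list

(* The expansion of <LOADW, <LOCAL 8>> given in the text. *)
let load_local = Node (LOADW, [Node (LOCAL 8, [])])

let string_of_inst = function
  | LOADW -> "LOADW"
  | LOCAL n -> Printf.sprintf "LOCAL %d" n
  | CONST n -> Printf.sprintf "CONST %d" n
  | BINOP op -> "BINOP " ^ op

(* Print a tree in nodexp-style notation: <op, kid1, kid2, ...>.
   A leaf with no subtrees prints as just <op>. *)
let rec to_angle (Node (op, kids)) =
  "<" ^ String.concat ", " (string_of_inst op :: List.map to_angle kids) ^ ">"

(* Count the nodes in a tree. *)
let rec size (Node (_, kids)) =
  List.fold_left (fun n kid -> n + size kid) 1 kids

let () =
  print_endline (to_angle load_local);          (* prints <LOADW, <LOCAL 8>> *)
  Printf.printf "%d nodes\n" (size load_local)  (* prints 2 nodes *)
```

A larger tree such as Node (BINOP "Plus", [load_local; Node (CONST 4, [])]) prints as <BINOP Plus, <LOADW, <LOCAL 8>>, <CONST 4>>, which is the kind of text nodexp lets you write directly in the source.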

Solutions

What might go wrong?

Some hints about how to deal with error messages that might arise are written in the Lab exercises section of the FAQ page. Some more explicit hints:

  • In Lab 1, it's important to keep straight the difference between the number of arms in a case statement and the total number of labels. Where several labels are attached to the same arm, these numbers will differ. The CASEJUMP instruction that the compiler generates needs one CASEARM item in its list for each label, and the count given in the CASEJUMP instruction itself must agree with this. Getting this wrong would lead to a segmentation fault or an illegal instruction at runtime, but I have added a sanity check to the Keiko module that outputs code, so the error is now caught earlier.
  • In Lab 2, it's important to get the scaling of subscripts right. Typical errors include failing to multiply subscripts for an integer array by 4 and, later, failing to replace the 4 with 1 when subscripting a Boolean array.
  • In Lab 3, when writing code to follow the static link, there's a certain temptation to write a recursion in OCaml that does not terminate (I think because of an off-by-one error). The result is a stack overflow in the compiler.
  • In Lab 4, the use of the text-to-text preprocessor nodexp (which implements the <BINOP, t1, t2> notation for operator trees) provokes a bug in the OCaml compiler that leads to garbled error messages when it tries to show which source line is at fault. To get round this, there's a wrapper program ocamlwrap (written in TCL, sorry) that is silently invoked from the Makefile. It runs the OCaml compiler in such a way that it doesn't try to show the source line following an error, and then the wrapper steps in and shows the source line itself. All being well, you won't notice this, but participants may become puzzled if the mechanism goes wrong, or if they try to use their own Linux machines without installing TCL.
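The Lab 1 point about counting labels rather than arms can be illustrated with a small hypothetical OCaml sketch. The code type, the table layout (a CASEJUMP carrying its count, one CASEARM per label, then a JUMP to the default) and the name gen_case_table are assumptions made for this example, not the lab's actual definitions.

```ocaml
(* Hypothetical instruction type with just the items needed for a case table. *)
type code =
  | CASEJUMP of int        (* number of CASEARM items that follow *)
  | CASEARM of int * int   (* (case label value, code label of the arm) *)
  | JUMP of int            (* unconditional jump to a code label *)

(* Each arm pairs its list of case label values with the code label of its body. *)
let gen_case_table (arms : (int list * int) list) (default : int) : code list =
  (* One CASEARM per label value, so an arm with three labels contributes three. *)
  let items =
    List.concat_map
      (fun (values, lab) -> List.map (fun v -> CASEARM (v, lab)) values)
      arms in
  (* The count is the number of labels, not List.length arms. *)
  CASEJUMP (List.length items) :: items @ [JUMP default]
```

For a case statement with two arms, one labelled 1, 2, 3 and the other labelled 5, the table has four CASEARM items and the CASEJUMP count is 4, not 2.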
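For Lab 2, the subscript-scaling rule can be sketched as follows, assuming 4-byte integers and 1-byte Booleans as in the hint above. The ty type and the functions size_of and element_offset are hypothetical names. The real compiler generates code that performs this multiplication at runtime, whereas the sketch simply computes the byte offset as a number to show which factor applies.

```ocaml
(* Hypothetical, simplified types: Array (n, t) has n elements of type t. *)
type ty = Integer | Boolean | Array of int * ty

(* Size in bytes, assuming 4-byte integers and 1-byte Booleans. *)
let rec size_of (t : ty) : int =
  match t with
  | Integer -> 4
  | Boolean -> 1
  | Array (n, elem) -> n * size_of elem

(* Byte offset of element i of an array whose elements have type [elem]:
   the subscript is scaled by the element size, not by a fixed 4. *)
let element_offset (elem : ty) (i : int) : int = i * size_of elem
```

Deriving the factor from the element type, rather than writing 4 into the code generator, is what makes the Boolean case and nested arrays come out right: for an array of rows of 10 integers, a[2][3] lies at element_offset (Array (10, Integer)) 2 + element_offset Integer 3 = 80 + 12 = 92 bytes from the start.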
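For Lab 3, one way to keep the static-link recursion honest is to recurse on the number of links still to follow, so that each call makes the count strictly smaller and a negative count is rejected at once. The addr type, follow_links and frame_of below are hypothetical. The convention that the hop count is the difference between the current nesting level and the level of the declaring procedure is an assumption, so check it against the level numbering used in your own lab code.

```ocaml
(* Hypothetical address expressions, just enough to show static-link chains. *)
type addr =
  | LocalFrame               (* base of the current procedure's frame *)
  | LoadStaticLink of addr   (* the static link stored in the frame at addr *)

(* Code that follows the static link [hops] times, starting from the current
   frame. Each recursive call has a smaller count, and a negative count is
   rejected immediately instead of recursing forever. *)
let rec follow_links (hops : int) : addr =
  if hops < 0 then invalid_arg "follow_links: negative hop count"
  else if hops = 0 then LocalFrame
  else LoadStaticLink (follow_links (hops - 1))

(* Assumed convention: a name declared in a procedure at nesting level
   [target], used from code at level [current], needs current - target hops. *)
let frame_of ~(current : int) ~(target : int) : addr =
  follow_links (current - target)
```

With the guard in place, an off-by-one mistake that makes the count negative shows up as an Invalid_argument exception instead of a stack overflow in the compiler. A mistake that makes the count one too large still produces wrong code silently, so the regression tests remain the real check.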

Please feel free to add to this list.