Older project ideas
Please come and talk to me about any of these projects, or if you have a project idea of your own along the same lines. The projects are quite open-ended, so they would suit either a third year or a fourth year student, with the fourth year expecting to get further or address a wider selection of aspects. Some of the projects would divide up neatly for a team of two or more.
Some of the descriptions need to be expanded a bit, and I'll do that when time permits. But for now, if a project looks interesting, please get in touch and ask about things that are not clear. – Mike
- 1 Projects related to Oberon / Keiko
- 2 Projects related to GeomLab
- 3 Other random projects
Oberon is a very simple imperative language which was once used for teaching in Oxford (and might be used again one day). A home-grown implementation of Oberon uses Keiko, a bytecode machine implemented by both a bytecode interpreter (to support debugging and profiling) and by a rough-and-ready JIT (for better performance). The same machine is used as the target of the compilers that are studied in the Compilers course.
Better JIT translation
The existing JIT for Keiko is very simple-minded, and does little more than translate each bytecode into the corresponding machine code. Either improve the translation by using one of the many JIT libraries now available, or adjust the Oberon compiler and the specification of the bytecode machine to free it of restrictive assumptions and produce a better pure-JIT implementation.
- One idea: the compiler now knows too much about the layout of stack frames. In particular, it knows that the arguments of a procedure are on the stack at a certain offset from the frame base, effectively fixing the size of the frame head. By introducing appropriate instructions into the VM, it should be possible to make the code more independent of the exact stack layout, and that may make it possible to fit in with the native calling convention of some platforms. This should make it possible to merge the Oberon stack and the native subroutine stack, saving space and freeing up registers.
A simple object-oriented language
Use Keiko to implement a simple language that is purely object-oriented. Study the compromises that must be made to get reasonable performance, comparing your implementation with Smalltalk, Ruby or Scala.
- Almost any variation on the theme is acceptable: a different language, object-oriented or not; or a target other than Keiko. The small benefit to Mike of OOP-on-Keiko is that it could be used as an example in the Compilers course.
- A similar thing already exists, in that Oberon–2 is object-oriented (in an interesting way), and the design of Keiko was expressly made so that it would provide what's needed to implement it.
Eliminating array bounds checks
The Oberon compiler inserts code into every array access and every pointer dereference to check for runtime errors, like a subscript that is out of bounds or a pointer that is null. In many cases, it is possible to eliminate the checks because it is possible to determine from the program that no error can occur. For example, an array access inside a FOR loop may be safe given the bounds of the loop, and several uses of the same pointer in successive statements may be able to share one check that the pointer is non-null. Modify the Oberon compiler (or a simpler one taken from the Compilers labs) so that it represents the checks explicitly in its IR, and introduce a pass that removes unnecessary checks, so speeding up the code without compromising safety.
Keiko on Mindstorms
Alternative firmware for the Mindstorms NXT robot controller provides an implementation of the JVM, allowing Java programs to run on the controller, subject to some restrictions. Using this firmware as a guide, produce an interpreter for a suitable bytecode, perhaps some variant of Keiko, allowing Oberon or another robot language of your own design to run on the controller. Aim to support the buttons and display at first, and perhaps add control of the motors and sensors later.
- The NXT brick has 64K of RAM and a lot more flash memory. Programs can run directly from an image file in the flash memory (this is what LeJOS does), but that means there can be no relocation of addresses before the program starts. There's a version of Keiko that can use a program image in this way; the challenge is to make it work on the brick.
Better garbage collection for Keiko
The existing implementation of Keiko uses a stop-the-world copying collector that could cause long pauses in some applications. Modify the instruction set so poonter stores can be identifier, and enhance the garbage collector to make it incremental and perhaps generational.
Keiko JIT on the Raspberry Pi
On x86 machines, Keiko is supported by a JIT that dynamically generates native code. It uses register-based a virtual machine interface that is similar to the Risc86 machine we used as a target in the Compilers course. In order to get similar performance on an ARM-based machine like the Raspberry Pi, we need an implementation of the Risc86 interface that generates ARM code in binary form. This project will entail learning to program the ARM at the machine code level, writing a library that can assemble ARM code in the correct binary format, and designing a translation from the Risc86 operations to ARM instructions. The Keiko runtime system is written in C.
Heap-based activation records
At present, Keiko supports only conventional Pascal-like language implementations that store activation records on a stack. Experiment with an implementation where activation records are heap-allocated (and therefore recovered by a garbage collector), procedures are genuinely first-class citizens that can be returned as results in addition to being passed as arguments, and tail recursion is optimised seamlessly.
GeomLab is a very simple language, purely functional on the surface, that we use to teach schoolchildren about programming and problem solving. The current implementation of GeomLab translates functional programs first into an intermediate language, then into Java bytecodes, which may be further JITted by the underlying JVM into machine code. As it stands, the language is untyped, and run-time errors occur when (for example) a number is added to a picture.
GeomLab and Mindstorms
GeomLab has a turtle graphics feature, but the pictures are drawn only on the screen. It should be possible to make a turtle out of Lego Mindstorms, then control it with an instance of Geomlab running on a host computer, with communication over Bluetooth.
Better performance for GeomLab
At present, GeomLab programs show a performance that competes favourably with Python, making it possible to address tasks like computing images of the Mandelbrot set using a purely functional program that calls a function once for each pixel. But there is still a gap between the performance of GeomLab programs and similar ones written in Java or C, and more ambitious image-processing tasks would be made possible by better performance, particularly in the area of arithmetic. Explore ways of improving performance, perhaps including the possibility of allowing numbers to be passed around without wrapping them in heap-allocated objects, or the possibility of compiling the code for Haskell-style pattern matching in a better way.
- A promising approach is to use type inference to identify functions that always return a numeric result or a colour, and functions that expect numeric arguments. Such functions could be given multiple entry points – one for boxed and another for unboxed arguments; and functions that return numeric results can yield them unboxed and be wrapped in an adapter if a boxed result is needed.
- At present, the JIT is able to identify chains of numeric primitives, as in
x * y + z, and avoid boxing the result of the multiplication only to unbox it again for the addition. Also, the two operators are compiled inline as the appropriate JVM instructions. This proposal goes much further by allowing unboxed arguments and results for user-defined functions also.
GeomLab on Android
Produce an implementation of GeomLab's GUI and graphics library that works on the Android platform. Either use an interpreter for GeomLab's intermediate code to execute GeomLab programs, or investigate dynamic translation of the intermediate code into code for Android's virtual machine Dalvik.
- Porting the GUI from Swing to Android should be a routine programming task, though some experimentation will be needed to get an interface that works well. The graphics primitives of GeomLab already use a device-independent interface to provide output both to the screen and in Postscript, so the infrastructure is there to exploit other graphics libraries.
- The latest GeomLab implementation compiles function bodies into its own intermediate code, then optionally translates the intermediate code into JVM bytecode that is passed to the JVM. There's an interpreter for the intermediate code that gives moderate performance, good enough to do functional geometry, but perhaps not good enough from pixel-by-pixel image processing. This interpreter could be the basis for an initial implementation on Android.
- Other projects like Clojure have tried to generate code dynamically on Android, and they may provide technical information that shows what can be done.
Entering mathematical expressions and functional programs with the on-screen keyboard of a tablet or mobile is inconvenient to say the least. Devise and implement a scheme based on pop-up menus that makes this task dramatically easier, especially for the restricted language used in GeomLab.
For example, the expression
(man $ woman) & tree could be entered by starting with the display
and clicking on a succession of rules like this:
<expr> --> <picture> & <picture> giving
<picture> & <picture>
<picture> --> (<picture> $ <picture>), giving
(<picture> $ <picture>) & <picture>
Finally, three rules like
<picture> --> man to give
(man $ woman) & tree
At each stage, the human would click on one of the placeholders and be offered a list of replacements, with the list being generated dynamically from the syntactic context and at least partial type information. For example, parentheses would be included where the binding powers of operators would make them necessary, and numeric operators would not be offered when the expression must yield a picture. In the example above, the first step arguably is a combination of the two rules
<expr> --> <picture> and
<picture> --> <picture> & <picture>. Note that the arguments of
& are specified as pictures, and this restricts the choices in later stages.
The goal at each stage is to offer a list of maybe half a dozen options, sorted by likelihood. Functions added to GeomLab could be type-checked, and the types used to add rules for calling the functions. The usual
row function would lead to a rule
<picture> --> row(<num>, <picture>).
It should also be possible to undo an editing operation by pointing at a symbol that was introduced by the operation and choosing the appropriate option; likewise, to change from one rule to another with the same subtrees, such as replacing
<picture> --> <picture> & <picture> into
<picture> --> <picture> $ <picture>, maintaining the same sub-expressions.
Type-checking for GeomLab
The GeomLab language is untyped, leading to errors when expressions are evaluated that would be better caught at an earlier stage. Most GeomLab programs, however, follow relatively simple typing rules. The aim in this project is to write a polymorphic type checker for GeomLab and integrate it into the GeomLab system, which is implemented in Java. A simple implementation of the type-checker would wait until an expression is about to be evaluated, and type-check the whole program at that point. As an extension of the project, you could investigate whether it is possible to type-check function definitions one at a time, even when some of the functions they call have not yet been defined.
- An incremental approach could gather a set of constraints for each function, noticing when the constraints introduced by a new function conflict with those already present.
- The challenge (as always with type inference) is to produce reasonably helpful messages that identify the true cause of the error.
Other random projects
(Marginally related to GeomLab)
Escher was fond of making tessellations by taking a square or hexagonal grid and modifying the boundaries between cells so that each cell takes on the shape of an animal. Design a graphical editor for making such tessellations.
- The boundary of a cell can be modelled as a sequence of splines. Implement a good way to edit spline curves interactively. It should also be possible to add markings inside the outline too, with varying stroke widths.
- Make it possible to specify how the boundaries and the colours repeat.
- During editing, show a preview of the whole tessellation; either edit a separate cell and show the effect in a tessellation, or edit a cell within the tessellation directly.
- Also allow a Mobius transformation to be applied to the tessellation to produce an effect like Circle Limit. Maybe do the editing on an untransformed cell, but simultaneously show the transformed result.
- Some pictures (such as Square Limit) exploit tiles that fit next to themselves at a reduced scale.
Applications like this already exist: google "escher tessellation maker". But I've not seen one that is at all nice to use.
Compiling mathematical tables
In the good old days before computers were widespread, people used tables of logarithms and other mathematical functions for calculations. Obtaining correct and complete mathematical tables was a matter of great economic importance, not least in navigation, and in fact a major motivation for Charles Babbage's development of mechanical calculators was the need to compile tables without errors. This motivation arose again with early electronic computers, which were used to compute the tables used to aim artillery accurately at a distant target. The aim of this project is to research the mathematical methods used to compute tables in that era and earlier, and to produce computer programs that explicitly use the same methods, producing a printed result in the best typographical tradition.