Undergraduate and M.Sc. projects

From Spivey's Corner

Jump to: navigation, search

Please come and talk to me about any of these projects, or if you have a project idea of your own along the same lines. The projects are quite open-ended, so they would suit either a third year or a fourth year student, with the fourth year expecting to get further or address a wider selection of aspects. Some of the projects would divide up neatly for a team of two or more.

Some of the descriptions need to be expanded a bit, and I'll do that when time permits. But for now, if a project looks interesting, please get in touch and ask about things that are not clear. – Mike

Contents

Projects related to Oberon / Keiko

Oberon is a very simple imperative language which was once used for teaching in Oxford (and might be used again one day). A home-grown implementation of Oberon uses Keiko, a bytecode machine implemented by both a bytecode interpreter (to support debugging and profiling) and by a rough-and-ready JIT (for better performance). The same machine is used as the target of the compilers that are studied in the Compilers course.

Better JIT translation

The existing JIT for Keiko is very simple-minded, and does little more than translate each bytecode into the corresponding machine code. Either improve the translation by using one of the many JIT libraries now available, or adjust the Oberon compiler and the specification of the bytecode machine to free it of restrictive assumptions and produce a better pure-JIT implementation.

  • One idea: the compiler now knows too much about the layout of stack frames. In particular, it knows that the arguments of a procedure are on the stack at a certain offset from the frame base, effectively fixing the size of the frame head. By introducing appropriate instructions into the VM, it should be possible to make the code more independent of the exact stack layout, and that may make it possible to fit in with the native calling convention of some platforms. This should make it possible to merge the Oberon stack and the native subroutine stack, saving space and freeing up registers.

A simple object-oriented language

Use Keiko to implement a simple language that is purely object-oriented. Study the compromises that must be made to get reasonable performance, comparing your implementation with Smalltalk, Ruby or Scala.

  • Almost any variation on the theme is acceptable: a different language, object-oriented or not; or a target other than Keiko. The small benefit to Mike of OOP-on-Keiko is that it could be used as an example in the Compilers course.
  • A similar thing already exists, in that Oberon–2 is object-oriented (in an interesting way), and the design of Keiko was expressly made so that it would provide what's needed to implement it.

Eliminating array bounds checks

The Oberon compiler inserts code into every array access and every pointer dereference to check for runtime errors, like a subscript that is out of bounds or a pointer that is null. In many cases, it is possible to eliminate the checks because it is possible to determine from the program that no error can occur. For example, an array access inside a FOR loop may be safe given the bounds of the loop, and several uses of the same pointer in successive statements may be able to share one check that the pointer is non-null. Modify the Oberon compiler (or a simpler one taken from the Compilers labs) so that it represents the checks explicitly in its IR, and introduce a pass that removes unnecessary checks, so speeding up the code without compromising safety.

Keiko on Mindstorms

Alternative firmware for the Mindstorms NXT robot controller provides an implementation of the JVM, allowing Java programs to run on the controller, subject to some restrictions. Using this firmware as a guide, produce an interpreter for a suitable bytecode, perhaps some variant of Keiko, allowing Oberon or another robot language of your own design to run on the controller. Aim to support the buttons and display at first, and perhaps add control of the motors and sensors later.

  • The NXT brick has 64K of RAM and a lot more flash memory. Programs can run directly from an image file in the flash memory (this is what LeJOS does), but that means there can be no relocation of addresses before the program starts. There's a version of Keiko that can use a program image in this way; the challenge is to make it work on the brick.
  • LeJOS has a rather nice garbage collector with incremental marking. This helps to avoid long pauses that interfere with real-time control of the robot. It lacks pointer maps of the stack, so must take a partly conservative approach. The existing implementation of Keiko, on the other hand, uses a copying collector that assumes there's plenty of RAM, but does reliably identify all pointers in the stack. Garbage collection for Pascal-like languages is made harder by the possibility of pointers to the middle of an object. It would be good to implement a suitable garbage collector for Keiko on Mindstorms – but many programs could be made to work without one.

Heap-based activation records

At present, Keiko supports only conventional Pascal-like language implementations that store activation records on a stack. Experiment with an implementation where activation records are heap-allocated (and therefore recovered by a garbage collector), procedures are genuinely first-class citizens that can be returned as results in addition to being passed as arguments, and tail recursion is optimised seamlessly.

Projects related to GeomLab

GeomLab is a very simple language, purely functional on the surface, that we use to teach schoolchildren about programming and problem solving. The current implementation of GeomLab translates functional programs first into an intermediate language, then into Java bytecodes, which may be further JITted by the underlying JVM into machine code. As it stands, the language is untyped, and run-time errors occur when (for example) a number is added to a picture.

GeomLab and Mindstorms

GeomLab has a turtle graphics feature, but the pictures are drawn only on the screen. It should be possible to make a turtle out of Lego Mindstorms, then control it with an instance of Geomlab running on a host computer, with communication over Bluetooth.

Better performance for GeomLab

At present, GeomLab programs show a performance that competes favourably with Python, making it possible to address tasks like computing images of the Mandelbrot set using a purely functional program that calls a function once for each pixel. But there is still a gap between the performance of GeomLab programs and similar ones written in Java or C, and more ambitious image-processing tasks would be made possible by better performance, particularly in the area of arithmetic. Explore ways of improving performance, perhaps including the possibility of improving the performance of GeomLab by allowing numbers to be passed around without wrapping them in heap-allocated objects, or the possibility of compiling the code for Haskell-style pattern matching in a better way.

  • A promising approach is to use type inference to identify functions that always return a numeric result or a colour, and functions that expect numeric arguments. Such functions could be given multiple entry points – one for boxed and another for unboxed arguments; and functions that return numeric results can yield them unboxed and be wrapped in an adapter if a boxed result is needed.
  • At present, the JIT is able to identify chains of numeric primitives, as in x * y + z, and avoid boxing the result of the multiplication only to unbox it again for the addition. Also, the two operators are compiled inline as the appropriate JVM instructions. This proposal goes much further by allowing unboxed arguments and results for user-defined functions also.

GeomLab on Android

Produce an implementation of GeomLab's GUI and graphics library that works on the Android platform. Either use an interpreter for GeomLab's intermediate code to execute GeomLab programs, or investigate dynamic translation of the intermediate code into code for Android's virtual machine Dalvik.

  • Porting the GUI from Swing to Android should be a routine programming task, though some experimentation will be needed to get an interface that works well. The graphics primitives of GeomLab already use a device-independent interface to provide output both to the screen and in Postscript, so the infrastructure is there to exploit other graphics libraries.
  • The latest GeomLab implementation compiles function bodies into its own intermediate code, then optionally translates the intermediate code into JVM bytecode that is passed to the JVM. There's an interpreter for the intermediate code that gives moderate performance, good enough to do functional geometry, but perhaps not good enough from pixel-by-pixel image processing. This interpreter could be the basis for an initial implementation on Android.
  • Other projects like Clojure have tried to generate code dynamically on Android, and they may provide technical information that shows what can be done.

Type-checking for GeomLab

The GeomLab language is untyped, leading to errors when expressions are evaluated that would be better caught at an earlier stage. Most GeomLab programs, however, follow relatively simple typing rules. The aim in this project is to write a polymorphic type checker for GeomLab and integrate it into the GeomLab system, which is implemented in Java. A simple implementation of the type-checker would wait until an expression is about to be evaluated, and type-check the whole program at that point. As an extension of the project, you could investigate whether it is possible to type-check function definitions one at a time, even when some of the functions they call have not yet been defined.

  • An incremental approach could gather a set of constraints for each function, noticing when the constraints introduced by a new function conflict with those already present.
  • The challenge (as always with type inference) is to produce reasonably helpful messages that identify the true cause of the error.

Compiling mathematical tables

In the good old days before computers were widespread, people used tables of logarithms and other mathematical functions for calculations. Obtaining correct and complete mathematical tables was a matter of great economic importance, not least in navigation, and in fact a major motivation for Charles Babbage's development of mechanical calculators was the need to compile tables without errors. This motivation arose again with early electronic computers, which were used to compute the tables used to aim artillery accurately at a distant target. The aim of this project is to research the mathematical methods used to compute tables in that era and earlier, and to produce computer programs that explicitly use the same methods, producing a printed result in the best typographical tradition.

Navigation
Toolbox