Note: I've just migrated to a different physical server to run Spivey's Corner,
with a new architecture, a new operating system, a new version of PHP, and an updated version of MediaWiki.
Please let me know if anything needs adjustment! – Mike

The nodexp tool (Compilers)

Copyright © 2024 J. M. Spivey
Jump to navigation Jump to search

The back end is based around operator trees, represented by the type

type optree = Node of inst * optree list

defined in the Keiko module. A typical tree (generated from the expression x+1) is

Node (BINOP Plus,
  [Node (LOADW, [(Node (LOCAL 24, []))]); 
    Node (CONST 1, [])]).

But a much clearer way of writing the same tree is

<BINOP Plus, <LOADW, <LOCAL 24>>, <CONST 1>>

To allow this more compact notation to be used in the compiler, there's a tool called nodexp (contained in the directory tools) that takes an OCaml program written using this special notation and outputs the same program with the compact notation expanded. The special notation can be used both in expressions that construct operator trees and in patterns that are matched against them. The Makefile for lab 4 automates the process of running the ML source of the compiler through nodexp before it is seen by the OCaml compilers.

The nodexp tool has to distinguish uses of the compact notation for optrees from places where the characters < and > are used to mean 'less than' and 'greater than'. The criterion it uses is that the opening < must be followed by an identifier with no intervening spaces. Nodexp supports an additional notation: you can write an expression like <CALL, @args> to mean an optree where the instruction is CALL and the arguments form the list args. Thus <CALL, @args> is short for the OCaml expression or pattern Node (CALL, args); in this case, the special notation is not much of an abbreviation, but it does at least have an appearance that is consistent with other expressions and patterns that deal with optrees.

Nodexp is itself implemented as an OCaml program using many of the techniques learned in the course: there is a lexer that picks out the elements of the notation from the surrounding text, and a parser built with ocamlyacc that deals with the nested structure of the notation. Acquiring the ability to build little tools like this is perhaps one of the strongest reasons to study the design and implementation of compilers.

The opdraw tool

An operator tree

The course also depends on another tool called opdraw that prepares pictures of operator trees with tiling. I used it to prepare the notes, but you won't need to use it in the labs.

The input to this tool is a small text file describing an operator tree. For example, here is the description of the tree shown in the figure at right:

tile str <STOREW,
  tile ldr u3 <LOADW, 
    <OFFSET, 
      tile set u0 <GLOBAL _a>, 
      tile lsl u2 <LSL, 
        tile ldr u1 <LOADW, <LOCAL 40>>, 
        <CONST 2>>>>,
  <LOCAL -8>>

In this description, an operator tree written in the same style as for nodexp is interspersed with labels that correspond to tiles. Thus the expression

tile ldr u3 <LOADW, ...>

corresponds to a LOADW instruction at the root of a tile. The tile is labelled with the instruction ldr, and its output edge is labelled with the register u3. Because tiles never overlap, the tile that starts here ends just before the start of those tiles that are its inputs.

That is the input to the opdraw program; its output is another piece of code, written in the language MetaPost that is compiled to Postscript, with the bits of text in the diagram typeset using TeX. Here is the MetaPost program corresponding to the example.

input opdraw
verbatimtex \input opdraw etex
begintree(1);
node1(5.0, 5.0)(btex \strut$\C{STOREW}$ etex);
node2(4.0, 4.0)(btex \strut$\C{LOADW}$ etex);
node3(4.0, 3.0)(btex \strut$\C{OFFSET}$ etex);
node4(3.0, 2.0)(btex \strut$\C{GLOBAL}\,\S{_a}$ etex);
node5(5.0, 2.0)(btex \strut$\C{LSL}$ etex);
node6(4.0, 1.0)(btex \strut$\C{LOADW}$ etex);
node7(4.0, 0.0)(btex \strut$\C{LOCAL}\,{40}$ etex);
node8(6.0, 1.0)(btex \strut$\C{CONST}\,{2}$ etex);
node9(6.0, 4.0)(btex \strut$\C{LOCAL}\,{-8}$ etex);
link.l(1, 2);
link.v(2, 3);
link.l(3, 4);
link.r(3, 5);
link.l(5, 6);
link.v(6, 7);
link.r(5, 8);
link.r(1, 9);
shade(0) oval(4);
shade(1) chain(6, 7);
shade(2) chain(5, 8);
shade(3) chain(2, 3);
shade(4) chain(1, 9);
drawtree(9);
inst.lft(1)(btex \inst{str} etex);
inst.lft(2)(btex \inst{ldr} etex);
reg.lft(2)(btex \reg{u3} etex);
inst.lft(4)(btex \inst{set} etex);
reg.lft(4)(btex \reg{u0} etex);
inst.rt(5)(btex \inst{lsl} etex);
reg.rt(5)(btex \reg{u2} etex);
inst.lft(6)(btex \inst{ldr} etex);
reg.lft(6)(btex \reg{u1} etex);
endfig;
end

The opdraw program has numbered the nodes, fixed their coordinates, and formatted the text labels. It has determined a tree of links between each node and its parent, and a collection of oval shapes for the tiles. Finally, it has attached labels to the tiles and some of the links.

The output of opdraw must be run through the MetaPost program to produce Postscript output. Standard programs that come with TeX allow this Postscript to be merged with a TeX document for printing; or there's another program MP2PDF of mine that translates the Postscript output by MetaPost into TeX input with embedded PDF fragments that can be used as input to PDFTeX. MetaPost has good facilities for solving equations and drawing curves that are used to good effect in laying out the diagram and drawing the oval shapes. (The drop shadow in the picture was added by post-editing the picture with ImageMagick.)

Like the compilers we write in the course, the opdraw program has a lexer and parser that together produce an abstract syntax tree for the expression, similar to the operator trees used in our compilers, but with added kinds of node to label the tiles. The layout of the tree is determined by a recursive process that takes into account the profile of subtrees: for example, the GLOBAL node is able to tuck in to the gap next to the LSL node because the right profile of the subtree rooted at GLOBAL is [0] and the left profile of the subtree rooted at LSL is [0, 1, 1]; so matching up the two 0's, we see that no additional space is needed between the two subtrees. If GLOBAL were given a right subtree, then its right profile would become [0, 1]; now the two 1's clash, and we must create two units of extra space between the trees by spreading apart the GLOBAL and the LSL. Such planning is easy if we have the tree structure, but very difficult without it.

The tiles are drawn with a sequence of splines, using the features of MetaPost to determine the coefficients. At present, the program can handle only "sausage-shaped" tiles that cover a single path in the tree, but other shapes could be added. I think the look of the results could be improved; but as is the way with automated tools, any improvements that are made can potentially benefit all optree drawings, both those made in the past as well as those to be made in the future. You only need to recompile!