[Template fetch failed for http://spivey.oriel.ox.ac.uk/corner/Template:Sitenotice?action=render: HTTP 404]

Note about programming style

From Compilers
Jump to navigation Jump to search

Previously, I generated code by providing a subroutine gen that added one Keiko instruction to the growing program, and wrote things like

let rec gen_expr =
  function
      Binop (w, e1, e2) ->
        gen_expr e1; gen_expr e2; gen (BINOP w)
    | ...

This works well enough. But if we want to follow the maxim of making our programs as functional as possible, then we are missing the opportunity to have gen_expr return the code, rather than producing it by side-effect.

That suggests we adopt this style (using @ as the OCaml equivalent of ++):

let rec gen_expr =
  function
      Binop (w, e1, e2) ->
        gen_expr e1 @ gen_expr e2 @ [BINOP w]
    | ...

But the big snag with that is all those @'s, which will make the time taken by the compiler potentially quadratic in the size of the object code. It surely doesn't matter for small programs, but it's painful to contemplate. We could get rid of the @'s by using an accumulating parameter to each code-generating function; but that adds clutter we can do without.

So what I've done instead is to return the code, but use a different representation of lists. I add a constructor SEQ to the data type of Keiko code:

type code =
    BINOP of op
  | ...

  | SEQ of code list

That allows us to take any list of code fragments and wrap them as a single fragment. Now we can write this:

let rec gen_expr =
  function
      Binop (w, e1, e2) ->
        SEQ [gen_expr e1; gen_expr e2; BINOP w]
    | ...

Instead of building a plain list of instructions, the compiler builds a tree and the list we want is the result of flattening the tree; the internal SEQ nodes are meaningless. It's easy enough to write a function canon : codecode list that eliminates these internal nodes and uses an accumulating parameter to run in linear time:

let canon t =
  let rec canona t xs =
    match t with
      SEQ ts -> foldr canona xs ts
      _ -> t:xs in
  canona t []

(except that foldr is called List.fold_right in OCaml and takes its parameters in a different order). That's just one utility function using an accumulating parameter, rather than having accumulators scattered over the whole code generator.

In practice, I've added another constructor NOP, more or less equivalent to SEQ [ ]; also, canon tidies up redundant LINE directives, keeping just the last in any group. Some of our compilers don't need to use a flat list of instructions, so sometimes it's enough just to provide a recursive routine to output the code, also easily done in linear time.