Course outline

From Compilers
Jump to: navigation, search


1. Phases and representations. Compilers can be structured as the composition of multiple transformations, with different representations of the program passed between each transformation and the next.


2. Lexical analysis. Regular expressions can be used to describe the set of tokens in the input language; lexical analysis divides the input program into meaningful tokens, discarding insignificant layout white and classifying the tokens according to their grammatical function.

[Omitted: Finite state machines.]

3. Syntax analysis. Context free grammars describe the nested structure of the programming language. An automated tool generates a parser that can consume the stream of tokens and produce an abstract syntax tree represented the syntactic structure of the input program.

[Omitted: LR parsing.]

Problem sheet 1

Expressions and statements

4. Postfix code for expressions. The abstract syntax tree guides the process of generating code for expressions. In code to evaluate expressions on a stack machine, code for the operands is followed by instructions that combine operand values according to the operators in the expression.

5. Branching code for control structures. IF and WHILE statements can be translated into code where the tests become conditional branches, again in a recursive process guided by the abstract syntax tree. Any untidiness in the resulting program can be removed by using a system of local rewrite rules in a peephole optimiser.

Problem sheet 2

Lab one: REPEAT and LOOP statements

Data structures

6. Semantic analysis. Languages with mutliple data types and declarations of variables require an additional phase, where the abstract syntax tree is annotated so that each use of a variable is labelled with the corresponding definition. Following this phase, code can be generated in a purely local way as before.

7. Address arithmetic for data structure access. The classic data types of records, arrays and pointers can be translated into code that uses address arithmetic to access elements of each kind of data structure.

Problem sheet 3

Lab two: array access.


8. Simple subroutines. Procedures without parameters need a stack of frames, with machine support for creating and destroying frames on call and return. Local variables can be allocated in the frame and addressed relative to the frame base.

9. Parameters and nested procedures. Parameters can be implemented by preparing values before the call that the linkage mechanism makes addressible in the frame. Parameters passed be reference are represented by their addresses, with an additional indirection at each access from within the procedure. Nested procedures with access to non-local variables can be implemented by linking frames into a static chain.

Extra lecture A. Introducing the ARM. Handout.

10. Higher-order functions. Procedural parameters are implemented by passing the code address and the static link that is passed when the procedure is called.

11. Compiler implementation, testing and maintenance. The technqiues described so far are sufficient to implement a compiler for a complete and usable language similar to Pascal. A well-managed compiler project relies on an automated testing scheme, where a corpus of test cases is rerun after each change to the compiler. Bug reports result in the addition of new test cases that ensure the same problem does not emerge again.

Problem sheet 4

Lab three: subroutines.

Machine code

12. The back end: operator trees. Translation to machine code can be organised in multiple phases that share a common intermediate representation. We use trees with nodes that are simple Keiko instructions and edges that show the flow of values between them.


13. Instruction selection. Machine instructions can be selected by tiling each tree with patches that correspond to instructions. The system of patches can be described, surprisingly enough, by a (highly ambiguous) context free grammar specific to the target machine.

14. Common sub-expressions. Code generated for modern-day RISC machines is substantially better if we can detect and eliminate common sub-expressions, at least in simple cases. A classic algorithm for this uses a hash table to build a DAG from the optrees, then flattens the DAG into a sequence of trees, assigning shared expressions to temps.

15. Register allocation.

Problem sheet 5

Lab four: machine code.

16. Object oriented programming.

Problem sheet 6: revision.

Practical assignment.