# Defunctionalization

**Defunctionalization** (d17n) is a *whole program* transformation that turns higher-order functional programs into first-order ones. It's simplest to divide it into four or five stages, though it is really a single transformation.

## Contents

## A simple example

Let's use as an example a completely pointless program that uses *map* twice:

prog x ys = (map aug ys, map sqr ys) where aug y = x + y sqr y = y * y

Because d17n works on whole programs, we'll have to include the defintion of *map*:

map :: (Integer -> Integer) -> [Integer] -> [Integer] map f [] = [] map f (x:xs) = f x : map f xs

(Note that, to avoid complications, I've given *map* a type that contains no type variables, despite the fact that its definition also supports the more general type (α → β) → [α] → [β].)

The first step is to **decode the dataflow**, that is, to isolate the expressions that create or use or pass on the value of higher-order arguments or results. The simplest way of doing this is to introduce a data type that encapsulates these values, together with a trivial function *apply* that unwraps them when necessary. In our program, we want to get rid of the functional argument *f* in *map* *f* *xs*. Let's isolate that argument by introducing a data type *Func*, then rewrite *prog* so that it wraps up each functional argument as a *Func* and rewrite *map* so that it uses a new function *apply* to unwrap the functional argument:

data Func = Func (Integer -> Integer) prog x ys = (map (Func aug) ys, map (Func sqr) ys) where aug y = x + y sqr y = y * y map :: Func -> [Integer] -> [Integer] map f [] = [] map f (x:xs) = apply f x : map f xs apply :: Func -> Integer -> Integer apply (Func f) x = f x

So far *apply* is trivial; but later it will collect all the code for functions that are passed to *map*.

The second step is to **look for lambdas**, that is, to identify the functions that get wrapped up in the new data type and passed as functional arguments. I say 'look for the lambdas' because often these functions are written as lambda-expressions, though as our example shows, this need not be the case. Let's re-express our program using lambdas:

prog x ys = (map (Func (\ y -> x + y) ys, map (Func (\ y -> y * y) ys))

The point of doing this is so that we can **find the free variables** of each lambda-expression. The function (λ *y* → *y* * *y*) is always the same: it returns the square of its argument. But the function (λ *y* → *x* + *y*) depends on *x*, and we need to remember the value of *x* in order to work out what the function returns.

Now we will **create the constructors**: that is, redefine the data type *Func* to reflect the lamda-expressions we have found. We are going to represent the function (λ *y* → *y* * *y*) by a constant *Sqr*, but we are going to represent (λ *y* → *x* + *y*) by the constructed value *Aug* *x* that contains the value of *x*. Here is the appropriate data type:

data Func = Sqr | Aug Integer

We can rewrite the function *prog* to use these constructors in place of the lambda-expressions:

prog x ys = (map (Aug x) ys, map Sqr ys)

The final step is to **write the worker**: that is, to redefine *apply* because its task is no longer trivial. It now becomes a kind of interpreter for the tiny language that the *Func* data type has become. The new *apply* is defined by a set of equations, each associating one of the constructors we just introduced with the body of the lambda-expression it represents. Because the arguments of the constructor (such as *Aug* *x*) are exactly the free variables of the lambda-expression, everything matches up perfectly. In our example:

apply :: Func -> Integer -> Integer apply (Aug x) y = x + y apply Sqr y = y * y

It's important to see that d17n must be applied to a whole program at once, since creating the data type *Func* and the function *apply* require us to identify all the places where calls to *map* occur so as to capture the lambda-expressions from those call sites. Thus d17n compromises the modularity of programs, because it means that the resulting program cannot any longer be decomposed into independent pieces.

## Defunctionalizing Fun

Beginning with the interpreter for pure *Fun*, we want to eliminate some of the uses of higher-order functions. Specifically, let's try to get rid of the alternative in the type *Value* that reads

data Value = ... | Function ([Value] -> Value) | ...

We'll use a program transformation called *defunctionalization* or d17n for short. This is a *whole-program* transformation that replaces higher-order functions by first-order data structures, but also in the process makes the program less modular and more monolithic. There are about four stages:

### Decoding the dataflow

In our program, this is easy, because the values we are interested in are already wrapped in the *Function* constructor. These values are created in two places – the function *abstract* that says,

abstract xs e env = Function h where h args = eval e (defargs env xs args),

and in the function *mkprim* that produces a pair consisting of a name and a primitive function,

mkprim x = (x, Function (prim x)).

Function values are used in one place – the function *apply*, defined by

apply (Function h) args = h args.

(In general, we might need to analyse the program a bit more carefully to isolate a set of places where higher-order values are created and used. Wrapping the values in a data-type is a good way to make this information explicit.)

### Looking for lambdas

Now we'll get rid of auxiliary functions and partial applications by writing each creation site as a lambda expression. So we write

abstract xs e env = Function (\ args -> eval e (defargs env xs args))

and

mkprim x = (x, Function (\ args -> prim x args)).

The point of introducing lambdas is that it makes absolutely clear what are the free variables. In *abstract*, they are *xs*, *e* and *env*; and in *mkprim* the only free variable is *x*. Other names like *defargs*, *eval* and *prim* are globally defined, so there is no need to capture them as part of the constructed value.

### Creating the constructors

For each creation site, we want a constructor for values that packages the free variables of the lambda expression. So we will delete the alternative Function from the Value data-type, and replace it with

data Value = ... | Closure [Ident] Expr Env | Primitive String | ...

Then we redefine *abstract* so that it packages the free variables *xs*, *e* and *env* as a *Closure*:

abstract xs e env = Closure xs e env

(so *abstract* has become trivial). And we redefine *mkprim* in a similar way:

mkprim x = (x, Primitive x).

### Writing the worker

The lambda bodies that we deleted from the creation sites are collected together in a new version of *apply*, which uses the saved ingredients to carry out the same computation as before:

apply (Closure xs e env) args = eval e (defargs env xs args) apply (Primitive x) args = prim x args.

The transformation is now complete.

We have succeeded in replacing the higher-order value type with a first-order type that's much less frightening. And we've also had a good effect on the equation that defines recursion:

elab (Rec x (Lambda xs e)) env = env' where env' = define env x (Closure xs e env')

This can now be read as describing a data structure with a cycle in it, rather than some weird kind of recursive function.

Transforming away higher-order functions like this gives us a double boost of confidence in using them in writing definitional interpreters. First, we know that if we want to avoid higher-order functions for foundational reasons, we can transform them away. And second, if we want to relate our abstract interpreters to implementations, d17n gives us a way of introducing concrete data structures.