# Defunctionalization (Programming Languages)

**Defunctionalization** (d17n) is a *whole program* transformation that turns higher-order functional programs into first-order ones. It's simplest to divide it into four or five stages, though it is really a single transformation.

## Contents |

## A simple example

Let's use as an example a completely pointless program that uses *map* twice:

progxys= (mapaugys,mapsqrys)whereaugy=x+ysqry=y∗y

Because d17n works on whole programs, we'll have to include the defintion of *map*:

map:: (Integer→Integer) → [Integer] → [Integer]mapf[] = []mapf(x:xs) =fx:mapfxs

(Note that, to avoid complications, I've given *map* a type that contains no type variables, despite the fact that its definition also supports the more general type (α → β) → [α] → [β].)

The first step is to **decode the dataflow**, that is, to isolate the expressions that create or use or pass on the value of higher-order arguments or results. The simplest way of doing this is to introduce a data type that encapsulates these values, together with a trivial function *apply* that unwraps them when necessary. In our program, we want to get rid of the functional argument *f* in *map* *f* *xs*. Let's isolate that argument by introducing a data type *Func*, then rewrite *prog* so that it wraps up each functional argument as a *Func* and rewrite *map* so that it uses a new function *apply* to unwrap the functional argument:

dataFunc=Func(Integer→Integer)progxys= (map(Funcaug)ys,map(Funcsqr)ys)whereaugy=x+ysqry=y∗ymap::Func→ [Integer] → [Integer]mapf[] = []mapf(x:xs) =applyfx:mapfxsapply::Func→Integer→Integerapply(Funcf)x=fx

So far *apply* is trivial; but later it will collect all the code for functions that are passed to *map*.

The second step is to **look for lambdas**, that is, to identify the functions that get wrapped up in the new data type and passed as functional arguments. I say 'look for the lambdas' because often these functions are written as lambda-expressions, though as our example shows, this need not be the case. Let's re-express our program using lambdas:

progxys= (map(Func(λy→x+y)ys,map(Func(λy→y∗y)ys))

The point of doing this is so that we can **find the free variables** of each lambda-expression. The function (λ*y* → *y* ∗ *y*) is always the same: it returns the square of its argument. But the function (λ*y* → *x* + *y*) depends on *x*, and we need to remember the value of *x* in order to work out what the function returns.

Now we will **create the constructors**: that is, redefine the data type *Func* to reflect the lamda-expressions we have found. We are going to represent the function (λ*y* → *y* ∗ *y*) by a constant *Sqr*, but we are going to represent (λ*y* → *x* + *y*) by the constructed value *Aug* *x* that contains the value of *x*. Here is the appropriate data type:

dataFunc=Sqr|AugInteger

We can rewrite the function *prog* to use these constructors in place of the lambda-expressions:

progxys= (map(Augx)ys,mapSqrys)

The final step is to **write the worker**: that is, to redefine *apply* because its task is no longer trivial. It now becomes a kind of interpreter for the tiny language that the *Func* data type has become. The new *apply* is defined by a set of equations, each associating one of the constructors we just introduced with the body of the lambda-expression it represents. Because the arguments of the constructor (such as *Aug* *x*) are exactly the free variables of the lambda-expression, everything matches up perfectly. In our example:

apply::Func→Integer→Integerapply(Augx)y=x+yapplySqry=y∗y

It's important to see that d17n must be applied to a whole program at once, since creating the data type *Func* and the function *apply* require us to identify all the places where calls to *map* occur so as to capture the lambda-expressions from those call sites. Thus d17n compromises the modularity of programs, because it means that the resulting program cannot any longer be decomposed into independent pieces.

## Defunctionalizing Fun

Beginning with the interpreter for pure *Fun*, we want to eliminate some of the uses of higher-order functions. Specifically, let's try to get rid of the alternative in the type *Value* that reads

dataValue= ... |Function([Value] →Value) | ...

We'll use a program transformation called *defunctionalization* or d17n for short. This is a *whole-program* transformation that replaces higher-order functions by first-order data structures, but also in the process makes the program less modular and more monolithic. There are about four stages:

### Decoding the dataflow

In our program, this is easy, because the values we are interested in are already wrapped in the *Function* constructor. These values are created in two places – the function *abstract* that says,

abstractxseenv=Functionhwherehargs=evale(defargsenvxsargs),

and in the function *mkprim* that produces a pair consisting of a name and a primitive function,

mkprimx= (x,Function(primx)).

Function values are used in one place – the function *apply*, defined by

apply(Functionh)args=hargs.

(In general, we might need to analyse the program a bit more carefully to isolate a set of places where higher-order values are created and used. Wrapping the values in a data-type is a good way to make this information explicit.)

### Looking for lambdas

Now we'll get rid of auxiliary functions and partial applications by writing each creation site as a lambda expression. So we write

abstractxseenv=Function(λargs→evale(defargsenvxsargs))

and

mkprimx= (x,Function(λargs→primxargs)).

The point of introducing lambdas is that it makes absolutely clear what are the free variables. In *abstract*, they are *xs*, *e* and *env*; and in *mkprim* the only free variable is *x*. Other names like *defargs*, *eval* and *prim* are globally defined, so there is no need to capture them as part of the constructed value.

### Creating the constructors

For each creation site, we want a constructor for values that packages the free variables of the lambda expression. So we will delete the alternative Function from the Value data-type, and replace it with

dataValue= ... |Closure[Ident]ExprEnv|PrimitiveString| ...

Then we redefine *abstract* so that it packages the free variables *xs*, *e* and *env* as a *Closure*:

abstractxseenv=Closurexseenv

(so *abstract* has become trivial). And we redefine *mkprim* in a similar way:

mkprimx= (x,Primitivex).

### Writing the worker

The lambda bodies that we deleted from the creation sites are collected together in a new version of *apply*, which uses the saved ingredients to carry out the same computation as before:

apply(Closurexseenv)args=evale(defargsenvxsargs)apply(Primitivex)args=primxargs.

The transformation is now complete.

We have succeeded in replacing the higher-order value type with a first-order type that's much less frightening. And we've also had a good effect on the equation that defines recursion:

elab(Recx(Lambdaxse))env=env'whereenv'=defineenvx(Closurexseenv')

This can now be read as describing a data structure with a cycle in it, rather than some weird kind of recursive function.

Transforming away higher-order functions like this gives us a double boost of confidence in using them in writing definitional interpreters. First, we know that if we want to avoid higher-order functions for foundational reasons, we can transform them away. And second, if we want to relate our abstract interpreters to implementations, d17n gives us a way of introducing concrete data structures.