# Defunctionalization (Programming Languages)

**Defunctionalization** (d17n) is a *whole program* transformation that turns higher-order functional programs into first-order ones. It's simplest to divide it into four or five stages, though it is really a single transformation.

## Contents

## A simple example

Let's use as an example a completely pointless program that uses **map** twice:

progxys= (mapaugys,mapsqrys)whereaugy=x+ysqry=y*y

Because d17n works on whole programs, we'll have to include the defintion of **map**:

map:: (Integer->Integer) -> [Integer] -> [Integer]mapf[] = []mapf(x:xs) =fx:mapfxs

(Note that, to avoid complications, I've given **map** a type that contains no type variables, despite the fact that its definition also supports the more general type (α -> β) -> [α] -> [β].)

The first step is to **decode the dataflow**, that is, to isolate the expressions that create or use or pass on the value of higher-order arguments or results. The simplest way of doing this is to introduce a data type that encapsulates these values, together with a trivial function **apply** that unwraps them when necessary. In our program, we want to get rid of the functional argument **f** in **map** **f** **xs**. Let's isolate that argument by introducing a data type **Func**, then rewrite **prog** so that it wraps up each functional argument as a **Func** and rewrite **map** so that it uses a new function **apply** to unwrap the functional argument:

dataFunc=Func(Integer->Integer)progxys= (map(Funcaug)ys,map(Funcsqr)ys)whereaugy=x+ysqry=y*ymap::Func-> [Integer] -> [Integer]mapf[] = []mapf(x:xs) =applyfx:mapfxsapply::Func->Integer->Integerapply(Funcf)x=fx

So far **apply** is trivial; but later it will collect all the code for functions that are passed to **map**.

The second step is to **look for lambdas**, that is, to identify the functions that get wrapped up in the new data type and passed as functional arguments. I say 'look for the lambdas' because often these functions are written as lambda-expressions, though as our example shows, this need not be the case. Let's re-express our program using lambdas:

progxys= (map(Func(\y->x+y)ys,map(Func(\y->y*y)ys))

The point of doing this is so that we can **find the free variables** of each lambda-expression. The function (\ **y** -> **y** * **y**) is always the same: it returns the square of its argument. But the function (\ **y** -> **x** + **y**) depends on **x**, and we need to remember the value of **x** in order to work out what the function returns.

Now we will **create the constructors**: that is, redefine the data type **Func** to reflect the lamda-expressions we have found. We are going to represent the function (\ **y** -> **y** * **y**) by a constant **Sqr**, but we are going to represent (\ **y** -> **x** + **y**) by the constructed value **Aug** **x** that contains the value of **x**. Here is the appropriate data type:

dataFunc=Sqr|AugInteger

We can rewrite the function **prog** to use these constructors in place of the lambda-expressions:

progxys= (map(Augx)ys,mapSqrys)

The final step is to **write the worker**: that is, to redefine **apply** because its task is no longer trivial. It now becomes a kind of interpreter for the tiny language that the **Func** data type has become. The new **apply** is defined by a set of equations, each associating one of the constructors we just introduced with the body of the lambda-expression it represents. Because the arguments of the constructor (such as **Aug** **x**) are exactly the free variables of the lambda-expression, everything matches up perfectly. In our example:

apply::Func->Integer->Integerapply(Augx)y=x+yapplySqry=y*y

It's important to see that d17n must be applied to a whole program at once, since creating the data type **Func** and the function **apply** require us to identify all the places where calls to **map** occur so as to capture the lambda-expressions from those call sites. Thus d17n compromises the modularity of programs, because it means that the resulting program cannot any longer be decomposed into independent pieces.

## Defunctionalizing Fun

Beginning with the interpreter for pure *Fun*, we want to eliminate some of the uses of higher-order functions. Specifically, let's try to get rid of the alternative in the type **Value** that reads

dataValue= ... |Function([Value] ->Value) | ...

We'll use a program transformation called *defunctionalization* or d17n for short. This is a *whole-program* transformation that replaces higher-order functions by first-order data structures, but also in the process makes the program less modular and more monolithic. There are about four stages:

### Decoding the dataflow

In our program, this is easy, because the values we are interested in are already wrapped in the **Function** constructor. These values are created in two places – the function **abstract** that says,

abstractxseenv=Functionhwherehargs=evale(defargsenvxsargs),

and in the function **mkprim** that produces a pair consisting of a name and a primitive function,

mkprimx= (x,Function(primx)).

Function values are used in one place – the function **apply**, defined by

apply(Functionh)args=hargs.

(In general, we might need to analyse the program a bit more carefully to isolate a set of places where higher-order values are created and used. Wrapping the values in a data-type is a good way to make this information explicit.)

### Looking for lambdas

Now we'll get rid of auxiliary functions and partial applications by writing each creation site as a lambda expression. So we write

abstractxseenv=Function(\args->evale(defargsenvxsargs))

and

mkprimx= (x,Function(\args->primxargs)).

The point of introducing lambdas is that it makes absolutely clear what are the free variables. In **abstract**, they are **xs**, **e** and **env**; and in **mkprim** the only free variable is **x**. Other names like *defargs*, **eval** and **prim** are globally defined, so there is no need to capture them as part of the constructed value.

### Creating the constructors

For each creation site, we want a constructor for values that packages the free variables of the lambda expression. So we will delete the alternative Function from the Value data-type, and replace it with

dataValue= ... |Closure[Ident]ExprEnv|PrimitiveString| ...

Then we redefine **abstract** so that it packages the free variables **xs**, **e** and **env** as a **Closure**:

abstractxseenv=Closurexseenv

(so **abstract** has become trivial). And we redefine **mkprim** in a similar way:

mkprimx= (x,Primitivex).

### Writing the worker

The lambda bodies that we deleted from the creation sites are collected together in a new version of **apply**, which uses the saved ingredients to carry out the same computation as before:

apply(Closurexseenv)args=evale(defargsenvxsargs)apply(Primitivex)args=primxargs.

The transformation is now complete.

We have succeeded in replacing the higher-order value type with a first-order type that's much less frightening. And we've also had a good effect on the equation that defines recursion:

elab(Recx(Lambdaxse))env=env'whereenv'=defineenvx(Closurexseenv')

This can now be read as describing a data structure with a cycle in it, rather than some weird kind of recursive function.

Transforming away higher-order functions like this gives us a double boost of confidence in using them in writing definitional interpreters. First, we know that if we want to avoid higher-order functions for foundational reasons, we can transform them away. And second, if we want to relate our abstract interpreters to implementations, d17n gives us a way of introducing concrete data structures.