I was just doing some coding, and I realized a common pattern in functional programming that could be horribly efficient. Consider the simple task of substituting a variable for a value in an expression tree:

`data Expr a b = Immediate a | Var b | Add (Expr a b) (Expr a b)`

```
```

`subst _ e@(Immediate _) = e`

subst (v, e) e2@(Var w)

| v == w = e

| otherwise = e2

subst ve (Add e1 e2) = Add (subst ve e1) (subst ve e2)

The interesting thing here is the last line. Take a very long look at the last line of code and see what it’s doing. It’s creating a new expression out of two new subexpressions. What if those two subexpressions didn’t change at all? It doesn’t matter: it’s still going to throw the old expression on the garbage pile to be collected, and then *create the exact same object*. You’ll note that in the first two cases, I provided the compiler with some micro-optimizations: I made an alias (*e* in the first case; *e2* in the second case), so that it could be hinted to return the original object in the case of idempotence.

Doing this sort of transformation is extremely common in functional languages. What I’ve written there is a **fold** in the general sense of the term. Most functional programmers know folds in the context of lists where, for no sensible reason, we’re supposed to make a distinction between a “left” and “right” fold. Without getting too far into the theory of lambda calculus, data is just constructors. If we define a list as being a nil or a cons, then the list `[]` is just λnc.n, and the list `[x, [y, z]]` is just the function λnc.cx(c(cy(czn))n). Manipulating data then is the process of *replacing that data’s constructor with a new one*—i.e., a fold. For example, `length` in the lambda calculus could be defined as λx.x0(+1). Look familiar? It’s exactly how one might write `length` in a real language!

The idea can be generalized of course. Any data type with constructors—that is to say, any data type—has an implicit fold. Folds provide a nice foundation for thinking about things. Anything which is computable is computable with folds. A fold terminates if each constructor terminates. Here’s `subst` written as a fold in the lambda calculus: λve.λx.λira.xi(λy.(=yv)e(ry))a). It’s somewhat depressing that this is actually more concise in the lambda calculus than in Haskell, but nonetheless it demonstrates the relation between a Haskell function and a fold. As a side note, this also demonstrates the superfluity of general recursion: just about anything useful can be written in the lambda calculus without recursion or a fixed point—yay for guaranteed termination!

Languages like Haskell unfortunately do not offer us implicit folds. We have to spell them out ourselves, like with `subst` above. If we’re very clever and thoughtful, we can write a monad to provide a fold infrastructure (yes, monads are for more than just I/O!), but most people don’t. The point is that whether we recognize it or not, most of the programs we write in functional languages are nothing more than thinly veiled folds.

Anyway, thinking about functional programming in general, you build tiny little higher-order buliding blocks, and glue them together. Most of the time, these building blocks are not actually doing anything. Think back to `subst`: how much is that expression tree *really* going to change after its transformation? The vast majority of the expression tree will stay exactly the same. The rest of the tree is happily generating garbage, and new objects that were the same as the old objects.

Somehow it would be nice to have the compiler pick up idempotence and not generate new objects. If I wanted to give explicit hints to the compiler, I might do this:

`subst ve e@(Add e1 e2)`

| e1' == e1 && e2' == e2 = e

| otherwise = Add e1' e2'

where (e1', e2') = (reduce e1, reduce e2)

This is the intended semantics behind the optimization, but of course it has two problems: it does a—deep and probably slow—equality operation; and it’s explicitly annotated.

There are two possibilities I see:

- if our domain and codomain are the same, hint the garbage collector not to reuse objects. Then we can do surface comparisons—just compare the pointers—when doing a
`==`operation. We have to make`==`“special” to do this probably—currently Haskell treats`==`the same as any other (overloaded) function. - give an explicit return of “nothing changed” from recursive calls.

Both seem doable.

The reason I brought up folds is that the idea can be extended. Right now I’m only talking about idempotence, but for *any* fold, if we have a guarantee that the domain object is unique—i.e., it’s not being aliased by any other reference—then we can get huge hints on where to start clobbering memory. Right now uniqueness is concerned mostly with destructive updates—think updating a memory in place—but I don’t see why the idea couldn’t be extended to more general clobbering.

We can go even further with the idea of uniqueness. Considering `subst` again, think about what happens if one of our branches changes, or even both of them! If we’re unique, we can be clever and do the updates in place, right? We can do even better than that, though. If we’re allowed to do the updates in place, then we can actually tell our calling function, one level up in the recursion stack, that we were idempotent! This means everyone up the stack can treat us as no-ops. Very cool.

February 3, 2007 at 11:56 pm

[…] Posted by mburrell under Research , Implementation , Haskell This is a follow-up from the last entry I made. I decided to actually sit down and reason about this idempotence optimization I dreamed up for […]

February 17, 2007 at 10:38 am

Just remember that whatever overhead your scheme is adding must be really, really small for this to pay off. Constructing new nodes like your original subst did is very cheap in Haskell.

February 17, 2007 at 5:09 pm

Did you basically rediscover hash consing? In an impure language, it is as simple as identity comparison (for the idempotence check) and [weak on keys] hash tables to memoise the return values of constructors. It’s dynamic instead of static, but will catch many more (all ;) cases. I use that pattern in just about all of my language manipulation projects. That won’t get you in-place updates, though, unless you can hack ref counts in.

February 18, 2007 at 7:07 pm

It’s similar to hash consing, I suppose. The major difference is that you know where in your code to look for memoized values, so you don’t have to carry tables around everywhere.