This is a follow-up from the last entry I made. I decided to actually sit down and reason about this idempotence optimization I dreamed up for Haskell. So, building on the same example, here’s the new “optimized” version, which doesn’t recreate objects if they’re the same:
data Expr a b = Immediate a | Var b | Add (Expr a b) (Expr a b)

subst _ e@(Immediate _) = (True, e)
subst (v, e) e1@(Var w)
    | v == w = (False, e)
    | otherwise = (True, e1)
subst ve e@(Add e1 e2)
    | v1 && v2 = (True, e)
    | otherwise = (False, Add e1' e2')
  where
    (v1, e1') = subst ve e1
    (v2, e2') = subst ve e2

The boolean we’re passing along carries the semantics of “is idempotent”: True means the substitution left that subtree unchanged, so the original node can be reused rather than rebuilt. The “unoptimized” version is exactly the same, except that in the third (“Add”) case of subst there are no guards and no idempotence check; it always constructs a new Add node.
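To make the comparison concrete, here is a self-contained sketch of both versions side by side. The substFast/substSlow names, the type signatures, and the deriving clauses are my additions for illustration; the original code has none of them.

```haskell
-- Deriving clauses added so the sketch compiles and results can be compared.
data Expr a b = Immediate a | Var b | Add (Expr a b) (Expr a b)
    deriving (Eq, Show)

-- Optimized version: the boolean means "this subtree was left unchanged",
-- so unchanged subtrees are returned as-is instead of being rebuilt.
substFast :: Eq b => (b, Expr a b) -> Expr a b -> (Bool, Expr a b)
substFast _ e@(Immediate _) = (True, e)
substFast (v, e) e1@(Var w)
    | v == w    = (False, e)
    | otherwise = (True, e1)
substFast ve e@(Add e1 e2)
    | v1 && v2  = (True, e)
    | otherwise = (False, Add e1' e2')
  where
    (v1, e1') = substFast ve e1
    (v2, e2') = substFast ve e2

-- Unoptimized version: identical except that the Add case always
-- allocates a fresh node, even when nothing below it changed.
substSlow :: Eq b => (b, Expr a b) -> Expr a b -> (Bool, Expr a b)
substSlow _ e@(Immediate _) = (True, e)
substSlow (v, e) e1@(Var w)
    | v == w    = (False, e)
    | otherwise = (True, e1)
substSlow ve (Add e1 e2) = (v1 && v2, Add e1' e2')
  where
    (v1, e1') = substSlow ve e1
    (v2, e2') = substSlow ve e2
```

Both versions compute the same result tree; the fast one just preserves sharing with the input wherever no variable was hit.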

I generated some random data to play with and profiled the “fast” and “slow” code; the results were actually quite dramatic. This was running on my 1.6GHz G5, compiled with GHC 6.4 with -prof -auto-all -O. The results are given as the number of seconds it took to do a substitution transformation on the expression tree, where roughly 5% of the tree’s nodes were variables to be substituted. Shorter bars are better.

[Plot: data size versus execution time, demonstrating the efficacy of a hypothetical idempotence optimization.]

I should say the “depth” in this case is the maximum depth of the expression tree. The tree isn’t particularly complete or symmetrical in any meaningful way.
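I no longer have the generation code handy, but something along these lines would produce the kind of tree described: bounded maximum depth, not at all complete or symmetrical, with variables appearing in a small fraction of positions. The tiny LCG, the branching probability, and the single variable name are all placeholders of mine, not the original setup; the Expr type is repeated to keep the sketch self-contained.

```haskell
data Expr a b = Immediate a | Var b | Add (Expr a b) (Expr a b)
    deriving Show

-- A tiny linear congruential generator, used only to keep this sketch
-- self-contained (no dependency on the random package).
next :: Int -> Int
next s = (s * 1103515245 + 12345) `mod` 2147483648

-- Build a tree of at most the given depth, threading the seed through.
-- At each step we either branch into an Add node or stop at a leaf;
-- leaves come out as Vars roughly 5% of the time.
genExpr :: Int -> Int -> (Expr Int String, Int)
genExpr depth seed
    | depth > 0 && s1 `mod` 10 < 7 =          -- branch ~70% of the time
        let (l, s2) = genExpr (depth - 1) s1
            (r, s3) = genExpr (depth - 1) s2
        in (Add l r, s3)
    | s1 `mod` 20 == 0 = (Var "x", next s1)   -- roughly 5% variables
    | otherwise        = (Immediate (s1 `mod` 100), next s1)
  where
    s1 = next seed

-- Maximum depth of a tree, in the sense used above.
depthOf :: Expr a b -> Int
depthOf (Add l r) = 1 + max (depthOf l) (depthOf r)
depthOf _         = 0
```

Because a branch consumes one unit of depth and depth 0 forces a leaf, the requested depth is an upper bound; the actual shape is as lopsided as the seed makes it.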

The number of memory allocations was similarly reduced. It seems to be something worth investigating, anyway. Maybe I’m overselling how often this sort of optimization would come up.