The preliminary compiler for Ca is nearing usability right now. The back-end—i.e., the garbage collector, code generator and lambda lifter—should be stable now, though I have recently had to make minor corrections to all three of those yesterday. The parser also should be stable. What’s in the middle is always the most unholy component of any compiler I write, the typing system.

I don’t think it’s because I’m particularly stupid; I think typing is just difficult. Even in type unification, which is one of the simplest and most concise of the typing components. I spent a good hour yesterday tracking down a type unification bug. And then there is type inference. Save for one bug, to be described below, I think type inference is finally working solidly and reliably. But it will require a massive clean-up before it’s can be called “literate.”

Hopefully tonight I will have a web interface up such that people can play around with Ca code to see how catamorphisms actually look in C. In the mean time, I have but one serious bug left to squish in type inference.

Writing the compiler was incredibly useful, and of course necessary in the long run, as the language is much more concrete in my mind, rather than writing the theory and in my mind thinking “well that will all work out.” There are, however, two issues that need addressing:

  1. Ca needs higher-order values and lambda abstraction at some point. I just don’t think a functional language is viable without them. I’m also quite confident that higher-order values can be restricted in a way to avoid “the Loïc Colson” problem where it necessarily has to blow up the computational power of the language beyond primitive recursion.
  2. I haven’t proven it, but it occurred to me that likely it is impossible to write a Ca program which runs in Θ(lg n) time, or in fact in any complexity that involves logs. The problem stems from the fact that in a divide-and-conquer approach, we recurse on both halves immediately, eliminating any advantage of the “divide.” For example, consider a natural binary search in Ca:
    data SearchTree k v
      = Leaf
      | Node (SearchTree k v) k v (SearchTree k v);
    find x t = t {
      Leaf -> Nothing;  -- failure
      Node left key value right -> if key == x then
          Just value  -- found it!
        else if key < x then
          @left
        else
          @right;
    };

    It looks reasonable: @left and @right stand for recursing on the left and right branches, respectively, so it seems like this is exactly what we want to do. The problem is that, currently, @left and @right are computed up front: before the if expression ever gets a chance to be evaluated! This means both the left and right branches are recursed on in all cases. So, we visit the entire tree.

    This has two immediate solutions. The first is to introduce laziness. General laziness is a pain and so I don’t want this. I will talk about this in the next paragraph, however. The second solution is to compute @whatever only when it is used and then rely on people using let expressions to avoid duplication of work. This works, but I’m not entirely convinced at this point that this is the route I will go down.

    A common sub-expression (CSE) removal optimization would, in my mind, be the ideal solution. In other words, the programmer is allowed to use @whatever variables wherever he or she likes, and the compiler statically pushes off computing the variable as long as possible and then wrapping it in a let.

    At first blush, this probably seems needlessly complicated the semantics of the language. However, if the goal of the language is to statically prove time complexity, etc., then it seems that the work it would have to do to prove complexity is the same as the work it would have to do to do the CSE removal I described above. But at the least, as a stop-gap, I should probably change the semantics such that the computation of @whatever is put off until the variable is used, even if the absence of a CSE removal optimization.

    Of course all of this is ignoring the fact that for Ca to prove that binary search is in logarithmic time, it would have to prove that the tree is balanced, definitely no small feat!

Advertisements