I’ve done more work on the Ca compiler. The to do list now looks as:

  1. garbage collector (finished)
  2. code generation (finished)
  3. lambda lifter (finished)
  4. general code clean-up (in progress; see below)
  5. type inference
  6. parser (finished but weakly tested)

I just finished the parser about half an hour ago. Rudimentary testing shows that it’s working correctly.

As a short side note, since the compiler is written in Haskell, I’m using Happy as the parser generator, a rather typical LALR(1) parser generator, with options for GLR(1) parsing which don’t interest me. The error messages in Happy are both good and bad. They’re bad in that they reference the generated code instead of the code in the grammar file itself. They’re good in that they’re so, well, happy! Consider this snippet from a single error message that was over 200 lines long:

In the second argument of `happySpecReduce_3′, namely `happyReduction_17′
In the expression: happySpecReduce_3 11 happyReduction_17

Maybe it’s just me, but somehow knowing that the problem was with a happyReduction doesn’t make it seem so bad.

Sitting down with a concrete LALR(1) grammar has made me make a few changes to the syntax, as compared to what was published in the paper. For starters, there are a few more semicolons thrown in here and there to mark where one thing starts and another ends. Some syntax issues that I’ve been pondering lately:

  • Type constraints. Since Ca is aimed deliberately at static analysis, I want constraints to go beyond just type constraints. Of course it’s supposed to rely primarily on inference, not explicitly annotated constraints, but constraints are handy. So in Haskell one might write f :: Integer -> Integer to say that the function f is constrained to taking integers as input and producing integers as output. For Ca, we might like, for example, time constraints in addition to type constraints. See the syntax example below for how this might look.
  • Infix functions. For whatever reason, user-defined infix functions, sometimes called operators, are a staple in functional programming, and I like to go with the flow. Why limit yourself to expressions like a + b * c when you can have a +^$$ b &&|| c? Unfortunately this causes a problem with generating C code, as I promised myself I’d limit myself to as little name-mangling as possible. There’s no way around it, though, and for the time being, +^$$ is mangled into the C function OP_plus_caret_dollar_dollar. Infix constructors aren’t in yet but should be some day.
  • Locally scoped data type definitions are in, which I’m quite pleased with. This allows you to write such things as z where z = f A; f = ...; data X = A | B; or some such, where the data type X exists only inside that where clause.

As for the syntax itself, the only “crazy” part that I’ve added is the syntax for constraints. Basically you can add any number of constraints to a function, and each one is tagged with a constraint type. Currently only type constraints are allowed, but time and memory, etc., constraints should hopefully come in the next few years. For example:

sumList
    :(type): [Nat] -> Nat;
    :(time): n -> Theta(n);  -- doesn't exist yet
x = x {
    Nil -> 0;
    Cons x xs -> x + @xs;
};

For the time being, :: on its own defaults to :(type):. I don’t know if I’m 100% satisfied with that syntax, but it seems workable.

As for the code cleanup, if you read the last post I made, you’ll know I’m trying to make this code as literate as possible. I’ve been in touch with Joachim Schrod, the author of CWEB-latex, as I’ve decided LaTeX is the best way to go, what with its nice hierarchical document structure and whatnot. CWEB-latex hasn’t been maintained in 13 years and neither I nor Joachim know of anyone using it, but nonetheless he sent me an updated class file which will hopefully work for me. The 1995 release of CWEB-latex almost works with modern LaTeX distributions, but not quite.

Advertisements