The Ca compiler I mentioned one week ago, which I’ve dubbed cac, is progressing well. The core of the back-end is effectively done. Here’s the current to do list:

  1. garbage collector (finished)
  2. code generation (finished)
  3. lambda lifter
  4. general code clean-up, as it will no doubt be past due at this point
  5. type inference
  6. parser

So far I’m fairly pleased with my progress, though that always seems to happen when I get into coding. It makes me half-wonder why I went into research when coding is always so much more productive, ha!

The compiler is written in Haskell and targets C. The garbage collector is all written in C. Nothing about the interface to the Ca run-time/garbage collector is unportable, but currently the primary data structure is dependent on an LP32—i.e., longs and pointers are 32 bits in size—architecture. I will probably provide an implementation for LP64 systems as well, as well as a totally portable implementation, which makes no assumptions violating the C99 standard. As it’s only one struct which needs to be rewritten, it won’t take much time. The current implementation, for LP32 systems, allows arbitrary-sized boxed values as well as 31-bit unboxed values. Currently 31-bit natural numbers, i.e., unsigned longs, are the only unboxed values supported.

The rest of the garbage collector is portable. I even used malloc() to set up the heaps. Since my memory pools are of fixed size right now, there’s no downside to using malloc(), though I might have to abandon it if I ever go to dynamically-sized pools.

I’m probably most proud of my code generation. Since Ca is not a general purpose language, interfacing with other languages is in the back of my mind. And, of course, when I say “other languages”, I implicitly mean “C”, since every language’s foreign function interface is C if nothing else.

So, if, in Ca, you define a function called foo, the generated code will create a C function of exactly the same name foo. This means that if you’re calling Ca code from C, you don’t have to worry about name mangling, or really importing function prototypes at all.

Further, all unboxed return types and parameter types in the generated code have type unsigned long. For example, if you write the Ca function foo x y = x { Zero -> y; Succ p -> @p * y; }, where x, y and the return value are all natural numbers, the prototype for calling this function from C is exactly unsigned long foo(unsigned long, unsigned long). Indeed function applications in Ca are nothing but C function calls. Quite a nice property, I think.

For boxed values, of course, things get a little bit more complicated when doing a foreign function interface. The Ca run-time will most definitely have to provide service functions to convert, for example, arrays into boxed lists. I don’t think there’s any way around this.

The C code generated is moderately readable, in the sense that whitespace is all proper and some variable names are meaningful. E.g., if you have a Ca variable blarg, this will show up as var_blarg in the generated C code. Hopefully this will make debugging easier in long run. Unfortunately the generated C code uses a lot of temporary variables, and the comments it produces are rather sparse.

Catamorphisms over natural numbers are implemented using for loops. Catamorphisms over other data types are implemented using recursion.

I’d almost forgotten how much fun it is to implement a compiler. While I was tagging the germane categories for this post, I almost instinctively checked off “hobby”. Good times. I think I’ll let it rest for a couple days and focus on other research. Though I won’t feel completely accomplished until I get my computer and hooked up to the external hard drive for back-up, ha!

Advertisements