I’ve done more work on the Ca compiler. The to do list now looks as:

  1. garbage collector (finished)
  2. code generation (finished)
  3. lambda lifter (finished)
  4. general code clean-up (in progress; see below)
  5. type inference
  6. parser (finished but weakly tested)

I just finished the parser about half an hour ago. Rudimentary testing shows that it’s working correctly.

As a short side note, since the compiler is written in Haskell, I’m using Happy as the parser generator, a rather typical LALR(1) parser generator, with options for GLR(1) parsing which don’t interest me. The error messages in Happy are both good and bad. They’re bad in that they reference the generated code instead of the code in the grammar file itself. They’re good in that they’re so, well, happy! Consider this snippet from a single error message that was over 200 lines long:

In the second argument of `happySpecReduce_3′, namely `happyReduction_17′
In the expression: happySpecReduce_3 11 happyReduction_17

Maybe it’s just me, but somehow knowing that the problem was with a happyReduction doesn’t make it seem so bad.

Sitting down with a concrete LALR(1) grammar has made me make a few changes to the syntax, as compared to what was published in the paper. For starters, there are a few more semicolons thrown in here and there to mark where one thing starts and another ends. Some syntax issues that I’ve been pondering lately:

  • Type constraints. Since Ca is aimed deliberately at static analysis, I want constraints to go beyond just type constraints. Of course it’s supposed to rely primarily on inference, not explicitly annotated constraints, but constraints are handy. So in Haskell one might write f :: Integer -> Integer to say that the function f is constrained to taking integers as input and producing integers as output. For Ca, we might like, for example, time constraints in addition to type constraints. See the syntax example below for how this might look.
  • Infix functions. For whatever reason, user-defined infix functions, sometimes called operators, are a staple in functional programming, and I like to go with the flow. Why limit yourself to expressions like a + b * c when you can have a +^$$ b &&|| c? Unfortunately this causes a problem with generating C code, as I promised myself I’d limit myself to as little name-mangling as possible. There’s no way around it, though, and for the time being, +^$$ is mangled into the C function OP_plus_caret_dollar_dollar. Infix constructors aren’t in yet but should be some day.
  • Locally scoped data type definitions are in, which I’m quite pleased with. This allows you to write such things as z where z = f A; f = ...; data X = A | B; or some such, where the data type X exists only inside that where clause.

As for the syntax itself, the only “crazy” part that I’ve added is the syntax for constraints. Basically you can add any number of constraints to a function, and each one is tagged with a constraint type. Currently only type constraints are allowed, but time and memory, etc., constraints should hopefully come in the next few years. For example:

    :(type): [Nat] -> Nat;
    :(time): n -> Theta(n);  -- doesn't exist yet
x = x {
    Nil -> 0;
    Cons x xs -> x + @xs;

For the time being, :: on its own defaults to :(type):. I don’t know if I’m 100% satisfied with that syntax, but it seems workable.

As for the code cleanup, if you read the last post I made, you’ll know I’m trying to make this code as literate as possible. I’ve been in touch with Joachim Schrod, the author of CWEB-latex, as I’ve decided LaTeX is the best way to go, what with its nice hierarchical document structure and whatnot. CWEB-latex hasn’t been maintained in 13 years and neither I nor Joachim know of anyone using it, but nonetheless he sent me an updated class file which will hopefully work for me. The 1995 release of CWEB-latex almost works with modern LaTeX distributions, but not quite.

The Ca compiler I mentioned one week ago, which I’ve dubbed cac, is progressing well. The core of the back-end is effectively done. Here’s the current to do list:

  1. garbage collector (finished)
  2. code generation (finished)
  3. lambda lifter
  4. general code clean-up, as it will no doubt be past due at this point
  5. type inference
  6. parser

So far I’m fairly pleased with my progress, though that always seems to happen when I get into coding. It makes me half-wonder why I went into research when coding is always so much more productive, ha!

The compiler is written in Haskell and targets C. The garbage collector is all written in C. Nothing about the interface to the Ca run-time/garbage collector is unportable, but currently the primary data structure is dependent on an LP32—i.e., longs and pointers are 32 bits in size—architecture. I will probably provide an implementation for LP64 systems as well, as well as a totally portable implementation, which makes no assumptions violating the C99 standard. As it’s only one struct which needs to be rewritten, it won’t take much time. The current implementation, for LP32 systems, allows arbitrary-sized boxed values as well as 31-bit unboxed values. Currently 31-bit natural numbers, i.e., unsigned longs, are the only unboxed values supported.

The rest of the garbage collector is portable. I even used malloc() to set up the heaps. Since my memory pools are of fixed size right now, there’s no downside to using malloc(), though I might have to abandon it if I ever go to dynamically-sized pools.

I’m probably most proud of my code generation. Since Ca is not a general purpose language, interfacing with other languages is in the back of my mind. And, of course, when I say “other languages”, I implicitly mean “C”, since every language’s foreign function interface is C if nothing else.

So, if, in Ca, you define a function called foo, the generated code will create a C function of exactly the same name foo. This means that if you’re calling Ca code from C, you don’t have to worry about name mangling, or really importing function prototypes at all.

Further, all unboxed return types and parameter types in the generated code have type unsigned long. For example, if you write the Ca function foo x y = x { Zero -> y; Succ p -> @p * y; }, where x, y and the return value are all natural numbers, the prototype for calling this function from C is exactly unsigned long foo(unsigned long, unsigned long). Indeed function applications in Ca are nothing but C function calls. Quite a nice property, I think.

For boxed values, of course, things get a little bit more complicated when doing a foreign function interface. The Ca run-time will most definitely have to provide service functions to convert, for example, arrays into boxed lists. I don’t think there’s any way around this.

The C code generated is moderately readable, in the sense that whitespace is all proper and some variable names are meaningful. E.g., if you have a Ca variable blarg, this will show up as var_blarg in the generated C code. Hopefully this will make debugging easier in long run. Unfortunately the generated C code uses a lot of temporary variables, and the comments it produces are rather sparse.

Catamorphisms over natural numbers are implemented using for loops. Catamorphisms over other data types are implemented using recursion.

I’d almost forgotten how much fun it is to implement a compiler. While I was tagging the germane categories for this post, I almost instinctively checked off “hobby”. Good times. I think I’ll let it rest for a couple days and focus on other research. Though I won’t feel completely accomplished until I get my computer and hooked up to the external hard drive for back-up, ha!

I suppose it’s further proof that Haskell has corrupted me forever. For my hobby projects, I’m typically doing Objective C Cocoa these days. Usually when I find myself working in non-Haskell languages, I get frustrated at the lack of laziness. Laziness is a wonderful thing. Treating a computation as an infinite data structure just makes your life a lot easier in a number of ways, most notably being able to reuse components that were designed to work on finite data structures. From hereon, every time I say “data structure”, pretend I mean “collection” or whatever they’re called these days: abstract data types that are meant to hold a bunch of things.

The more I work, though, the more I’ve noticed that I’ve fallen into a strange design pattern: I practically never pass around actual data structures (unless they’re mutable and I’m adding something to them, for example); I practically always pass around an enumerator—NSEnumerator in Cocoa.

It’s really quite wonderful. The most immediate properties of this are:

  1. I don’t have to waste time wondering if I’m operating on an NSSet or NSArray or whatever, or if they have a common superclass;
  2. I’d have to create an NSEnumerator eventually anyway to do anything useful, so I’m not adding any new overhead;
  3. the big one, if you haven’t guessed already: it’s trivial to introduce infinite data structures, i.e., data structures which aren’t “data” at all, but rather code.

I whipped up what I call CallbackEnumerator and its friend, the Enumeratable protocol (protocols are what Java calls interfaces. The world of OO wouldn’t be complete without 3 contradictory nomenclatures). The Enumeratable protocol contains just the -enumerateNext message (and variants to allow for arguments), basically as a means to pretend that a certain object is actually a closure. The CallbackEnumerator class does nothing but transform NSEnumerator messages into Enumeratable messages.

So far it hasn’t been terribly useful, as I’m only using it in one spot in my program, in my tokenizer, the idea being that you create a tokenizer with a string and, as if by magic, it instantly gives you an NSEnumerator with all the tokens. Arguably I could have just made a special TokenizerEnumerator and skipped the whole CallbackEnumerator nonsense, but I like the potential for reuse.

Where the enumerator fails is providing structure. The enumerator does a wonderful job at simulating a lazy list. But what about the lazy tree? It seems rather pointless to create an enumerator-a-like for each abstract data type.

In general, I think the role of the enumerator in object-oriented languages like Objective C is backwards. What would make more sense is, for each abstract data type, have an “enumerated” subtype. For example, for NSArray, there could be an NSEnumeratedArray, such that NSEnumeratedArray is a subclass of NSArray. NSEnumeratedArray operates exactly like NSArray, and has the same methods, but rather than containing a giant chunk of memory from which to return elements, it executes code. Similarly for NSDictionary or any other abstract data type.

Consider then the -objectEnumerator method that NSSet implements. What this is really saying to me, semantically, is a transformation of the elements of NSSet, so that they are no longer represented as a set, but rather are represented as a lazy list/array. So, replace that method with -enumeratedArray and you have the same thing. I should mention that, in Objective C, there is a sharp divide between immutable objects and mutable objects. NSArray is necessarily immutable, and thus it’s easy to say that an NSEnumeratableArray would respond to the same messages as an NSArray. As to what an NSMutableEnumeratableArray would look like, that’s another matter.

The benefit is that you are no longer fudging two ideas together. An enumerator is two things: it is a way of representing laziness; and it is a flat, ordered view of elements. There can be a separation between the two. So you can have your NSEnumeratedArray, which is exactly as an NSEnumerator is now: it provides you a way to get the elements out of a list, one at a time, sequentially. Or you could have an NSEnumeratedDictionary or NSEnumeratedTree or what have you.

I suppose it’s somewhat telling that I had to invent types to prove the point. Unfortunately, AppKit—the “foundations” part of Cocoa that provides abstract data types—doesn’t offer much beyond arrays, sets, and dictionaries, so maybe there isn’t much of a need. Still, it’s a bit telling that while all of these classes have initializers like -initWithObjects: or -initWithSet:, there is no -initWithEnumerator: to be found. The concept of chaining lazy data types together doesn’t really exist in the API, which is unfortunate.

I stumbled across this presentation today. It was presented by Simon Peyton-Jones and talks about a new way of doing parallelism under Haskell. Much cooler than that STM nonsense.

The presentation explains things much better than I could, but in contract to STM, basically it’s moving the parallelism annotations away from the code and into the data types. So now we have the concept of a “parallel array” of floats. The key to the coolness is that parallelism can be nested, as is shown quite well in his parallel quicksort example. How does it work? Easy as pie is how it works!

Actually its operation is pretty straightforward, as long as you’re not the one implementing it. Nested parallelisms conceptually get “flattened” into one level of parallelisms. After that it’s all book-keeping to know when one operating should be dispatched to some other thread.

The other key to making it fast is that composed functions—and after the parallel code gets transformed, there are a lot of composed functions—get “fused” together. “Fusion” seems to be the new buzzword in Haskell. I swear, it must have shown up in just about every ICFP paper in the past few years. Here’s an example of fusion advocacy. Fusion basically seems to be the new wave of deforestation, which used to be the new wave of listlessness. In any case, it’s a fancy program transformation that allows you to specialize/inline one function according to another, thus eliminating intermediate products. It’s cool and only works when you don’t have side-effects.

In contrast to STM, I have to say, Nested Data Parallelism does look vaguely practical, even with a straightforward implementation. There’s even the obligatory “grain of salt” benchmark slide about a third of the way through demonstrating how awesome it is. In one of his first slides, too, he makes reference to running Haskell code on a GPU. I wonder what he actually said about that in his talk.

Anyway, if you skip down to the last slide, you see the typical Simon Peyton-Jones conclusion. Basically “there are programming problems that are too complex for a programmer to deal with himself. But there’s a solution! But this solution only works with pure functional languages! That’s why Haskell is the future! All we have left to do is implement it efficiently”. He may have his own mini-Jobs reality distortion field, but I actually believe him.

This blog post gives a cool usage of laziness. It’s a framework for doing automatic differentiation in Haskell. It’s cool, though, because it lazily produces infinite derivatives.

Automatic differentiation is an alternative to symbolic differentiation and numeric differentiation. Basically a numeric function, even if it can’t easily be reduced to a single expression, can automagically generate derivatives so long as it is built up of some arithmetic primitives. One has to make liberal usage of overloading in Haskell, though, so that functions think that they’re working over normal numbers.

Googling around for memoization and Haskell brings up many posts about how easy it is in theory, with some example code on how to do it in a specific instance. However, I couldn’t find a really general solution to the problem.

Template Haskell to the rescue. Today I coded up a macro to do this in a fairly general way. This way you can write code in the most natural way possible:

$(memoize [d|   fib 0 = 1
                fib 1 = 1
                fib n = fib (n - 2) + fib (n - 1) |])

And the macro transforms it into memoizing code that will store previous results in a linked list of arrays. The linked list was required because arrays in Haskell are of finite size, whereas I wanted arrays of potentially infinite size.

The good news is that it works very well. Without memoization, evaluating a recursive fibonacci really strains once you hit an n of about 30. With memoization, my 1.6GHz G5 can do an n of 20000 in one second—for comparison, a natural iterative C version will do about 20000000 in one second, a thousand times as much, though to be fair the C version is not using bignums. Also, the code is general enough that it works over any parameters which are of the Enum and Ix classes—basically any data type which provides a function to map it to ranges of integers. Mutually recursive memoized functions work fine: once a function has been memoized, it behaves and can be called exactly like it was in its original representation.

The bad news is it’s a huge memory leak. The memoizing arrays are always reachable, so they never get garbage collected. In the case of fibonacci, once I tried an n of about 50000 or so on my machine it was game over: memory usage was such that it was hitting swap, and there was no hope of any memory ever being reclaimed. The other bad news is that I’ve only written it to memoize the first parameter. This limits its utility in many dynamic programming problems which operate over 2 parameters.

Once I clean it up a bit I think I’ll post it on the old webpage. I’ll add multiple parameter support, and maybe flip the structure of the memoizing array list so the most recently calculated values are closest in memory: hopefully once things go to swap they stay there, even if they never get reclaimed.

There may be some hope for allowing values to be forgotten about and reclaimed by the garbage collector. I’ll have to do some reading to see what other people have done here. Unfortunately a purely functional language like Haskell relies heavily on not forgetting things so that it can reuse effort—convincing it’s semantics that you want to forget about something and then recalculate it later is ironically quite difficult in this case.

Overall it’s an okay tool, I think. The state of things right now is that, in the world of Haskell, dynamic programming Sucks. This should help things out.

Why am I just finding out about Hoogle now? It’s a Google interface to Haskell’s APIs. It’s brilliant. So many times I’ve thought “okay I have an [a] and an a -> IO b, now what function makes an IO [b]?”. Especially with all the list functions, all the scans and iterates and interleaves and folds and maps tend to run together a bit. It’s especially weird since 99% of these functions can be reimplemented without much thought in 2 lines of code, but it’s some stubborn form of Haskell pride to never reimplement functionality.

Seriously, though, I think a lot of languages and libraries could benefit from a Google interface. None of this “man” or “javadocs” nonsense.

Even though Geoff tipped me off to it weeks ago, I just now got around to reading about it. It is Software Transactional Memory, and it is Haskell‘s way of dealing with shared memory in concurrent applications.

The motivation is right on the mark: shared memory currently is typically done through locks and condition variables. These are error-prone for a whole host of reasons, as everyone knows already: you might not lock enough locks; you might lock too many locks; you might lock locks in the wrong order; the list goes on. None of these problems will the compiler help you with and, sadly, even testing will rarely catch many of these errors. The bug might only arise years after the software has been released.

Simon Peyton-Jones’ solution is Software Transactional Memory (STM hereafter). It should not be a surprise, as it comes from Peyton-Jones, that STM is:

  1. extremely elegant;
  2. offers guaranteed semantics by the static typing system;
  3. uses monads to compose primitives together; and
  4. offers little in the way of an intuitive efficient implementation.

It’s unfortunate that point #4 so often is a consequence of points #1 through #3. But, I suppose my research is meant to address that. I’ve said it in other points, but it bears repeating: monads are awesome. Still today a lot of people have the impression that monads are only useful for I/O. This is really not true—in my own Haskell coding, I/O work is in the minority of work I do with monads. In a hand-waving way, monads provide a good way for describing how functions should be composed together. They’re an excellent structure for all sorts of things—and I’m convinced shared memory is one of those things—in enforcing some sort of semantics.

In the case of STM, the semantics we’re enforcing are what Peyton-Jones describes as atomicity and isolation. The first is self-explanatory: STM “actions” are atomic. The second is just supposed to mean that the actions of one thread don’t affect the actions of another. Again, it stands repetition: these semantics don’t follow from ideal programming practices or anything like that; they are strictly enforced by the typing system. It is not possible to not have these properties.

We have a couple of primitive STM actions: reading a “transactional variable” (shared data structure); writing to a transactional variable; creating a new transactional variable. We have some fancy functions to compose actions together as well, beyond those offered by the monad: we can block on some condition, and we can have an alternative action in case the first one is blocking. Composing many STM actions together gives you a bigger STM action.

Finally, we have a function to bring an action from the STM monad into the IO monad. Or, stated another way, we have a function to execute an STM action.

This is the composition model, and it provides the four properties I mentioned earlier. One property I didn’t mention is that this strictly prevents transactional variables from being accessed from outside an STM action. Hooray.

This is actually extremely cool. It gives a high-level view of shared memory which is good, and has some nice guaranteed semantics. The problem is with in the implementation. There are two naïve implementations of this:

  1. every time we execute an STM action, lock some big giant global lock. When we’re done, unlock that big giant global lock. This is slow and stupid and prevents any sort of interesting concurrency; or
  2. don’t lock until a transaction is complete. When executing an STM action, write its actions to a transaction log. When you’re done the log, lock, ensure the memory you were using was still consistent, commit the log, and unlock. The problem is that if the memory isn’t consistent, you have to re-execute the action. Not good.

What we’re missing is a way to express the fact that one STM action has no possible effect on another STM action. If we had that information, we could use smaller locks.

Overall the STM model is a good one. It takes a philosophy that I like: allow the programmer to express what’s going on at a very high level, and let the compiler worry about making sense of it. But it’s going to take a whole lot of inference for the compiler to separate these STM actions to avoid making global locks. Transaction variables can be passed around as parameters, and one transaction variable can contain another transaction variable—keep in mind these are data structures just like any other data structures—and except in the most obvious cases, it becomes too easy for the inference code to throw up its hands and say “you know what, this is just turning into a complete graph. I’m just going to have to use one lock for everything”. Maybe there is room for doing this stuff partially at run-time, via tagging, but that’s messy as well.

There actually may be some research potential in this for me down the road. STM is new and on its baby legs right now. Sorting out a good implementation for STM would have consequences for other issues in functional programming, predominantly with aliasing. And who knows, maybe one day extremely high-level languages could find a place in high-performance computing.

I’ve mentioned before my CS888 project, which is taking GHC‘s garbage collector and adding real-time hooks into it. There will be two functions exported on the Haskell level: Control.RealTime.gcNextCollectionUpperBound :: IO Int and Control.RealTime.gcBound :: Int -> IO a -> IO a. The latter function is “where the magic happens” (after Mark pointed out that I used this phrase 3 times in my comments in my CS833 project source, I realize I use it all the time). Indeed the former function isn’t even necessary, but is useful for making garbage collection more predictable. But the latter function is used to force the garbage collector to limit itself during some chunk of code and not go off on some tangent collecting, behaviour which would be disastrous in real-time code. Note that gcBounds can be nested.

I bolded the word “force” up there, because I want to stay hard real-time if at all possible. If the user wants soft and I provide hard, that’s just a bonus. If the user wants hard and I proved soft, that’s no good.

The more I think about gcNextCollectionUpperBound, the tricker it gets. In case its name is not clear, its job is to estimate how long it thinks the next round of garbage collection will take, in milliseconds. This function should be fairly quick—it shouldn’t simulate a collection just to figure out how much garbage there is; it should be moderate accurate—hopefully within an order of magnitude in most cases; and it should return a hard upper bound, such that it is impossible for garbage collection to take more than the number returned.

So, consider the worst case scenario for garbage collection. The worst case is that garbage collection will have to swap in, from disk, the objects that it’s collecting. In the worst case, pages will be scattered all throughout, such that it will have to do a swap for every page. Thus, if we assume an 8ms seek time on the hard drive, the hard upper bound is 8p, where p is the number of pages to be collected. Then you think: maybe even assuming an 8ms seek time isn’t being hard enough; maybe the user is running this real-time system with a network-mounted swap partition, and the swap file is actually on Mars. Then the hard upper bound is something like 2400000p. Suddenly it seems obvious that getting a bound within an order of magnitude—one of the properties I wanted for gcNextCollectionUpperBound is looking hopeless.

Probably I will have to abandon my dream of returning a hard upper bound. This isn’t the end of the world: as long as gcBound provides a hard guarantee, that’s all that matters in the end. Probably what I will do to get a soft upper bound is:

  1. set up some compile-time latency constants: something like 8ms for swap, 1μs for RAM, 1ns for cache;
  2. come up with some arbitrary hand-wavish heuristics for semi-conservatively guessing when a page is going to come up from one of those 3 locations;
  3. look at total memory usage and start multiplying page numbers by constants.

It’s point #2 that’s the hard one, and I’m sceptical as to how much I’ll be able to glean about the structure of the heap without actually diving into it.

This is sadly the only research I’ve found on the subject.

I just played around a bit with Parsec, which is a parser which comes with all major Haskell98 implementations. Usually if you’re doing parsing in Haskell you’re using Happy, which is a fairly typical LALR(1) functional parser generator.

Parsec is a top-down parser, much like a recursive descent parser. Parsec provides you a monad in which to write pretty grammars; it then returns to you either a parse value—probably a tree—or an error message. I have to admit, the error messages it gives you by default, if you can’t parse, are stunning. Typically error messages are the hardest, and least often done, tasks in parsing, but Parsec’s defaults are fairly clever.

Anyway, combinator parsers like Parsec, like LL(k) parser generators, are predictive. I.e., you should know what rule you’re using to parse by some finite, constant look-ahead. In LL(k) parsing, this “should” is a strict requirement, and is what gives LR(k) its supremacy over LL(k) in terms of languages recognized; in Parsec, it’s a performance issue. Parsec handles back-tracking for you, but for performance reasons, the manual suggests you factor out your grammar yourself.

Example parenthesis grammar in an LALR(1) or LR(k) system:
a := '(' a ')'
  | ε

Example parenthesis grammar in Parsec:
parens =
    char '('
    char ')'

I’ve left out attached semantics, but they don’t ugly things up much more. The very cool thing about combinator parsers is the integration with the main language. Terminals, nonterminals and rules are nothing but Haskell functions and data. Thus, if you have some grammar design pattern, such as infix operator associativity, or lists of nonterminals with some separators, all you have to do is make a new combinator.

I should say that another cool thing about Parsec is that it’s not limited to a subset of context-free grammars, or even limited to context-free grammars at all. It’s not too much trouble to write Parsec grammars which accept non-context-free, but context-sensitive, languages.

The downside, which fits into a previous rant of mine, is that you’re shifting parsing from compile-time to run-time. Parser generator manuals like to give toy examples and show the expressiveness and how wonderful things are, but the fact is that writing correct grammars is hard. Maybe I’m just especially dumb with grammars, but I like to have all the help I can get. I like it when yacc tells me I have reduce/reduce conflicts or what have you: it tells me that I haven’t written my grammar correctly yet.

Also, its expressiveness and integration with Haskell, allowing you to do write non-context-free languages and crazy things like that, doesn’t strike me as a big win. Any language that you’d like to express in a computer science context is certainly going to be context-free—probably much more restrictive than that—and I like the sharp division we currently have between syntax and semantics. The simple, bitchy, LALR(1) parser generator deals with the syntax tree. Once it hands you the tree, you can forget all about syntax and start looking at semantics. I can’t come up with a parsing situation in which I’d want anything else.

Update: I did actually come up with a situation in which I’d want something else. There’s a rather famous language that I can’t recall (ALGOL?) which had no keywords. It allowed you to write ridiculously stupid things like IF IF = THEN THEN THEN = ELSE—e.g., IF, THEN and ELSE are regular identifiers, as well as constructs, depending on context, hence the “no keywords” idea. Does anyone recall which language this was, and why such a brilliant decision was made to not have keywords? In any case it’s a pain to parse: the usual trick is for the grammar to “know” when it’s looking for a keyword, and then flag the lexer; the lexer will then stop parsing identifiers and start parsing keywords. It’s an even bigger pain if you want to make your parser re-entrant, and a crazy combinator parser like Parsec might just make your life a lot easier then.

Next Page »


Get every new post delivered to your Inbox.