Garbage collection

We got a paper submitted to PLDI—Programming Language Design and Implementation—if just. The acceptance rate of late has been around 20% it seems. I think the paper was good, but I have no idea of it was that good. Since PLDI has a double-blind refereeing system, I’ll try not to give too many specifics in this post: specific enough that I can talk about things, but vague enough that a referee won’t inadvertently stumble here from Google.

I’m fairly sure that this language marks the direction my research should be going in during my Ph.D. It has the possibility of being a very nice language. Of course I’ll wait to see what the referees say. It may be unlikely, but there’s always the chance of a “this research is a dead-end because of [reason you’ve never thought of]” which is why it’s nice to get feedback from fresh eyes.

There are issue for the front-end. I think this is the biggest fault of the language as it stands right now. Considering what the language is intended to do—give automatic guarantees on polynomial time complexity—a lot of the complexity is justifiable, but still it can be improved. The typing system is a bore to get around. After becoming familiar with the language for a few weeks or so one starts to get a bit of intuition about how to structure things to “beat” the typing system. Still, it would be nice to add a bit more in the way of inference so the typing system can help you out. The syntax is perhaps more cumbersome, in the sense that there are too many constructs, than is totally necessary.

Changes to the front-end wouldn’t be purely cosmetic, either. The way we have the language right now is nice since everything that’s in there serves a purpose and everything is proved to work correctly. In a language such as this, removing something or changing something is a very delicate matter and it’s easy to destroy the guarantee mentioned above of guaranteeing time complexity. But work and cleverness aside, I’m sure it can be done.

On the back end of things, there aren’t problems so much as there are opportunities. As it stands, the implementation—which needs a lot of work yet—is just a very naïve interpreter. Making a compiler for a language such as this brings up myriad issues which one wouldn’t find in any other language. Memory management is the big one. Garbage collection is likely not necessary at all, so it will be interesting to see exactly what nice—and efficient—memory management schemes it would offer.


The Ca compiler I mentioned one week ago, which I’ve dubbed cac, is progressing well. The core of the back-end is effectively done. Here’s the current to do list:

  1. garbage collector (finished)
  2. code generation (finished)
  3. lambda lifter
  4. general code clean-up, as it will no doubt be past due at this point
  5. type inference
  6. parser

So far I’m fairly pleased with my progress, though that always seems to happen when I get into coding. It makes me half-wonder why I went into research when coding is always so much more productive, ha!

The compiler is written in Haskell and targets C. The garbage collector is all written in C. Nothing about the interface to the Ca run-time/garbage collector is unportable, but currently the primary data structure is dependent on an LP32—i.e., longs and pointers are 32 bits in size—architecture. I will probably provide an implementation for LP64 systems as well, as well as a totally portable implementation, which makes no assumptions violating the C99 standard. As it’s only one struct which needs to be rewritten, it won’t take much time. The current implementation, for LP32 systems, allows arbitrary-sized boxed values as well as 31-bit unboxed values. Currently 31-bit natural numbers, i.e., unsigned longs, are the only unboxed values supported.

The rest of the garbage collector is portable. I even used malloc() to set up the heaps. Since my memory pools are of fixed size right now, there’s no downside to using malloc(), though I might have to abandon it if I ever go to dynamically-sized pools.

I’m probably most proud of my code generation. Since Ca is not a general purpose language, interfacing with other languages is in the back of my mind. And, of course, when I say “other languages”, I implicitly mean “C”, since every language’s foreign function interface is C if nothing else.

So, if, in Ca, you define a function called foo, the generated code will create a C function of exactly the same name foo. This means that if you’re calling Ca code from C, you don’t have to worry about name mangling, or really importing function prototypes at all.

Further, all unboxed return types and parameter types in the generated code have type unsigned long. For example, if you write the Ca function foo x y = x { Zero -> y; Succ p -> @p * y; }, where x, y and the return value are all natural numbers, the prototype for calling this function from C is exactly unsigned long foo(unsigned long, unsigned long). Indeed function applications in Ca are nothing but C function calls. Quite a nice property, I think.

For boxed values, of course, things get a little bit more complicated when doing a foreign function interface. The Ca run-time will most definitely have to provide service functions to convert, for example, arrays into boxed lists. I don’t think there’s any way around this.

The C code generated is moderately readable, in the sense that whitespace is all proper and some variable names are meaningful. E.g., if you have a Ca variable blarg, this will show up as var_blarg in the generated C code. Hopefully this will make debugging easier in long run. Unfortunately the generated C code uses a lot of temporary variables, and the comments it produces are rather sparse.

Catamorphisms over natural numbers are implemented using for loops. Catamorphisms over other data types are implemented using recursion.

I’d almost forgotten how much fun it is to implement a compiler. While I was tagging the germane categories for this post, I almost instinctively checked off “hobby”. Good times. I think I’ll let it rest for a couple days and focus on other research. Though I won’t feel completely accomplished until I get my computer and hooked up to the external hard drive for back-up, ha!

In the fall-out of SOFSEM 2008, I have a bit of a to-do list to take care of. There’s the usual responsibilities: readings and lecture preparations for the course I’m taking; marking, office hours, and lab hours for CS342; organizing the upcoming internal CS conference, UWORCS; and keeping up with the usual political and governing bodies on campus, namely SOGS, CSGSC, GEC, and the department chair selection process. I also have sort out all my receipts from SOFSEM to claim the travel expenses: currently all my receipts are basically in a number of giant balls of papery mess.

I also have to try and keep up with my research. Currently I have three things on my plate as far as research is concerned. The first is to keep in touch with the aforementioned Matej to see how applicable my research will be to the noble goal of robust operating systems. I’ve been reading up on Pict, a language based on the π-calculus, for the past few days and have just sent him another email to see what can be done. The research here seems overwhelmingly practical and I’m less inclined to spend a lot of time coming up nice theoretical models. I think it would be more prudent to come up with a quick addition to the Pict syntax to say that it works. But it does have a lot of potential down the road, perhaps, for a better foundation.

Another road of research, which I’ve spent my office hours this morning doing, is coming up with a compiler for my language, Ca. Incidentally, I now realize I should have spent more time coming up with a good name for it.

I spent this morning writing my own copy collector in C. I like the idea of compiling the language down to C. I also like the idea of having the data structures totally under my control, as my research naturally leads away from garbage collection towards other memory management schemes like regions, or some hybrid. Further, I want my garbage collector to be precise, and it’s hard to get away from conservative garbage collectors in C. For static analysis purposes, it’s also nice to have complete control over the semantics of the memory manager. And, for the purposes of writing short papers, it’s nice to have a memory manager with very simple semantics haha.

Like I said, it’s a copy collector, a rather simple one. No generations or concurrency or speculation or anything like that. Object overhead is 8 bytes right now, though hopefully that will come down. It handles unboxed objects nicely, and precisely.

The downside is that, after much time fiddling with setjmp(), getcontext(), even writing hand-coded assembly, I’ve given up a nice universal way to derive roots, so code has to explicitly use add_root() and remove_root(). Seeing as the code will be automatically generated by the compiler, I’m hoping this won’t be too onerous, but we’ll see.

I suppose everyone’s different, but for me the memory management is the hardest part, so I’m expecting the rest to be mostly grunt work.

After I get the compiler stable enough that I can play around with it a bit, I can start looking at developing the algorithms to actually give some nice bounds on it statically. We shall see.

I’ve mentioned before my CS888 project, which is taking GHC‘s garbage collector and adding real-time hooks into it. There will be two functions exported on the Haskell level: Control.RealTime.gcNextCollectionUpperBound :: IO Int and Control.RealTime.gcBound :: Int -> IO a -> IO a. The latter function is “where the magic happens” (after Mark pointed out that I used this phrase 3 times in my comments in my CS833 project source, I realize I use it all the time). Indeed the former function isn’t even necessary, but is useful for making garbage collection more predictable. But the latter function is used to force the garbage collector to limit itself during some chunk of code and not go off on some tangent collecting, behaviour which would be disastrous in real-time code. Note that gcBounds can be nested.

I bolded the word “force” up there, because I want to stay hard real-time if at all possible. If the user wants soft and I provide hard, that’s just a bonus. If the user wants hard and I proved soft, that’s no good.

The more I think about gcNextCollectionUpperBound, the tricker it gets. In case its name is not clear, its job is to estimate how long it thinks the next round of garbage collection will take, in milliseconds. This function should be fairly quick—it shouldn’t simulate a collection just to figure out how much garbage there is; it should be moderate accurate—hopefully within an order of magnitude in most cases; and it should return a hard upper bound, such that it is impossible for garbage collection to take more than the number returned.

So, consider the worst case scenario for garbage collection. The worst case is that garbage collection will have to swap in, from disk, the objects that it’s collecting. In the worst case, pages will be scattered all throughout, such that it will have to do a swap for every page. Thus, if we assume an 8ms seek time on the hard drive, the hard upper bound is 8p, where p is the number of pages to be collected. Then you think: maybe even assuming an 8ms seek time isn’t being hard enough; maybe the user is running this real-time system with a network-mounted swap partition, and the swap file is actually on Mars. Then the hard upper bound is something like 2400000p. Suddenly it seems obvious that getting a bound within an order of magnitude—one of the properties I wanted for gcNextCollectionUpperBound is looking hopeless.

Probably I will have to abandon my dream of returning a hard upper bound. This isn’t the end of the world: as long as gcBound provides a hard guarantee, that’s all that matters in the end. Probably what I will do to get a soft upper bound is:

  1. set up some compile-time latency constants: something like 8ms for swap, 1μs for RAM, 1ns for cache;
  2. come up with some arbitrary hand-wavish heuristics for semi-conservatively guessing when a page is going to come up from one of those 3 locations;
  3. look at total memory usage and start multiplying page numbers by constants.

It’s point #2 that’s the hard one, and I’m sceptical as to how much I’ll be able to glean about the structure of the heap without actually diving into it.

This is sadly the only research I’ve found on the subject.

Haha I found a fairly informative, if sloppily written, M.Sc. thesis on my current research interest du jour—static reasoning of memory usage and memory structure. This quote somehow both saddened and excited me:

Though there is early work to be found in the 1970’s, the progress on memory aware reasoning has been surprisingly slow as [Reynolds, 2002] notes:

It can be discouraging to go back and read the “future directions” section of old papers, and to realize how few of the future directions have been successfully pursued. It will be fortunate if half of the ideas and suggestions in this section bear fruit. In the meantime, however, the field is young, the game’s afoot, and the possibilities are tantalizing.

I suppose it shouldn’t be terribly surprising. Programming language research can be a bit tricky at times, what with undecidability and all. Researching a fruitless area has the potential to provide something beyond incremental footnotes. It also has the potential to offer a ten-year Ph.D. with no publications. Which one will it be?

A Google search for real-time haskell yields little of interest, which I suppose isn’t too surprising. Haskell, if you’re not familiar, is not just a pure functional language, but a lazy one. On the surface, this makes it a dreadfully poor choice for a couple of reasons.

  1. Haskell code tends to be slow. I’ve started using the Gentoo language shootout as my baseline for language comparisons—it seems fairly unbiased if nothing else—and with a few exceptions, Haskell just doesn’t stack up well. This isn’t too surprising: laziness adds overhead. My mentioning performance might get the hardcore real-timers in a tiff—theoretically performance has nothing to do with real-time—but in the real world, real-time is typically either done in embedded situations where hardware resources are scarce, or else we’re in a soft real-time situation playing pr0n, and speed is important.
  2. Haskell code is unpredictable. This is the biggest flaw in Haskell that Simon Peyton-Jones has offered (that I’m aware of). When tracing through Java code, it’s trivial to derive complexity, and even a rough gauge on absolute running time, just by glancing at the code. Laziness takes this all away. I’ll talk about this in a later post perhaps.
  3. Haskell is garbage collected.

It’s that last point I want to talk about. Garbage collection is typically not used in real-time systems, particularly hard real-time systems, for good reason. Imagine writing the quintessential real-time example code—nuclear power plant code; yes, I can dream—and you’ve reserved your time slice, grabbed your mutex, and then the garbage collector decides to run for 400ms. Not cool.

For my CS 888 project, I think I’m going to propose changing Haskell’s garbage collector to provide a real-time interface. Something along the lines of:
gcNextCollectionUpperBound :: IO Int
gcBound :: Int -> IO a -> IO a

The first function would give an upper bound on how long the next round of garbage collection would take (in milliseconds). The second function/combinator would execute a section of code, guaranteeing a bound on the amount of time garbage collection will take, such as:
do  gcTime <- gcNextCollectionUpperBound
    return $ gcBound (gcTime + 200) do
        requestTimeSlice (gcTime + 200)
        mutex <- getMutex
        giveUpMutex mutex
        return answer

This way we’re guaranteed that we complete saveLives before the garbage collector eats up all our time. So that’s about all I need for the interface; all I need now is the implementation. Yay, I’m half done!

GHC’s documentation on its garbage collector is a big weak, so I’ll have to dig into the source. My understanding through a bit of reading is that right now GHC uses a hybrid system, with almost all programs using the copy collector exclusively, and the other part of the system being a generational collector. This is somewhat exciting because I haven’t found any papers yet discussing real-time copy collector systems yet.

We’ll see how far I get in this. The nice thing about a project like this is that a naïve implementation can be done hopefully without too much work (do garbage collecting until you run out of time, then just give up). Even coming up with a naïve upper bound is easy (return INT_MAX). But the project offers itself to putting effort into a more finessed solution, too.

I don’t know, we’ll see what Watt says about it.