I’ve been thinking about the compiler again and how best to implement higher-order functions, once time comes to do that. Currently the compiler uses a lambda lifter. I.e., there are no closures. Every function is lifted out of any scope to the top-level, collecting extra parameters along the way. This is fine for first-order programs, since you can change every call to a nested function into a call with extra arguments. However, in first order programs, this causes problems because the arities have to match. For instance:

f x = let g y = x + y; in
  g (x + 4);

Gets transformed into:

f_g x y = x + y;
f x = f_g x (x + 4);

Every function is now at the global level and g has all the variables it needs. This causes problems with higher-order functions though:

f x zs = let addX y = x + y; in
  map addX zs;

Naïvely this is:

f_addX x y = x + y;
f x zs = map (f_addX x) zs;

Ahh, but the first argument to map is a closure, not a function! We should properly implement closures as lambda lifted functions, so we can’t very well implement lambda lifted functions are closures! There are myriad other problems with this approach, but suffice it to say it won’t work.

Anyway, I was contemplating abandoning lambda lifting entirely and just leaving closures implemented as closures in C. The GNU C Compiler has a little-known and little-used, sadly, feature usually called nested functions. They’re half-closures in a way. They’re closures up until their lexical scope disappears. Well, such is the C way. But they would do what I needed.

I was reading the original 1988 Usenix paper which describes the technique, usually called “trampolining,” to implement nested functions quite smartly in C. As I was reading, I discovered that the author had originally intended to implement unnamed functions in C!

He gives two suggestions for syntax. I like the first one he suggests, which looks something like (int (*)(int x) { return x + 2; }). For instance, you could do something like (int (*)(int x) { return x + 2; })(3), which would evaluate to 5. Perhaps not the most useful example, but you get the idea.

These unnamed functions are really no more of a bother to implement than named functions, once you allow nested functions, but it appears this feature was never implemented in GCC! I suppose to get good use out of them you’d need a less bulky syntax. Type inference could help there.

Anyway, the point of this post was to register my dismay at GCC never implementing unnamed functions, despite there being no technical reason for not doing so. They can be quite handy, those unnamed functions.

For what it’s worth, I think I’ve decided against relying on GCC’s trampolining to do higher-order functions and lambda abstractions in Ca. It is very clever and very efficient, but the semantics are a bit hairy for the 21st century. For example, I just looked at some SPARC assembler that GCC produced for it, and it involves a system call to make to the stack executable. In today’s security conscious kernels, executing the stack, as trampolining necessarily requires, is less practical than it once was. Maybe one day we can get rid of these blunt MMUs, but that’s a rant for another time. In any case, MMU concerns aside, executing the stack mucks with the instruction cache on architectures with separate caches. As I said, hairy semantics.

In Breuel’s paper, he sets up other simpler solutions for implementing closures as being impractical due to their changing calling conventions. Well, I’m making my own language and I can make my own calling conventions. I expect that higher-order functions will simply be passed as two pointers instead of one. It saves everyone a lot of headaches.

Advertisements