Progress on the Ca compiler is going well. The lambda lifter is done. I really couldn’t be much more pleased about how things are going.
I decided now would be a good time to clean up and document the code. I’m a big fan of literate programming and so have been investigating literate programming tools. No matter how long the multi-line comments are, I find documentation inside code completely useless for getting “big picture” concepts across. Literate programming seems to be a good solution.
The idea behind literate programming is that you have one source file. This source file is both a typeset document source file and a programming language source file. For example, through Donald Knuth’s CWEB tool, you write a foo.w file which becomes both a foo.tex file and a foo.c file.
My experience with literate programming up until this weekend has been a bit of a lie. It was with Literate Haskell, which Wikipedia—I now realize correctly—says is semi-literate, not literate. In Literate Haskell, basically you interleave LaTeX code and Haskell code, and the Haskell compiler is clever enough to only compile the Haskell code. It’s cute and it worked well enough to do my thesis, but it’s not really literate programming.
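To make “only compile the Haskell code” concrete: in the LaTeX style of Literate Haskell, program text sits between \begin{code} and \end{code} lines, and everything else is documentation. Below is a minimal sketch in C of that filtering rule. It is an illustration of the idea only, not how any Haskell compiler actually implements it, and the function name is_code_line is made up for this example.

```c
#include <string.h>

/* Hypothetical helper: decides whether one line of a LaTeX-style
   Literate Haskell file is program text.
   `in_code` carries state between calls; start it at 0.
   Returns 1 for program text, 0 for documentation or a delimiter line. */
int is_code_line(const char *line, int *in_code)
{
    /* "\begin{code}" is 12 characters long, "\end{code}" is 10. */
    if (strncmp(line, "\\begin{code}", 12) == 0) {
        *in_code = 1;   /* the following lines are Haskell */
        return 0;       /* the delimiter itself is not code */
    }
    if (strncmp(line, "\\end{code}", 10) == 0) {
        *in_code = 0;   /* back to documentation */
        return 0;
    }
    return *in_code;
}
```

Feeding a file through this line by line and keeping only the lines that return 1 gives the plain Haskell source. Nothing gets reordered along the way, which is why “semi-literate” is a fair description.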
If you’ve never sat down and played with Knuth’s WEB or CWEB, I would highly recommend you do so. Knuth’s vision of literate programming is not just to interleave documentation and code but, to borrow the names of Knuth’s own tools, to weave and tangle them together. Documentation and code are not interleaved; they’re almost indistinguishable. CWEB gives you the freedom to refactor and reorder your code in ways that make it look totally unlike anything a C compiler could understand, to make it look like literature (quite a feat for C), and yet still get a valid program at the end.
In a sense it’s not really fair to compare Literate Haskell with CWEB, as they’re working in different domains. For example, one of the nice things about CWEB is that it lets you reorder code. This doesn’t exist in the world of Haskell, since Haskell never really had any substantial restrictions on the reordering of code in the first place.
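The kind of ordering restriction at issue is easy to show in plain C (this is ordinary C, not CWEB input, and the function names are invented for the example). If the “big picture” routine is to come first in the file, C insists on a declaration of the helper it calls before that point:

```c
#include <stddef.h>

/* C needs this declaration up front, because total_length() below
   calls word_length() before its definition appears in the file. */
static size_t word_length(const char *s);

/* The "big picture" routine, placed first, where a reader wants it. */
size_t total_length(const char *const *words, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += word_length(words[i]);
    return total;
}

/* The detail, placed last. */
static size_t word_length(const char *s)
{
    size_t len = 0;
    while (s[len] != '\0')
        len++;
    return len;
}
```

Haskell accepts top-level definitions in any order with no such declaration, so there is little for a tool to fix there. In C, the presentation order has to be negotiated with the compiler somehow, and CWEB’s reordering takes that job off your hands.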
Anyway, my Ca compiler cac is currently made out of two languages: Haskell and C. Eventually I’ll have to find a way to deal with its grammar as well, but I’ll cross that bridge later. Literate Haskell works well enough for the Haskell side of things. It’s quite inoffensive; you can structure your LaTeX code however you like.
For the C side of things, I’ve spent most of today installing tools, reading their documentation, playing with them, trying to get them to work. I played with noweb and nuweb, but neither seemed too impressive. Once I installed CWEB, things improved a bit.
CWEB really is just a flat-out brilliantly written piece of software. It does exactly what you want it to do, and a whole lot more, all simply, logically, and robustly. It weaves together documentation and code beautifully. There is only one problem with it: it produces Plain TeX code. I don’t know Plain TeX.
No problem; there is a package out there called latex-CWEB, which is a LaTeX class that will render CWEB documentation. Note, I said class. So far as I can tell, this means any document you create with latex-CWEB cannot be an article; it cannot be a report; it cannot be a book; it cannot be any type of document other than a “cweb” document. This is a bit restrictive if you consider that you might want a document which contains things other than CWEB documentation. Such as Literate Haskell documentation.
Since, as I said before, Literate Haskell is so inoffensive, putting it inside a “cweb” document will probably be workable, and it might turn out to be the best option.
The other competing option is to try to learn Plain TeX. I’d never considered it before; I was one of those “why use TeX when we have LaTeX?” kind of people. But my opinion is changing a bit after seeing how CWEB renders my humble C code: it’s gorgeous.
One final note before I head off to bed. It has frustratingly come to my attention throughout today just how marginalized literate programming is. I did some Google searches for a “literate IDE”. Needless to say, I didn’t find anything. It really is a shame. Literate programming is a fantastic idea, but all we have now is a smattering of simple, mostly one-off tools, almost none of them maintained since the mid-1990s, all incompatible, all structuring things in different ways. The state of the art right now is that it really is a big hassle to write literate code unless you either resign yourself to second-rate documentation or else commit yourself to exactly one language with exactly one tool.
February 19, 2008 at 11:28 am
I couldn’t agree more with your sentiments on literate programming. I suspect the lack of acceptance is somewhat due to the perception that it is “harder” and “more time-consuming” (i.e. expensive in the short term) to write proper literate code than to just hack out some C++ with terse inline comments.
WEB is, indeed, brilliant. But what else were you expecting? ;) Knuth put an enormous amount of careful thought into its design and it shows.
Plain TeX is well worth learning. LaTeX is fantastic for writing mathematical sciences papers (and for this reason it is my go-to typesetter for most of my work) but the specialization that makes it fit this domain so well sacrifices quite a bit of flexibility.
Ever tried to typeset poetry with LaTeX? Good luck! Plain TeX, on the other hand, can do a brilliant job of this. Ask Helmut to show you some of his poetry some time. (In fact, just ask Helmut about “Plain TeX vs. LaTeX” and I guarantee you a very informative lecture.)
Plain TeX is, in my opinion, the ultimate generalized typesetting engine.
March 1, 2008 at 1:35 pm
[...] for the code cleanup, if you read the last post I made, you’ll know I’m trying to make this code as literate as possible. I’ve been in [...]
March 24, 2008 at 4:45 am
I’m a recent aspiring “fan” of literate programming, and have looked at some of the CWEB examples on Knuth’s web page: http://www-cs-faculty.stanford.edu/~knuth/programs.html.
Right now, that’s a little more involved than what I’m aiming for, which is robust LaTeX-based API/interface documentation for C/C++. I’ve been using doxygen, but am bummed it doesn’t include support for BibTeX and LaTeX’s \cite{} macro. Any chance you might know about an alternative API/interface documentation tool for C/C++ that might do the trick?
March 24, 2008 at 4:53 am
Alternatively, is there a convenient way to do API-only documentation in CWEB?
March 24, 2008 at 7:33 am
Hi Ryan! The requirements for API documentation seem different to me, so I don’t think I would use literate tools for that.
But, you could. CWEB, for example, supports C++ as well as C. If you wanted to use something like CWEB for API documentation, then all you would have to do is make your .h file literate. Of note is that CWEB does not support LaTeX, only Plain TeX. I tried getting CWEB-LaTeX to work and could not. noweb and nuweb are two literate tools which do support LaTeX, though I haven’t tried them.
Like I said, I don’t know if that’s exactly what I’d want out of API documentation. Literate documentation (like with CWEB) is sort of meant to be perused, to be read like literature, like a paper or something like that. For API documentation, often what you want instead is for information to be provided in some clear consistent format where you can jump around easily.
But I suppose it depends on the project. I would have recommended doxygen, though obviously that’s not what you want, so what do I know?
March 24, 2008 at 7:34 am
As an addendum, I just stalked you—well, followed your link—and saw that you’re from Calgary. You didn’t do any of your undergrad at the U of C by chance, did you?