I apologize for not posting more often to this blog. It is, of course, the usual excuse: I always find more important things to do. In fact I’m still terribly busy, but I thought a post was overdue.

I recently got back from working with Robin Cockett and Brian Redmond in Calgary. They’re working on a new programming language which uses type theory to enforce that each well-typed program halts in polynomial time. It’s been done before, but nowhere near as nicely as this. The implementation is just a proof of concept, nowhere near fit for consumption by a general computer science audience. The fact that it’s built on a terribly recondite foundation, combined with our discussions over the minutiæ of the syntax, got me thinking more about the face of a programming language, the dreaded “front end.”

Syntax in the programming languages world has, to a large extent, been considered a solved problem for the past couple of decades, and for good reason. The technology behind parsing is, so far as I’m concerned, settled. There have been enough languages created, and segregated into successes and failures, that people have a general idea of what a decent programming language should look like. Yet the process is still not formalized. When creating a new language, people (or I, at least) look at existing successful languages, fudge them to fit whatever new characteristics their language will contain, and use common sense to reason it out.

I think I’ve stated on here before my fondness for literate programming. I think code should primarily be documentation and, further, that that documentation should look gorgeous and be easy to read. As I was mulling over programming language syntaxes, I had an epiphany: the primary goal of a language’s syntax should be readability when typeset. As we move towards higher-level languages and rely on smarter compilers to do what we want, the purpose of a language is no longer to describe exactly which bits get pushed where, but rather to describe, in the most elegant way possible, the essence of the solution to the problem.

I like the off-side rule—the “significant whitespace” present in Python and Haskell—for exactly this reason. When displaying code that’s meant to be read, it’s imperative that the code be indented nicely anyway. Why not have the language enforce that the program’s structure be identical to how it’s presented?
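To make that concrete, here’s a tiny Haskell sketch (nothing from the Calgary language, just an illustration) in which the indentation the reader sees is exactly the structure the compiler sees: the where bindings belong to classify simply because they sit beneath it, with no braces or semicolons to drift out of sync with the layout.

    -- The where bindings attach to classify because they are indented
    -- beneath it; the printed layout is the parse tree.
    classify :: Int -> String
    classify n
      | n < small = "small"
      | otherwise = "large"
      where
        small = 10

    main :: IO ()
    main = putStrLn (classify 3)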

This got me thinking about syntax highlighting. Highlighting is, in my view, nearly as important as indentation for understanding the structure of code. Further, when typesetting literate code, that code should properly be highlighted in the typeset document. So why not make highlighting part of the language? One could distinguish between the highlighted keyword let and the variable name let. It poses some practical problems, namely that you would potentially have to abandon straight ASCII as an input format in favour of annotated ASCII such as LaTeX or Rich Text.

The other thing I started thinking about is syntax in the context of information theory. This got into my head as we were discussing the role of keywords such as of in the new language. Should the syntax dictate that you write case x of … or should it leave out the entirely unnecessary keyword of there?
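For reference, here’s how the construct reads in Haskell, where the of is mandatory even though the indented alternatives that follow arguably make it redundant:

    -- Haskell insists on the of, although the alternatives indented
    -- beneath it would be unambiguous without it.
    describe :: Maybe Int -> String
    describe x = case x of
      Nothing -> "nothing"
      Just n  -> "the number " ++ show n

    main :: IO ()
    main = putStrLn (describe (Just 42))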

In the interest of maintainable code, there’s value in redundancy. Consider two extremes: one can write 50 characters of C++ code and, due to the noisiness of C++’s template syntax, convey only 50 bits of actual information, whereas 50 characters of Lisp code might convey 200 bits. In this case I like Lisp’s syntax better than C++’s, but there’s a huge danger in having a syntax which is too concise: one parenthesis in the wrong spot and the semantics of your program can be completely different from what you intended. And that difference might not be obvious to the reader.

I had a conversation years ago with someone about the operator “not,” which is used in both programming and English. Notice that whenever someone feels they must get across the point that they are not something, they will emphasize the word “not,” and often only the word “not,” using italics or boldface or uppercase letters or anything else at their disposal. Programmers typically don’t have that luxury, and an errant ! symbol can sometimes get lost in the mix. Yet negation is one of the most powerful constructs in computer science: it says you want to do exactly the opposite of whatever you’re about to describe. From the point of view of information theory, A and not A are syntactically almost identical.
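A deliberately contrived Haskell illustration: the two guards below are visual near-twins, yet they select opposite branches, and the only thing separating them is an easily overlooked not.

    -- The two guards differ by a single short word, yet they select
    -- opposite branches; at a glance the lines are nearly identical.
    respond :: Bool -> String
    respond authorized
      | authorized     = "access granted"
      | not authorized = "access denied"

    main :: IO ()
    main = putStrLn (respond False)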

So that is a clear example of where it would be nice for the language to enforce some redundancy. Don’t just throw a ! in there, but redundantly make other changes to the code that make it clear that something’s going on.
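One modest existing example of this sort of thing, for what it’s worth, is the when/unless pair in Haskell’s Control.Monad: rather than burying a negation inside the condition, the choice of combinator itself announces which way the test points.

    import Control.Monad (unless, when)

    -- The combinator name, not a lone ! buried in the condition,
    -- tells the reader which way the test points.
    main :: IO ()
    main = do
      let loggedIn = False
      when loggedIn (putStrLn "showing the dashboard")
      unless loggedIn (putStrLn "redirecting to the login page")

That’s only a naming convention, of course, not the kind of enforced redundancy I have in mind, but it shows the flavour.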

The obvious question is: so how would a language do that? An even better question is: how can a language require redundant information without being complete torture to write in? I’ve no idea, but it would be nice if someone would look at it formally, ha!
