Last week, my co-supervisor gave me the Steam Boiler Problem. It’s a classical problem for specifications languages. You have a computer monitoring a steam boiler: a giant tank of water with a heat source and a steam pipe at the top. There are water pumps to control—to add more water into the tank—and a series of sensors. Essentially, given complex specifications, you have to get the steam boiler into a functioning state and keep it that way. Sensors can start malfunctioning, too, to make it more difficult.

It’s a good problem because it forced me to look at how Ca is to work with in the flesh, in a complex example. Even though I’m not done, the conclusion is that it needs a lot of syntactic sugar built on top of it.

Some of them are rather superficial, if time-consuming. For example, currently patterns cannot overlap at all in Ca, so something like:

l {
  Cons 5 xs -> ...;
  Cons x xs -> ...;
  _ -> ...;
};

Is not allowed, though it really should be. I had no idea before now how irritating it is not having that feature. There are other minor syntax features like that that need to be addressed.

The more interesting case is what to do with the catamorphisms. Theoretically they work fine, almost. In practice, they’re a bit cumbersome. When I said “almost” before, I meant they worked fine up to Loïc Colson’s famous inférieur (”minimum”) problem. It pertains to primitive recursive schemes like the one I’m working with, were writing the “minimum function”—finding the minimum of two values—has greater time complexity than in an unrestricted computational model.

What this boils down to in my mind is the inability to perform a catamorphism over two objects simultaneously. Or stated in a more Haskell-ish way, there’s no efficient way to write the zip function.

One solution floating in my mind is to make the catamorphism construct explicitly allow zipping. It’s not too bad adding it in. The syntax would be something like so:

(l, n) {
  (Cons x xs, p+1) -> ...;
  (Nil, 0) -> ...;
};

Where l is a list and n is a natural number.

Another possible solution, which I quite like, is the idea of making catamorphisms parametric. So you could do something like so:

{ i ->
  Cons x xs -> i * x + @xs x;
  Nil -> i - 2;
} l 0;

The variable i is a parameter over the catamorphism, initially 0. I like this is as it makes a lot of things more natural and doesn’t add any computational power. The syntax for the catamorphism has to change, but I think this is for the best anyway.

I’ve been thinking about the compiler again and how best to implement higher-order functions, once time comes to do that. Currently the compiler uses a lambda lifter. I.e., there are no closures. Every function is lifted out of any scope to the top-level, collecting extra parameters along the way. This is fine for first-order programs, since you can change every call to a nested function into a call with extra arguments. However, in first order programs, this causes problems because the arities have to match. For instance:

f x = let g y = x + y; in
  g (x + 4);

Gets transformed into:

f_g x y = x + y;
f x = f_g x (x + 4);

Every function is now at the global level and g has all the variables it needs. This causes problems with higher-order functions though:

f x zs = let addX y = x + y; in
  map addX zs;

Naïvely this is:

f_addX x y = x + y;
f x zs = map (f_addX x) zs;

Ahh, but the first argument to map is a closure, not a function! We should properly implement closures as lambda lifted functions, so we can’t very well implement lambda lifted functions are closures! There are myriad other problems with this approach, but suffice it to say it won’t work.

Anyway, I was contemplating abandoning lambda lifting entirely and just leaving closures implemented as closures in C. The GNU C Compiler has a little-known and little-used, sadly, feature usually called nested functions. They’re half-closures in a way. They’re closures up until their lexical scope disappears. Well, such is the C way. But they would do what I needed.

I was reading the original 1988 Usenix paper which describes the technique, usually called “trampolining,” to implement nested functions quite smartly in C. As I was reading, I discovered that the author had originally intended to implement unnamed functions in C!

He gives two suggestions for syntax. I like the first one he suggests, which looks something like (int (*)(int x) { return x + 2; }). For instance, you could do something like (int (*)(int x) { return x + 2; })(3), which would evaluate to 5. Perhaps not the most useful example, but you get the idea.

These unnamed functions are really no more of a bother to implement than named functions, once you allow nested functions, but it appears this feature was never implemented in GCC! I suppose to get good use out of them you’d need a less bulky syntax. Type inference could help there.

Anyway, the point of this post was to register my dismay at GCC never implementing unnamed functions, despite there being no technical reason for not doing so. They can be quite handy, those unnamed functions.

For what it’s worth, I think I’ve decided against relying on GCC’s trampolining to do higher-order functions and lambda abstractions in Ca. It is very clever and very efficient, but the semantics are a bit hairy for the 21st century. For example, I just looked at some SPARC assembler that GCC produced for it, and it involves a system call to make to the stack executable. In today’s security conscious kernels, executing the stack, as trampolining necessarily requires, is less practical than it once was. Maybe one day we can get rid of these blunt MMUs, but that’s a rant for another time. In any case, MMU concerns aside, executing the stack mucks with the instruction cache on architectures with separate caches. As I said, hairy semantics.

In Breuel’s paper, he sets up other simpler solutions for implementing closures as being impractical due to their changing calling conventions. Well, I’m making my own language and I can make my own calling conventions. I expect that higher-order functions will simply be passed as two pointers instead of one. It saves everyone a lot of headaches.

The functional programming language community seems to be moderately abuzz over the recent release of the Disciplined Disciple Compiler. The first announcement I saw for it I ignored almost completely. Dialect of Haskell, allows destructive updates, allows mixing lazy and strict, etc. None of that really grabbed me.

Then this afternoon I saw another blog mentioned it on the Planet Haskell feed. This time it was mentioned with the phrase “region, effect, and closure inference” and immediately I had to read more about it.

The compiler seems to be a Google summer of code project by an Australian Ph.D. student. There is no paper yet! This means trying to figure out what’s going on involves combing through all the wiki pages and reading some tea leaves. Regions—à la Tofte—are not used for memory allocation. Darn! They seem to be used as part of the effect system, e.g., to mark certain regions of memory as being pure/constant/read-only or allowing destructive updates.

All of the magic is in the type system. It looks very well thought-out and very rich and is all orthogonal to the usual Haskell type system.

It looks like an interesting language to follow. I shall be keeping track of its developments.

From a syntactic stand-point, it seems to further my belief that you need to choose a “side” on the strict versus lazy issue. Disciple should in theory nicely and transparently mix strictly evaluated expressions with lazily evaluated ones. In practice this means things are strict by default but can be made lazy with annotations.

In my mind, what defines a language as being “strict” or “lazy” is the standard library. A language which truly is agnostic on the issue should allow, for example, map to be used in either a strict or lazy manner rather than having both a strict map and a lazy map. I don’t see an easy way to get this, so for the time being it seems languages have to pick sides.

In a similar vein, one of the nice features of Disciple is that its effect system ends the duality in Haskell between pure and impure functions. For instance, in Haskell you have a map which works on pure functions and a mapM which works on monadic functions. No such division exists in Disciple.

I stumbled upon this paper, recently presented at SIGCSE 2008, authored by Stuart Reges. It deals with trying to determine what exactly it is that causes you to “get” computer science. This quote from the introductory session of the paper I thought was pretty good:

Computer Science educators have for years complained that introductory courses seem to be divided between a group of students who “get it” and a group of students who do not. Donald Knuth has written about this phenomenon:
“Educators of computer science have repeatedly observed that only about 2 out of every 100 students enrolling in introductory programming classes really resonate with the subject and seem to be natural-born computer scientists…I conclude that roughly 2% of all people ‘think algorithmically,’ in the sense that they can reason rapidly about algorithmic processes.” [5]

Many computer science education researchers have been exploring the question of what factors predict success in CS1 and CS2 [2, 7]. In fact, Knuth was motivated by just such a study. It was an unpublished study conducted by Gerrit DeYoung in which he found that a measure of quantitative reasoning was not a predictor of success in a course for CS majors but was a reasonable predictor of success in a course for nonmajors. Knuth’s tentative conclusion was that there is some kind of CS aptitude that is not measured by standard tests of quantitative reasoning and that students who lacked that ability were instead relying on general quantitative aptitude.

This certainly sounds about right, and I find it interesting that people are using statistical analysis to figure this out. Anyway it’s really a brilliant paper. I wish I could quote it all, but that would rather defeat the purpose of quoting it.

It looks at aptitude tests for advanced placement in computer science. The punchline is that there were five “powerhouse” questions on the test which predicted success on the test fantastically well.

The most powerful “powerhouse” question was: given a boolean variable b, what does the statement b := (b = false) accomplish? (Side note: the aptitude test was given in Pascal. The equivalent C statement would have been b = b == 0;.)

It seems simple at first, but the more I thought about it, the more it astonished me. Despite the broad range of topics covered by the test—recursion, iterating, manipulating lists and other ADTs, arrays, et cetera—it seems that a huge part of core computer science comes down to whether you “get” that b := (b = false) is just a not operation.

Of course the way I’ve presented it, I’m exaggerating the effect of one question. According to the paper, 60% of students got that question right. Even if we account for the fact that the people writing the test are far from representative, that’s still a long ways from the 2% figure that Donald Knuth gave us.

Since the majority of my teaching career thus far has been with first year students, it’s pretty interesting to read about why some students “get it” and some don’t, and what might be done about it. There’s a well-known phenomenon here, at least, of people who have done extremely well in the typical sciences—math, physics, biology, chemistry, etc.—and then get absolutely destroyed by first year computer science. And, of course, the converse is true as well. Despite the superficial similarities in skill sets between computer science and other scientific pursuits, there seems to be an extra quality at work.

The other side of the coin is that many of the students who “get it” are disinterested in it. Sigh.

I’ve also been given thought to the future of programming languages. In my humble opinion, the future programming language will deftly handle two upcoming issues: parallelism and weak references. Unfortunately these are also two issues which are not at all handled by my language.

Parallelism probably isn’t an “upcoming” issue so much as something we’re dealing with right now, but still it’s still not being handled by current languages, so i’ll consider it upcoming anyway. Haskell is, to my knowledge, the only language which is trying to attack parallelism in an interesting way. It provides primitive language constructs which are easy to glue together, hides away the implementation details, and, being Haskell, gives you a lot of nice static guarantees. For example, in Haskell’s STM, it’s impossible to reach deadlock. Sadly it is not very efficient, but it’s nice to see it being thought about. Maybe some day soon someone smart will make it practical.

Weak references are things which every modern language handles and handles extremely poorly. Functional languages are ideally set up to handle weak references intelligently, because in principle they don’t make any distinction between data and code, but they don’t do much with them right now. My dream operating system is basically built entirely around the idea of weak references. Why is it that operating systems can cache data, but they can’t cache the execution of a process?

Time Machine is a nice first step at the operating system. For those who’ve used it, I’m sure they might agree with me that it’s more than just a snapshot back-up system with a pretty 3D interface. It is, in my mind, weak references for the file system, which is exactly what I’ve wanted from a filesystem for so long. Basically it allows you a way to mark files as “I need this to stay in memory” versus “it would be nice if you could keep this in memory”, i.e., a weak reference. Since switching to Time Machine, I handle everything on my system very differently. In my address book, my emails, my bookmarks, my list of papers to read, etc. etc. I am constantly distinguishing between “must keep” versus “would be nice if it stuck around”. With Time Machine you distinguish between the two cases by deleting or not deleting something, knowing that once you delete something it’ll probably hang around for a few years still as a weak reference.

So at the programming language level, functional languages are in a nice position to handle weak references. In Haskell, for example, you can create a weak object by handing it the code to create it. Once that code is executed, the resulting data hangs around as a weak object until such time memory needs to be reclaimed. If you decide later on that you needed that data again, no problem, just re-execute the code! The problem is that there is a lot of strategy involved in deciding which weak references to throw out and which to keep.

At this point, no matter what language you use, it seems it’s up to you to keep track of your weak references manually. It reminds me of the days when you managed your memory manually: it worked fantastically as long as your code wasn’t complex, ha!

The preliminary compiler for Ca is nearing usability right now. The back-end—i.e., the garbage collector, code generator and lambda lifter—should be stable now, though I have recently had to make minor corrections to all three of those yesterday. The parser also should be stable. What’s in the middle is always the most unholy component of any compiler I write, the typing system.

I don’t think it’s because I’m particularly stupid; I think typing is just difficult. Even in type unification, which is one of the simplest and most concise of the typing components. I spent a good hour yesterday tracking down a type unification bug. And then there is type inference. Save for one bug, to be described below, I think type inference is finally working solidly and reliably. But it will require a massive clean-up before it’s can be called “literate.”

Hopefully tonight I will have a web interface up such that people can play around with Ca code to see how catamorphisms actually look in C. In the mean time, I have but one serious bug left to squish in type inference.

Writing the compiler was incredibly useful, and of course necessary in the long run, as the language is much more concrete in my mind, rather than writing the theory and in my mind thinking “well that will all work out.” There are, however, two issues that need addressing:

  1. Ca needs higher-order values and lambda abstraction at some point. I just don’t think a functional language is viable without them. I’m also quite confident that higher-order values can be restricted in a way to avoid “the Loïc Colson” problem where it necessarily has to blow up the computational power of the language beyond primitive recursion.
  2. I haven’t proven it, but it occurred to me that likely it is impossible to write a Ca program which runs in Θ(lg n) time, or in fact in any complexity that involves logs. The problem stems from the fact that in a divide-and-conquer approach, we recurse on both halves immediately, eliminating any advantage of the “divide.” For example, consider a natural binary search in Ca:
    data SearchTree k v
      = Leaf
      | Node (SearchTree k v) k v (SearchTree k v);
    find x t = t {
      Leaf -> Nothing;  -- failure
      Node left key value right -> if key == x then
          Just value  -- found it!
        else if key < x then
          @left
        else
          @right;
    };

    It looks reasonable: @left and @right stand for recursing on the left and right branches, respectively, so it seems like this is exactly what we want to do. The problem is that, currently, @left and @right are computed up front: before the if expression ever gets a chance to be evaluated! This means both the left and right branches are recursed on in all cases. So, we visit the entire tree.

    This has two immediate solutions. The first is to introduce laziness. General laziness is a pain and so I don’t want this. I will talk about this in the next paragraph, however. The second solution is to compute @whatever only when it is used and then rely on people using let expressions to avoid duplication of work. This works, but I’m not entirely convinced at this point that this is the route I will go down.

    A common sub-expression (CSE) removal optimization would, in my mind, be the ideal solution. In other words, the programmer is allowed to use @whatever variables wherever he or she likes, and the compiler statically pushes off computing the variable as long as possible and then wrapping it in a let.

    At first blush, this probably seems needlessly complicated the semantics of the language. However, if the goal of the language is to statically prove time complexity, etc., then it seems that the work it would have to do to prove complexity is the same as the work it would have to do to do the CSE removal I described above. But at the least, as a stop-gap, I should probably change the semantics such that the computation of @whatever is put off until the variable is used, even if the absence of a CSE removal optimization.

    Of course all of this is ignoring the fact that for Ca to prove that binary search is in logarithmic time, it would have to prove that the tree is balanced, definitely no small feat!

I’ve done more work on the Ca compiler. The to do list now looks as:

  1. garbage collector (finished)
  2. code generation (finished)
  3. lambda lifter (finished)
  4. general code clean-up (in progress; see below)
  5. type inference
  6. parser (finished but weakly tested)

I just finished the parser about half an hour ago. Rudimentary testing shows that it’s working correctly.

As a short side note, since the compiler is written in Haskell, I’m using Happy as the parser generator, a rather typical LALR(1) parser generator, with options for GLR(1) parsing which don’t interest me. The error messages in Happy are both good and bad. They’re bad in that they reference the generated code instead of the code in the grammar file itself. They’re good in that they’re so, well, happy! Consider this snippet from a single error message that was over 200 lines long:

In the second argument of `happySpecReduce_3′, namely `happyReduction_17′
In the expression: happySpecReduce_3 11 happyReduction_17

Maybe it’s just me, but somehow knowing that the problem was with a happyReduction doesn’t make it seem so bad.

Sitting down with a concrete LALR(1) grammar has made me make a few changes to the syntax, as compared to what was published in the paper. For starters, there are a few more semicolons thrown in here and there to mark where one thing starts and another ends. Some syntax issues that I’ve been pondering lately:

  • Type constraints. Since Ca is aimed deliberately at static analysis, I want constraints to go beyond just type constraints. Of course it’s supposed to rely primarily on inference, not explicitly annotated constraints, but constraints are handy. So in Haskell one might write f :: Integer -> Integer to say that the function f is constrained to taking integers as input and producing integers as output. For Ca, we might like, for example, time constraints in addition to type constraints. See the syntax example below for how this might look.
  • Infix functions. For whatever reason, user-defined infix functions, sometimes called operators, are a staple in functional programming, and I like to go with the flow. Why limit yourself to expressions like a + b * c when you can have a +^$$ b &&|| c? Unfortunately this causes a problem with generating C code, as I promised myself I’d limit myself to as little name-mangling as possible. There’s no way around it, though, and for the time being, +^$$ is mangled into the C function OP_plus_caret_dollar_dollar. Infix constructors aren’t in yet but should be some day.
  • Locally scoped data type definitions are in, which I’m quite pleased with. This allows you to write such things as z where z = f A; f = …; data X = A | B; or some such, where the data type X exists only inside that where clause.

As for the syntax itself, the only “crazy” part that I’ve added is the syntax for constraints. Basically you can add any number of constraints to a function, and each one is tagged with a constraint type. Currently only type constraints are allowed, but time and memory, etc., constraints should hopefully come in the next few years. For example:

sumList
    :(type): [Nat] -> Nat;
    :(time): n -> Theta(n);  — doesn’t exist yet
x = x {
    Nil -> 0;
    Cons x xs -> x + @xs;
};

For the time being, :: on its own defaults to :(type):. I don’t know if I’m 100% satisfied with that syntax, but it seems workable.

As for the code cleanup, if you read the last post I made, you’ll know I’m trying to make this code as literate as possible. I’ve been in touch with Joachim Schrod, the author of CWEB-latex, as I’ve decided LaTeX is the best way to go, what with its nice hierarchical document structure and whatnot. CWEB-latex hasn’t been maintained in 13 years and neither I nor Joachim know of anyone using it, but nonetheless he sent me an updated class file which will hopefully work for me. The 1995 release of CWEB-latex almost works with modern LaTeX distributions, but not quite.

Progress on the Ca compiler is going well. The lambda lifter is done. I really couldn’t be much more pleased about how things are going.

I decided now would be a good time to clean up and document the code. I’m a big fan of literate programming and so have been investigating literate programming tools. No matter how long your multi-line comments, I find documentation inside code to be completely useless for getting “big picture” concepts from source code. Literate programming seems to be a good solution.

The idea behind literate programming is that you have one source file. This source file is both a typeset document source file and a programming language source file. For example, through Donald Knuth’s CWEB tool, you write a foo.w file which becomes both a foo.tex file and a foo.c file.

My experience with literate programming up until this weekend has been a bit of a lie. It was with Literate Haskell, which Wikipedia—I now realize correctly—says is semi-literate, not literate. In Literate Haskell, basically you interleave LaTeX code and Haskell code, and the Haskell compiler is clever enough to only compile the Haskell code. It’s cute and it worked well enough to do my thesis, but it’s not really literate programming.

If you’ve never sat down and played with Knuth’s WEB or CWEB, I would highly recommend you do so. Knuth’s vision of literate programming is not just to interleave documentation and code, but, to borrow the names of Knuth’s own tools—to weave and tangle them together. Documentation and code are not interleaved; they’re indistinguishable, almost. CWEB gives you the freedom to refactor and reorder your code in ways that make it look totally unlike something intelligible by a C compiler, to make it look like literature—quite a feat for C—but to give you a valid program at the end.

In a sense it’s not really fair to compare Literate Haskell with CWEB, as they’re working in different domains. For example, one of the nice things about CWEB is that it lets you reorder code. This doesn’t exist in the world of Haskell, since Haskell never really had any substantial restrictions on the reordering of code in the first place.

Anyway, my Ca compiler cac is currently made out of two languages: Haskell and C. Eventually I’ll have to find a way to deal with its grammar as well, but I’ll cross that bridge later. Literate Haskell works well enough for the Haskell side of things. It’s quite inoffensive; you can structure your LaTeX code however you like.

For the C side of things, I’ve spent most of today installing tools, reading their documentation, playing with them, trying to get them to work. I played with noweb and nuweb, but neither seemed too impressive. Once I installed CWEB, things improved a bit.

CWEB really is just a flat-out brilliantly written piece of software. It does exactly what you want it do, a whole lot more, all simply, logically, and robustly. It weaves together documentation and code beautifully. There is only one problem with it: it produces Plain TeX code. I don’t know Plain TeX.

No problem; there is a software out there called latex-CWEB which is a LaTeX class that will render CWEB documentation. Note, I said class. So far as I can tell, this means any document you create with latex-CWEB documentation cannot be an article; it cannot be a report; it cannot be a book; it cannot be any type of document other than a “cweb” document. This is a bit restrictive if you consider that you might want a document which contains things other than CWEB documentation. Such as Literate Haskell documentation.

Since, as I said before, Literate Haskell is so inoffensive, it probably will be workable, and it might turn out to be the best option.

The other competing option is to try and learn Plain TeX. I’d never considered it before; I was one of those “why use TeX when we have LaTeX?” kind of people. But my opinion is changing a bit after seeing how CWEB renders my humble C code: it’s gorgeous.

One final note before I head off to bed. It’s frustratingly come to my attention all throughout today just how marginalized literate programming is. I did some Google searches for a “literate IDE”. Needless to say I didn’t find anything. It really is a shame. Literate programming is a fantastic idea, but all we have now is a smattering of simple mostly one-off tools, almost none of them having been maintained since the mid 1990s, all incompatible, all structuring things in different ways. The state of the art right now is that it really is a big hassle to write literate code unless you either consign yourself to second-rate documentation or else commit yourself to exactly one language with exactly one tool.

The Ca compiler I mentioned one week ago, which I’ve dubbed cac, is progressing well. The core of the back-end is effectively done. Here’s the current to do list:

  1. garbage collector (finished)
  2. code generation (finished)
  3. lambda lifter
  4. general code clean-up, as it will no doubt be past due at this point
  5. type inference
  6. parser

So far I’m fairly pleased with my progress, though that always seems to happen when I get into coding. It makes me half-wonder why I went into research when coding is always so much more productive, ha!

The compiler is written in Haskell and targets C. The garbage collector is all written in C. Nothing about the interface to the Ca run-time/garbage collector is unportable, but currently the primary data structure is dependent on an LP32—i.e., longs and pointers are 32 bits in size—architecture. I will probably provide an implementation for LP64 systems as well, as well as a totally portable implementation, which makes no assumptions violating the C99 standard. As it’s only one struct which needs to be rewritten, it won’t take much time. The current implementation, for LP32 systems, allows arbitrary-sized boxed values as well as 31-bit unboxed values. Currently 31-bit natural numbers, i.e., unsigned longs, are the only unboxed values supported.

The rest of the garbage collector is portable. I even used malloc() to set up the heaps. Since my memory pools are of fixed size right now, there’s no downside to using malloc(), though I might have to abandon it if I ever go to dynamically-sized pools.

I’m probably most proud of my code generation. Since Ca is not a general purpose language, interfacing with other languages is in the back of my mind. And, of course, when I say “other languages”, I implicitly mean “C”, since every language’s foreign function interface is C if nothing else.

So, if, in Ca, you define a function called foo, the generated code will create a C function of exactly the same name foo. This means that if you’re calling Ca code from C, you don’t have to worry about name mangling, or really importing function prototypes at all.

Further, all unboxed return types and parameter types in the generated code have type unsigned long. For example, if you write the Ca function foo x y = x { Zero -> y; Succ p -> @p * y; }, where x, y and the return value are all natural numbers, the prototype for calling this function from C is exactly unsigned long foo(unsigned long, unsigned long). Indeed function applications in Ca are nothing but C function calls. Quite a nice property, I think.

For boxed values, of course, things get a little bit more complicated when doing a foreign function interface. The Ca run-time will most definitely have to provide service functions to convert, for example, arrays into boxed lists. I don’t think there’s any way around this.

The C code generated is moderately readable, in the sense that whitespace is all proper and some variable names are meaningful. E.g., if you have a Ca variable blarg, this will show up as var_blarg in the generated C code. Hopefully this will make debugging easier in long run. Unfortunately the generated C code uses a lot of temporary variables, and the comments it produces are rather sparse.

Catamorphisms over natural numbers are implemented using for loops. Catamorphisms over other data types are implemented using recursion.

I’d almost forgotten how much fun it is to implement a compiler. While I was tagging the germane categories for this post, I almost instinctively checked off “hobby”. Good times. I think I’ll let it rest for a couple days and focus on other research. Though I won’t feel completely accomplished until I get my computer and hooked up to the external hard drive for back-up, ha!

I regularly skim through digests of comp.risks, a newsgroup devoted to the risks of computing, as it’s directly related to my research and, above all, pretty entertaining. I was recently catching up, and the subject line “A risk of static analysis tools — and refereeing” understandably caught my eye. I reproduce it shamelessly below without attribution, as I don’t know its original source, though it’s included in comp.risks 25.02 (January 14, 2008):

Interesting anecdote: Some years ago a simple static code analysis tool was submitted to a conference. Two of the three reviewers, both of whom were extremely careful programmers, ran it on their own code just to see what would happen.

The tool produced 100% false positives (not 95, not 99, but fully 100%). As a result, the paper wasn’t accepted.

The same paper was later submitted to another conference (where reviewers didn’t try this), where it was accepted and won the best paper award.

Next Page »