This week I started work on designing and implementing a new operating system. I should say first off that yes, I do have more important stuff to do. I should be reading more papers, or working on my CS888 project, or working on my OGSST application, or cleaning the apartment. But, I can’t help myself.

The operating system is tentatively called Acrospire, as it was the first word in the dictionary I found containing the subword “os”. After telling Jasna the name, her first response was “well, you can always change it later”—not overwhelming support, but I’ll live with it for now. The idea behind it had been floating around in my head for a while, but only now have I started writing code for it. Initially, and for all of its foreseeable future, it will run in simulation, as I don’t feel like writing drivers.

Primarily motivating the operating system is a concept borrowed from the lambda calculus, which is that there is no distinction between code and data. In the lambda calculus, data does not exist; there are only functions. I once tried to explain natural number arithmetic in lambda calculus to my brother as “there’s no such thing as a 4. There are only things which behave like 4s”.

A consequence of this view is that there’s no difference between “real” data and procedural data. Take the problem of a windowing system displaying overlapping windows. There are generally two models to choose from:

  1. the window draws to itself like a miniature screen. The windowing system buffers the window and deals with all overlapping itself. This has the advantage that the interface between the window and windowing system is simpler, but the disadvantage that for simple windows, it is not efficient to buffer the entire window in memory; or
  2. the window only draws regions of itself, when triggered by an exposition event. If a region of the window becomes overlapped, the windowing system sends an event to the window, telling the window to redraw a region of itself. This has the advantage that if the window is very simple to draw, the contents of the window need not be stored, only the procedure to draw it.

The point of the operating system isn’t to solve this problem, and most windowing APIs will allow you to choose between the two. The example is meant to highlight that currently we are forced, when designing or using an API, between choose “real” data and generated data. Does your API take in a pointer to a giant block of data, or does it take in a callback function to generate only a small piece of that data at a time? Both have advantages.

In the lambda calculus, such a distinction does not exist, assuming we are not using a call-by-value reduction strategy. If we’re using a call-by-name or call-by-need (lazy) reduction strategy, then by default all data becomes procedural. From a C point-of-view, it’s like your APIs never take in data directly; they only take in callback functions which generate whatever piece of data is needed. The interface is hidden away, of course, such that it looks like you’re just passing in data.

The reason the interface can be hidden away is because the callback functions to generate the data are prohibited from relying on state. This is usually referred to as referential transparency: functions behave the same way every time, regardless of when or where they’re called from. This has benefits if you’re implementing the API. Consider again the problem of a windowing system, where we’re using the callback system. If we’re the windowing system and part of a window is exposed, typically we will send an event to the window, and the window will redraw part of that window. But, if we know that that redraw function is not dependant on any state, we now have the option of buffering what it has drawn. The logic of choosing between a callback system and a buffering system has now been moved: before the logic had to be in the application; now it is up to the windowing system to decide. The primary gain in this is that we only have one interface between the window and the windowing system, but we still have the advantage of being able to choose between the two different redrawing methods, depending on what we feel is more suitable.

So, extend this idea beyond just redrawing windows. Consider grepping through a file for some pattern. In a sense, you can think of the output of grep as procedurally generated data, and referentially transparent procedurally generated data at that: given a pattern and the contents of some input file, it will give you same the results every time. Like in the windowing system example, we have two choices if we want to use the output of grep: we can store it’s result in a file, such that we’re “buffering” it for further use; or, if the output is long but simple to generate, we can simply rerun grep when want to use its output again. Or, what if the operating system could cache the result of the computation?

The idea extends far beyond just windowing systems and grepping. Any task you do on a computer, much of it is computation that the computer has done before, often with the same input data. The idea is to allow the operating system the ability to cache computation, exactly the way operating systems cache data today.

The way cacheing works today is that you, as a process, make a system call, such as open(). The kernel wakes up and realizes that you want to open a file. It checks to see that file is already loaded into cache; if it is, it avoids reopening it and just returns to you a handle to the file already in memory. We can do the same thing with computation. If you, as a process, want the result of some computation, you make a system call. The kernel wakes up, and checks to see if it’s already done this computation; if it has, it just returns to you the cached result.

So what we need are processes, which I call functions, which have some properties. First of all, they are prohibited from relying on state. To do this, I prohibit functions from making any system calls, except for calling other functions, and managing their own address space. No opening files, no printing to the screen, no finding out what time of day it is. Nothing. Calling other functions, managing address space. That guarantees referential transparency. The second property we would like them to have is laziness. By this I mean that a process should not have to generate its entire output before it finishes.

So we use MMU tricks for this. Currently a process’ address space is sort of broken up into 2 address spaces: the stack, and the heap. They start at opposite ends of the address space and grow together. In the functions that I’m proposing, a process’ address space would be broken up into 4 address spaces: the stack, the local heap, the input pages, and the output pages. The input pages are read-only.

Initially the input pages and output pages are not allocated to the process, such that accessing them will trap to the kernel. So consider we’re writing the grep function. We have two pointers into our input space: a pointer to the pattern, and a pointer to the input string. As soon as we try to dereference one of those pointers, we trap to the operating system. The operating system now has the task of generating data for those input pages. This data could be from a file on file, which it would then read from disk, or it could be a string taken from the user, or it could be procedurally generated data outputted from another function. Functions can be chained together.

So back to our grep process, let’s say we get our pattern string and a page of our input string, and eventually we get enough output data that we filled up an entire page (I think pages are about 4KiB in size on most OSes these days). What you can do, and it is up to the function’s discretion, is make another system call, telling the kernel that you’re done with that page. The kernel then marks this page read-only in your address space, and if any other function up the chain is waiting on your data would then get unblocked and start working on your output.

Thus, all functions are lazy, and referentially transparent. Things are good. I’m hoping to have a typing system, so the user would be prevented from hooking up a function that outputs strings to a function which inputs MP3 streams. So if we add some “special” OS-provided functions for reading and writing files, some multimedia, etc., then we have a cool toy OS at this point. The user can hook functions together in crazy ways and do fun things like store files of infinite size on their filesystem, not unlike named pipes on Unix.

Two differences in this between using named pipes and piping processes together on Unix is that the functions are:

  1. functions are prohibited from doing anything secretive to maintain state;
  2. rather than doing everything through strings—the “everything is a bunch of bytes” philosophy of Unix—data comes in as actual data structures, so no parsing is needed. There is a central repository for data structures with hopefully much support for programming languages. Right now I’m only looking at C and Haskell, with almost everything happening in C right now.

So the obvious question is how does the user interact with it, and how does one write applications, if side effects are prohibited. Some “special” OS functions for general desktop use would be provided. And a lot of UI stuff will be done without side effects.

I like the example of a web browser. Let’s say you’re writing a web browser. There will be a special OS-supplied function which will take a URL, and possible some cookie state, and return to you an HTML document, and some updated cookie state. Another function will take an HTML document as input, and maybe some CSS data and whatnot, and return to you a rendered document. This rendered document can be displayed by the UI, but clicking on the links won’t do anything—that would require side effects.

So there are actually two kinds of processes. Well three if you consider “special” OS-provided processes. There are function processes, which I’ve described, which are prohibited from making usual system calls. Then there will be application processes, which can make fun system calls like drawing to the screen and reading keyboard input. The only point of an application process is to glue together function processes. Thus, in the the web browser example, the application process gets a rendered document. It will provide hooks into that document, such that when you click on a link stuff will actually happen.

I’m contemplating prohibiting application processes from being native code, in an attempt to force more work to be done in functions. I.e., application processes must be interpreted in some scripting language. I haven’t decided if that’s too Nazi like, but I like the anti-DRM, anti-RealPlayer, anti-bundled Spyware aspect of forcing code with side-effects to be in plain source.

Other benefits of an OS model like this is that you, as a user, can pause applications and break them apart to see what intermediate data values are passing back and forth. Copy them, manipulate them, save them to disk.

Also, GUI level scripting is possible. Let’s say there’s some common task you do with data, taking images off your digital camera, renaming them, changing the colour balance, resizing them, etc., and you want to script this. You tell the operating system “okay give me a hypothetical set of images and I’ll show you what I do with them”. In a sense the OS is generating a lambda abstraction for you.

Also, OS-level spreadsheet like functions can be done. While procedurally generated data is transparent to the processes, it’s not transparent to the OS. So if a file is tagged to the OS as being a result of a computation, and the operands to that computation change—say because the user changed an input file—and that file and all data dependent on that file update automatically. Since windows displayed to the user are, at their core, data structures, this means that it would update some resultant GUIs too possibly.

Advertisements