Multithreaded Prexonite

Yesterday, I watched the Google TechTalk presentation of the programming language Go (see golang.org). Even though I don’t believe Go will become very popular, there was one aspect of the language that “inspired” me. One of Go’s major advantages is the fact that concurrency (and coordination thereof) is built deep into the language. Every function has the potential to become what they call a “goroutine” that works in parallel to all the other code. Communication and coordination are achieved through an equally simple system of synchronous channels (a buffer of size 0).

The code samples all looked so incredibly simple to me that I thought: I can do that too. I started by writing a case study (a code sample) and a sketched implementation of the synchronization mechanisms. It all looked nice on paper (in the Prexonite Script editor) but there was one major problem: None of the Prexonite execution engines is really capable of running on multiple threads. Even though the CIL compiled functions behave nicely, they don’t cover the whole language. Coroutines and other dynamic features are just too important to be ignored.

Fortunately the design of the interpreter is centered around one abstract class: the StackContext. StackContexts come in a variety of forms and provide complete access to the current state of the interpreter. For the most part, these objects are passed around in the call hierarchy. And that’s good, because functions/methods that call each other are in the same thread. So stack contexts stay thread local. At least the ones obtained via method arguments. But there is also another way to get your hands on stack contexts and that’s the interpreter’s stack itself.

Of course a multithreaded Prexonite will have multiple stacks but just providing those isn’t enough. I had to ensure that direct queries to/manipulation of the stack are only directed at the stack of the currently executing thread. Since stack contexts are sometimes “captured” by other objects (coroutines for instance), they can on occasion slip into other threads and wreak havoc with the stack of their original thread.

The only solution I saw was to use thread local storage via unnamed data slots. This has disadvantages, of course:

  • data slots are rather slow (not Reflection slow, but also not virtual call fast)
  • it is difficult to manipulate a different stack if you want to

but

  • it works!
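
To illustrate the idea outside of Prexonite, here is a minimal Python sketch (not the actual C# implementation; `current_stack`, `push_and_snapshot` and `run_in_thread` are hypothetical names). It keeps one stack per thread in thread-local storage, so a lookup always reaches the stack of whichever thread is currently executing, no matter where the calling object was created.

```python
import threading

# One slot per thread, comparable in spirit to a thread-local data slot.
_tls = threading.local()

def current_stack():
    """Return the stack that belongs to the calling thread, creating it on first use."""
    stack = getattr(_tls, "stack", None)
    if stack is None:
        stack = _tls.stack = []
    return stack

def push_and_snapshot(frame):
    """Push a frame onto the calling thread's stack and return a copy of that stack."""
    current_stack().append(frame)
    return list(current_stack())

def run_in_thread(fn):
    """Run fn on a fresh thread and hand back its result (helper for the demo)."""
    result = []
    worker = threading.Thread(target=lambda: result.append(fn()))
    worker.start()
    worker.join()
    return result[0]

main_view = push_and_snapshot("main frame")
worker_view = run_in_thread(lambda: push_and_snapshot("worker frame"))
```

Each thread only ever sees its own frames here. The second disadvantage from the list shows up as well: there is no direct way to reach the stack of another thread.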

The following Prexonite Script sample

//name~String, source~Channel
function work(name, source)
{
    println("\tWhat did $name say again?");
    var a = source.receive;
    println("\t$name said \"$(a)\"");
    return name: a;
}

function main()
{
    var source = chan; //Creates a new channel
    println("Get to Work!");
    //`call\async` invokes `work` in the background
    //`resp` is a channel that will receive the return value of `work`
    var resp = call\async(->work, ["Chris", source]);

    println("Oh, I forgot to tell you this: Chris said Hello!");
    source.send("Hello!"); //supply the missing value

    println("I'm waiting for you to finish...");
    var res = resp.receive;
    println("So your answer is $res then...");

    return 0;
}

results (sometimes) in

Get to Work!
        What did Chris say again?
Oh, I forgot to tell you this: Chris said Hello!
I'm waiting for you to finish...
        Chris said "Hello!"
So your answer is Chris: Hello! then...
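
For readers who do not know Prexonite Script, the same handshake can be approximated in Python. This is only a sketch under assumptions: `Channel` and `call_async` are hypothetical stand-ins for `chan` and `call\async`, and the channel only behaves like a true synchronous channel as long as there is a single sender at a time.

```python
import queue
import threading

class Channel:
    """Approximation of a synchronous channel: send blocks until a receiver took the value."""

    def __init__(self):
        self._items = queue.Queue(maxsize=1)

    def send(self, value):
        self._items.put(value)
        self._items.join()        # wait until receive() has acknowledged the value

    def receive(self):
        value = self._items.get()
        self._items.task_done()   # releases the blocked sender
        return value

def call_async(fn, args):
    """Run fn(*args) on a background thread; the returned channel delivers its result."""
    resp = Channel()
    threading.Thread(target=lambda: resp.send(fn(*args)), daemon=True).start()
    return resp

def work(name, source):
    said = source.receive()       # blocks until main supplies the missing value
    return (name, said)

def main():
    source = Channel()
    resp = call_async(work, ["Chris", source])
    source.send("Hello!")         # supply the missing value
    return resp.receive()         # wait for the worker to finish
```

The print statements are left out on purpose: their interleaving depends on the scheduler, which is why the output above only appears "sometimes" in exactly that order.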

Lists, coroutines and all other kinds of sequences appear very often in Prexonite. It is only logical to combine multithreading with the IEnumerable interface. The result is the `async_seq` command that takes an existing sequence and pushes its computation into a background thread.

The following example applies the two fantasy-functions `sumtorial` and `divtorial` to the numbers 1 through 100. The ordinary `seq` version can only use one processor core, while `async_seq` pushes the computation of `sumtorial` into the background and onto the second core. Even though the management overhead is gigantic, a performance improvement can be measured (a modest speed-up factor of about 1.3).

function sumtorial(n) =
    if(n <= 1) 1
    else       n+sumtorial(n-1);
function divtorial(n) =
    if(n <= 1) 1
    else       n/divtorial(n-n/38-1);

function main()
{
    var n = 100;

    var seq =
        1.To(n)
        >> map(->sumtorial)
        >> map(->divtorial);

    println("Sequential");
    println("\t",seq >> sum);

    var par =
        1.To(n)
        >> map(->sumtorial)
        >> async_seq
        >> map(->divtorial);

    println("Parallel");
    println("\t",par >> sum);
}
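
How such a command could work is sketched below in Python. This is an assumption about the general technique (a producer thread feeding a bounded queue), not a description of how Prexonite implements `async_seq`; the buffer size and the sentinel are made up for the example. In CPython the global interpreter lock prevents a real speed-up for CPU-bound stages, so the sketch only shows the structure.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the upstream sequence

def async_seq(iterable, buffer_size=16):
    """Yield the items of iterable while a background thread computes them."""
    items = queue.Queue(maxsize=buffer_size)

    def produce():
        try:
            for item in iterable:
                items.put((True, item))
        except Exception as exc:          # forward upstream errors to the consumer
            items.put((False, exc))
        else:
            items.put((False, _DONE))

    threading.Thread(target=produce, daemon=True).start()
    while True:
        is_item, payload = items.get()
        if is_item:
            yield payload
        elif payload is _DONE:
            return
        else:
            raise payload

def sumtorial(n):
    return 1 if n <= 1 else n + sumtorial(n - 1)

sequential = sum(map(sumtorial, range(1, 101)))
parallel = sum(async_seq(map(sumtorial, range(1, 101))))
```

Both pipelines produce the same values in the same order; only the thread that evaluates `sumtorial` differs. If the consumer stops early, the producer in this sketch stays blocked on the full queue, which a real implementation would have to handle.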

Multithreaded Prexonite currently lives in its own SVN branch as I am still experimenting with concrete implementations. A very nice improvement would be the move away from CLR threads as they don’t scale well. Communicating sequential processes spend a lot of time waiting for messages, something that does not require its own thread. Go can use 100′000 goroutines and more. I don’t even want to try this with CLR threads…

Lazy factorial

This post refers to Parvum, a research compiler/language of mine presented in the previous post. It’s a compiler written in Haskell that translates a “lambda calculus”-like language to Prexonite byte code assembler.

Did you know that a tiny modification to Parvum makes the language (most likely) Turing-complete? By adopting non-strict (lazy) evaluation, one can implement all sorts of control structures in nothing but Parvum, even without language support for boolean values and conditions.

Proof? Yes, please! Here you see the implementation of the factorial function in Parvum (currently hardwired to compute the factorial of 8):

(\bind.
  bind (\n. n (\x. x + 1) 0) \toInt.
  bind (\f x. x) \zero.
  bind (\f x. f x) \one.
  bind (\f x. f (f (f (f x)))) \four.
  bind (\n f x. f (n f x)) \succ.
  bind (\p a b. p a b) \ifthenelse.
  bind (\x y. x) \true.
  bind zero \false.
  bind (\n. n (\x. false) true) \isZero.
  bind (\n m. m succ n) \plus.
  bind (\n. n (\g k. isZero (g one) k (plus (g k) one) ) (\v. zero) zero) \pred.
  bind (\m n f. m (n f)) \mul .
  bind (\self n.
    ifthenelse (isZero n)
      (one)
      (mul n (self self (pred n)))
    ) \facrec.
  bind (\n. facrec facrec n) \fac.

  toInt (fac (succ (succ (succ (succ (succ four))))))
)(\var body. body var)

76 seconds and a peak working set of over 650 MiB later, the number 362880 appears on the console. It works indeed.

I’ll spare you the compiled byte code as it consists of 47 functions. Note that Prexonite integer arithmetic is strictly only used in the lambda expression bound to `toInt`, which is applied after the factorial has been computed.

The actual computation is all done in Church numerals, so yes, the number 362880 is indeed a function `c` that applies a supplied function `f` 362880 times. The necessary control structures (`ifthenelse`) are entirely implemented using functions too.

Recursion was a bit tricky as Parvum does not (yet) have let-bindings. As you can see in the code, I solved this by passing `facrec` a reference to itself. It does look a bit strange, especially the `self self` bit, but it works.
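
The same construction can be tried in Python, with one caveat: Python is strict, so the two branches of the condition are wrapped in zero-argument lambdas and only the selected one is called. That wrapping plays the role that lazy evaluation plays in Parvum. The definitions below are a transliteration of the Parvum program above and compute the factorial of 4 to keep the run time short.

```python
# Church numerals: the numeral n applies f to x exactly n times.
zero = lambda f: lambda x: x
one = lambda f: lambda x: f(x)
succ = lambda n: lambda f: lambda x: f(n(f)(x))
to_int = lambda n: n(lambda k: k + 1)(0)

# Church booleans pick one of two arguments; false has the same shape as zero.
true = lambda a: lambda b: a
false = lambda a: lambda b: b

is_zero = lambda n: n(lambda _: false)(true)
plus = lambda n: lambda m: m(succ)(n)
mul = lambda m: lambda n: lambda f: m(n(f))

# pred n = n (\g k. isZero (g one) k (plus (g k) one)) (\v. zero) zero
pred = lambda n: n(
    lambda g: lambda k: is_zero(g(one))(k)(plus(g(k))(one))
)(lambda v: zero)(zero)

# facrec receives a reference to itself; the branches are delayed by hand.
facrec = lambda self: lambda n: is_zero(n)(
    lambda: one
)(
    lambda: mul(n)(self(self)(pred(n)))
)()

fac = lambda n: facrec(facrec)(n)

four = succ(succ(succ(one)))
result = to_int(fac(four))
```

Without the manual delay, `self(self)(pred(n))` would be evaluated even when `n` is zero and the recursion would never stop, which is exactly why strict Parvum could not run this program.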

The price paid

I admit, I cheated (a little): The laziness mechanisms are actually implemented in C# as part of Prexonite (live in the SVN trunk). There is a new command called `thunk` which takes an expression (something callable, a lambda expression for instance) and an arbitrary number of parameters for that expression (optional). The return value is, surprise, a `Thunk` object. This object really has only one member: `force`.

`force`, well, forces the evaluation of the supplied expression until some concrete value is obtained. That value can of course contain further thunks. A lazy linked list for instance:

function _consT hT tT = [hT,tT];
function _headT xsT = xsT.force[0];
function _tailT xsT = xsT.force[1];
function _refT xT = xT.force.();

function repeatT(x)
{
  var xsT;
  xsT = thunk(->_consT, x, thunk(->_refT, ->xsT));
  return xsT;
}

//Equivalent in Haskell:
//repeat x = let xs = x:xs in xs

Identifiers ending in `T` represent thunks (or functions that take and return thunks) whereas a prepended `_` identifies what I call "primitive expressions". They are the building blocks for more complex expressions and most are equivalents of actual assembler instructions: `_refT` for instance is the primitive for `indarg.0`, the instruction responsible for dereferencing variable references (among other things).
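
To make the behaviour of `force` concrete, here is a small Python model of it. It is a sketch based only on the description above, not the C# code in Prexonite: the `Thunk` class, its caching of the result and the one-element list that stands in for the variable reference `->xsT` are all assumptions made for the example.

```python
class Thunk:
    """Delayed computation; force() evaluates it until a non-thunk value is obtained."""

    def __init__(self, fn, *args):
        self._fn, self._args = fn, args
        self._done = False
        self._value = None

    def force(self):
        if not self._done:
            value = self._fn(*self._args)
            while isinstance(value, Thunk):   # keep going until a concrete value appears
                value = value.force()
            self._value, self._done = value, True
            self._fn = self._args = None      # drop references that are no longer needed
        return self._value

def cons_t(head, tail_thunk):
    return (head, tail_thunk)

def head_t(xs_thunk):
    return xs_thunk.force()[0]

def tail_t(xs_thunk):
    return xs_thunk.force()[1]

def repeat_t(x):
    cell = []                                 # stands in for the variable reference
    xs_thunk = Thunk(cons_t, x, Thunk(lambda: cell[0]))
    cell.append(xs_thunk)
    return xs_thunk

xs = repeat_t(7)
first_five = []
for _ in range(5):
    first_five.append(head_t(xs))
    xs = tail_t(xs)
```

The concrete value of the list is a pair whose second component is again a thunk, so the infinite list only ever occupies one cell.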

Justification

In the last post I justified the decision to compile to Prexonite byte code assembler with the lack of challenge when compiling into an actual high level language like Neko or JavaScript (or Haskell).

Why is it that I suddenly fear challenge? First of all, the laziness implementation in effect right now could also have been implemented in Prexonite Script (in fact it already has been: `create_lazy` in psr\misc.pxs is an example). I will probably remove this structure now that I have a fully managed implementation.

Actually, `thunk` was more difficult to write in C# than it would have been in Prexonite Script. As I mentioned, the factorial program used over 650 MiB of RAM and even though all Prexonite objects are stored on the heap, the stack frame overhead is gigantic. I had to add a new mechanism to the FunctionContext (the actual byte code interpreter) to allow the injection of managed code into the Prexonite stack.

That mechanism (the co-operative managed context) is similar to managed coroutines but behaves more like a function. Also managed code that was initially not intended to be run co-operatively (i.e. invoked as a member or as an indirect call) can “promote” itself onto the Prexonite stack.

Of course the Prexonite stack is not exactly known for its excellent performance, but unlike the managed stack, it is only limited by the amount of memory available.

A truly insignificant post

I’ve been toying around with Haskell lately, after I bought the excellent book Real World Haskell (there is also an online version where readers can comment on every paragraph). Every programmer should enjoy programming with a rigid but powerful type system like Haskell’s once in his or her career. It really puts things into perspective. Sure Ruby and Python are cool for quick hacking but nothing beats knowing that GHC accepts a program.

Indeed, I do catch most of my coding mistakes at compile time. Of course nothing prevents you from confusing (1-x) with (x-1) but Haskell really lets you focus on the actual code (once you get your program past the type checker).

But this post isn’t about me being a new fan of Haskell, but something much more insignificant: Yesterday I built a tiny compiler. For a tiny language.

Parvum

It’s called Parvum (which is Latin for, well, insignificant) and translates expressions in a “lambda calculus”-like language to a corresponding program in Prexonite bytecode assembler:

(\bind.
  bind (\x. x + 5) \lambda.
  bind 2 \calculus.
  lambda calculus
)(\x. \f. f  x)

I’ve taken the liberty of adding integer literals and basic arithmetic since they can be expressed in lambda calculus anyway (via Church numerals). If you want proof:

(\bind.
  bind (\f. \x. x) \zero.
  bind (\f. \x. f x) \one.
  bind (\f. \x. f (f x)) \two.
  bind (\n. \f. \x. f (n f x)) \succ.
  bind (\n. \m. m succ n) \plus.
  bind (\n. n (\x. x + 1) 0) \toLiteral.
  toLiteral (plus one (succ two))
)
(\val. \body. body val)

Just by using `one` and `succ` you can already express every natural number greater than zero. Those Church numerals can be mapped to any other number system by supplying the successor function and the zero element. In case of the built-in integers, that would be `\x. x + 1` and `0` respectively.

And this example indeed results in “4” just as expected.
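
The mapping to "any other number system" is easy to try out in Python. The sketch below mirrors the Parvum definitions; `interpret` is a hypothetical helper name, and the tally-mark interpretation is just a second example of a successor function and zero element.

```python
# Church numerals, transliterated from the Parvum example.
zero = lambda f: lambda x: x
one = lambda f: lambda x: f(x)
two = lambda f: lambda x: f(f(x))
succ = lambda n: lambda f: lambda x: f(n(f)(x))
plus = lambda n: lambda m: m(succ)(n)

def interpret(numeral, successor, zero_element):
    """Map a Church numeral into a number system given its successor and zero."""
    return numeral(successor)(zero_element)

four = plus(one)(succ(two))

as_int = interpret(four, lambda x: x + 1, 0)        # built-in integers
as_tally = interpret(four, lambda s: s + "|", "")   # tally marks
```

Here `as_int` is `4` and `as_tally` is `"||||"`: the numeral stays the same, only the interpretation changes.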

I would love to reproduce the compiled assembler code here, but it is not pretty. The code consists of over 20 tiny functions, one for each `.` in the code above. And yes, there is no shorthand for curried multi-parameter functions. This is an insignificant language after all.

Now why the effort?

Implementing Parvum was an interesting experience in Haskell. I got to use the parser combinator library `parsec` which embeds nicely into Haskell. Working with parsec really is great because you use one environment for the whole compiler. No separate grammar file and no machine generated code files. If I need a convenience function for my grammar (like a parameterised production) I just go ahead and create it.

Unlike all my previous parsers, this one really focuses on building an AST solely. No name lookups, no transformations, just an abstract syntax tree. This tree is then fed into the compiler, which translates it into a representation of the program not entirely unlike what Prexonite uses: There are applications which consist of global variables and functions which in turn have parameters, local variables, shared names (via closures) and byte code. This model of the application is then printed via the HughesPJ pretty printer library.

Apart from the highly ambiguous grammar of lambda calculus notation, the biggest problem was implementing the translation step. Not because walking an AST in post-order is particularly difficult (Prexonite assembler is stack based) but because emitting embedded functions while generating code is not trivial.

  • First of all, you need a steady supply of unique names: state monad
  • You need to keep track of the current function environment (for shared names): reader monad and the local function
  • Finally the embedded functions have to be stored in the application: writer monad

which results in quite a monster of a monad: ReaderT PxsFunc (WriterT [PxsFunc] (State Int))
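
The three concerns can be made visible in a deliberately imperative Python sketch. It is not the Haskell code of Parvum: the tuple-based AST, the `Func` record and the instruction names (`push_int`, `load`, `call`, `make_closure`, `ret`) are invented for the illustration and are not Prexonite assembler. The counter corresponds to the state monad, the `current` parameter to the reader monad and the `emitted` list to the writer monad.

```python
from dataclasses import dataclass, field
from itertools import count

@dataclass
class Func:
    name: str
    params: list
    shared: list                      # names captured from the enclosing function
    code: list = field(default_factory=list)

def free_vars(expr):
    kind = expr[0]
    if kind == "lit":
        return set()
    if kind == "var":
        return {expr[1]}
    if kind == "lam":
        return free_vars(expr[2]) - {expr[1]}
    if kind == "app":
        return free_vars(expr[1]) | free_vars(expr[2])
    raise ValueError(kind)

def compile_program(expr):
    fresh = count()                   # supply of unique names ("state")
    emitted = []                      # finished functions ("writer")

    def emit(expr, current):          # current = enclosing function ("reader")
        kind = expr[0]
        if kind == "lit":
            current.code.append(("push_int", expr[1]))
        elif kind == "var":
            current.code.append(("load", expr[1]))
        elif kind == "app":
            emit(expr[1], current)    # post-order: operands first, then the call
            emit(expr[2], current)
            current.code.append(("call", 1))
        elif kind == "lam":
            inner = Func(f"lambda{next(fresh)}", [expr[1]], sorted(free_vars(expr)))
            emit(expr[2], inner)      # the body is emitted into the new function
            inner.code.append(("ret",))
            emitted.append(inner)
            current.code.append(("make_closure", inner.name))
        else:
            raise ValueError(kind)

    main = Func("main", [], [])
    emit(expr, main)
    main.code.append(("ret",))
    emitted.append(main)
    return emitted

# \x. \f. f x
program = compile_program(("lam", "x", ("lam", "f", ("app", ("var", "f"), ("var", "x")))))
```

Every `lam` node becomes its own function, which matches the "one function for each `.`" observation above, and the innermost function lists `x` as a shared name because it is captured from the enclosing lambda.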

I’m not really happy with this solution as it is yet another example of how you can duck behind imperative concepts once things get a bit more complicated. In this particular case, the decision to design the application model (application contains functions and global variables etc.) like I did in C# is probably the culprit. In Prexonite, I build these data structures incrementally, adding and tweaking bits here and there. This just isn’t the way to go in Haskell. In Haskell you’re supposed to find one final expression for every value, which in this case proved to be far from trivial.

Why Prexonite?

By now there are really a lot of options for compiler back ends and/or platforms, ranging from the Java VM, the CLR, Neko and LLVM to plain old C. Why would Prexonite be just the right choice?

Since Parvum is just a prototype I needed a high level back end which basically reduced the choice down to

  • Neko
  • Prexonite
  • any actual programming language

In particular, the back end had to support closures out of the box. Out of these I chose Prexonite bytecode assembler because it is high level but still requires a translation effort (I actually want to learn something implementing Parvum) and because it resembles CIL to some extent.

The latter is important because I might implement my next serious language on top of the CLR directly instead of the DLR.

The future

No, Parvum is not intended to replace Prexonite as my workhorse language of choice. It really is just a research object.

I plan to investigate the following:

  1. Arity checking
    • (1 + 5) :: *
    • (\x. x + 1) :: * -> *
    • (\x. \y. x + y) 5 :: * -> *
    • etc.
  2. Extend language with 2nd data structure: Bool
    • Allows for predicates (comparison, and, not etc.)
    • Conditions
    • Makes finite recursion possible
  3. Simple type checker (Int vs. Bool) in addition to arity checking
    • My first static type checker *ever*

Unfortunately Parvum is strict, at least until I’ve found a way to encode lazy computations in the strict Prexonite VM. This is one place where I might cheat and just provide a `thunk` mechanism as part of Prexonite (say a command that wraps a computation and its arguments). We’ll see.

ACFactory Day 5: Welcome to Paper World

This is the fifth part of a series documenting the development of ‘ACFactory’ (ACFabrik in German), an application that generates printable character sheets for the Pen & Paper role playing game Arcane Codex (English page).

You might also want to read ACFactory Day 4: Command & Control.

[Figure: ACFactory prototype showing two almost entirely empty pages]

Pixels in WPF are device-independent and set at 96 pixels per inch. This means that one WPF-pixel corresponds to one device-pixel if and only if the resolution of the device is 96 dpi. Despite sounding complicated, this is a good thing, because WPF pixels have fixed dimensions. One pixel equals 1/96th of an inch, or 3.77952755905512 pixels are equal to one millimetre. (I’m sorry, but whoever invented inches should be slapped). Naturally, it would not be practical to author the talent sheet in WPF-pixels, constantly having to convert back and forth using a calculator or a table (or both: Excel).

[Figure: ACFactory Zoom control]

It would be nice to let someone else worry about my coordinate system of choice, someone like WPF. Unlike GDI+, WPF does not provide explicit coordinate system transformations, probably for a good reason. What you can do is apply layout transforms. The scale transform looks particularly promising. Initialised with my magic number, I could author my sheets in millimetres and WPF converts them into pixels. There is one problem there: ScaleTransform not only transforms the coordinate system, but everything inside the affected control, including font sizes. While in 99 out of 100 cases this is the expected behaviour, in my exact scenario it’s not. Cambria 12pt should render like Cambria 12pt would render on paper. However, these normal-looking 12 points are scaled to giant 45.3543… points by the ScaleTransform.

Maybe the StackOverflow.com community knows an answer. Until then, I created the hopefully temporary markup extension PaperFontSize, which automatically reverses the effects of such a ScaleTransform.
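
The arithmetic behind the magic number and the font size correction can be checked with a few lines of Python. This is only a worked calculation under the assumption that the correction simply divides by the scale factor; `paper_font_size` here is a hypothetical function, not the code of the markup extension.

```python
WPF_PIXELS_PER_INCH = 96.0
MM_PER_INCH = 25.4

# Scale factor that lets layout be authored in millimetres (about 3.7795).
MM_TO_PIXELS = WPF_PIXELS_PER_INCH / MM_PER_INCH

def mm_to_pixels(mm):
    """Convert a length in millimetres to device-independent WPF pixels."""
    return mm * MM_TO_PIXELS

def scaled_font_size(points, scale=MM_TO_PIXELS):
    """Size a font ends up with when it sits inside the scaled area."""
    return points * scale

def paper_font_size(points, scale=MM_TO_PIXELS):
    """Size to request inside the scaled area so the font renders at `points`."""
    return points / scale

blown_up = scaled_font_size(12)                     # the "giant" 12pt font
corrected = scaled_font_size(paper_font_size(12))   # back to 12 after correction
```

Dividing by the same factor that the ScaleTransform multiplies with cancels the effect, which is all the correction has to achieve.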

Having everything set up, I can finally start implementing the talent sheet.

ACFactory Day 4: Command & Control

This is the fourth part of a series documenting the development of ‘ACFactory’ (ACFabrik in German), an application that generates printable character sheets for the Pen & Paper role playing game Arcane Codex (English page).

You might also want to read Day 3: Authoring XAML.

Today, I was mostly lost in the vast unknown jungle that WPF is, even after having read “Windows Presentation Foundation Unleashed” by Adam Nathan (I really recommend it!).

Control

First there was the difference between UserControl and custom control. (No, I am not very familiar with any other UI framework). Whereas UserControls are little more than an include on steroids (you get a code behind file), if you *really* want to create something new or abstract over a composition of controls, then creating custom controls is your only option.

But custom controls are just C# (or VB.NET) code files. There is no XAML involved. How can that be the preferred way to author new controls? Remember that WPF controls are supposed to be look-less, platonic ideas of what they represent.
I wanted to create a zoom view control. What are the abstract properties of such a zoom view control?

  1. It has content
  2. It can magnify its content

Number 1 tells us to derive from ContentControl, the type that defines the Content property. Number 2 is a bit trickier. I decided that my control has a ZoomFactor property (type double, 1.0 == 100%) to which a ScaleTransform is bound. Whether or not this works, I am not exactly sure as the control is not working yet.

But how does the control look? Well, that’s not the control’s concern. A look is provided by the Themes/Generic.xaml resource dictionary, the default fallback in the absence of local definitions and system specific themes. In my case there is going to be a neat little zoom control (combo box + slider) hovering in the top left corner.

Command

To establish communication between the ZoomViewer template and the ZoomViewer control, there is really only one good mechanism: Commands. Commands are yet another abstraction that makes event handling more modular. Controls like buttons, hyperlinks and menu items can be “bound” to a certain command. That command determines whether they are enabled or not and what happens when they are clicked. You could for instance bind the menu item Edit > Paste, a toolbar button and Ctrl+V to the ApplicationCommands.Paste command and they would all automatically be activated/deactivated depending on the state of the clipboard.

Even better, the default implementation, RoutedCommand, works just like routed events and bubbles up the tree of your XAML interface. You can then define different command bindings at different locations in your UI. The best of all: Via the command target property, you can tell the command routing where to look for command bindings. I could have two buttons that both invoke the NavigationCommands.Zoom command, but on two different ZoomViewers.

My ZoomViewer supports the NavigationCommands.IncreaseZoom, .DecreaseZoom and .Zoom commands. This is how the default control template can communicate with the ZoomViewer: by invoking those commands.

There is however one thing I found very irritating: neither the slider nor the combo box implement commands by default. MSDN contains a sample that shows how to do this. It turns out to come with quite a few things to watch out for:

  • You must differentiate between routed and ordinary commands as only the former can react to command bindings and can be set to originate from different InputElements.
  • You should rather pass a reference to the invoking control than null as the command target.
  • You must be careful with the CanExecuteChanged event handler. It must be correctly unset, when the command is changed/removed.
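
The last point is the easiest one to get wrong, so here is a language-neutral sketch of it in Python. Nothing in it is WPF API: `Command`, `CommandSlider` and their members are hypothetical stand-ins that only model the subscribe/unsubscribe discipline around a "can execute changed" notification.

```python
class Command:
    """Minimal command with a can-execute query and a change notification."""

    def __init__(self, execute, can_execute=lambda parameter: True):
        self._execute = execute
        self._can_execute = can_execute
        self.handlers = []                # subscribers to "can execute changed"

    def can_execute(self, parameter):
        return self._can_execute(parameter)

    def execute(self, parameter):
        self._execute(parameter)

    def raise_can_execute_changed(self):
        for handler in list(self.handlers):
            handler()

class CommandSlider:
    """Control that invokes its command whenever its value changes."""

    def __init__(self):
        self._command = None
        self.value = 1.0
        self.enabled = True

    @property
    def command(self):
        return self._command

    @command.setter
    def command(self, new_command):
        if self._command is not None:     # unhook from the old command first
            self._command.handlers.remove(self._refresh)
        self._command = new_command
        if new_command is not None:
            new_command.handlers.append(self._refresh)
        self._refresh()

    def _refresh(self):
        self.enabled = self._command is None or self._command.can_execute(self.value)

    def set_value(self, value):
        self.value = value
        if self._command is not None and self._command.can_execute(value):
            self._command.execute(value)
```

If the setter skipped the removal step, a replaced command would keep a reference to the slider and continue to enable or disable it, which is the kind of bug the warning is about.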

How well this all turns out, we will hopefully see soon. Development right now is a bit sluggish as I keep switching over to MSDN and/or my book for reference, since Visual Studio’s XAML editor is not very sophisticated, even with basic ReSharper support. This must get much better in VS10. Up until now, I have observed VS08SP1 crash only 3 times (twice due to a recursive binding *blush* and once with an HRESULT of E_FAIL, whatever that exactly was). But at least I lost no code.

Oh, and why exactly the Microsoft Blend XAML editor does not provide any support is totally beyond me. I mean free (that means it costs nothing) tools provide better code completion than Blend: Kaxaml, the “better XamlPad”. Even though it takes some time to load, I can definitely recommend it.