Multithreaded Prexonite

Yesterday, I watched the Google TechTalk presentation of the programming language Go (see golang.org). Even though I don’t believe Go will become very popular, there was one aspect of the language that “inspired” me. One of Go’s major advantages is the fact that concurrency (and coordination thereof) is built deep into the language. Every function has the potential to become what they call a “goroutine” that works in parallel to all the other code. Communication and coordination is achieved through an equally simple system of synchronous channels (a buffer of size 0).

The code samples all looked so incredibly simple to me that I thought: I can do that too. I started by writing a case study (a code sample) and a sketched implementation of the synchronization mechanisms. It all looked nice on paper (in the Prexonite Script editor) but there was one major problem: None of the Prexonite execution engines is really capable of running on multiple threads. Even though the CIL compiled functions behave nicely, they don’t cover the whole language. Coroutines and other dynamic features are just too important to be ignored.

Fortunately the design of the interpreter is centered around one abstract class: the StackContext. StackContexts come in a variety of forms and provide complete access to the current state of the interpeter. For the most part, these objects are passed around in the call hierarchy. And that’s good, because functions/methods that call each other are in the same thread. So stack contexts stay thread local. At least the ones obtained via method arguments. But there is also another way to get your hands onto stack contexts and that’s the interpreters stack itself.

Of course a multithreaded Prexonite will have multiple stacks but just providing those isn’t enough. I had to ensure that direct queries to/manipulation of the stack are only directed at the stack of the currently executing thread. Since stack contexts are sometimes “captured” by other objects (coroutines for instance), they can on occasion slip into other threads and wreak havoc with the stack of their original thread.

The only solution I saw, was to use thread local storage via unnamed data slots. This has disadvantages, of course:

  • data slots are rather slow (not Reflection slow, but also not virtual call fast)
  • it is difficult to manipulate a different stack if you want to

but

  • it works!
//name~String, source~Channel
function work(name, source)
{
    println("\tWhat did $name say again?");
    var a = source.receive;
    println("\t$name said \"$(a)\"");
    return name: a;
}

function main()
{
    var source = chan; //Creates a new channel
    println("Get to Work!");
    //`call\async` invokes `work` in the backgrouund
    //`resp` is a channel that will receive the return value of `work`
    var resp = call\async(->work,["Chris",source ]);

    println("Oh, I forgot to tell you this: Chris said Hello!");
    source .send("Hello!"); //supply the missing value

    println("I'm waiting for you to finish...");
    var res = resp.receive;
    println("So your answer is $res then...");

    return 0;
}

results (sometimes) in

Get to Work!
        What did Chris say again?
Oh, I forgot to tell you this: Chris said Hello!
I'm waiting for you to finish...
        Chris said "Hello!"
So your answer is Chris: Hello! then...

Lists, coroutines and all other kinds of sequences appear very often in Prexonite. It is only logical to combine multithreading with the IEnumerable interface. The result is the `async_seq` command that takes an existing sequence and pushes its computation into a background thread.

The following example applies the two fantasy-functions `sumtorial` and `divtorial` to the numbers 1 through 100. The ordinary `seq` version uses can only use one processor core, while `async_seq` pushes the computation of `sumtorial` into the background and onto the second core. Even though the management overhead is gigantic, a performance improvement can be measured (a modest factor of ~X1.3).

function sumtorial(n) =
    if(n <= 1) 1
    else       n+sumtorial(n-1);
function divtorial(n) =
    if(n <= 1) 1
    else       n/divtorial(n-n/38-1);

function main()
{
    var n = 100;

    var seq =
        1.To(n)
        >> map(->sumtorial)
        >> map(->divtorial);

    println("Sequential");
    println("\t",seq >> sum);

    var par =
        1.To(n)
        >> map(->sumtorial)
        >> async_seq
        >> map(->divtorial);

    println("Parallel");
    println("\t",par >> sum);
}

Multithreaded Prexonite currently lives in its own SVN branch as I am still experimenting with concrete implementations. A very nice improvement would be the move away from CLR threads as they don’t scale well. Communicating sequential processes spend a lot of time waiting for messages, something that does not require its own thread. Go can use 100′000 goroutines and more. I don’t even want to try this with CLR threads…

Lazy factorial

This post refers to Parvum, a research compiler/language of mine presented in the previous Post. Its a compiler written in Haskell, that translates a “lambda calculus”-like language to Prexonite byte code assembler.

Did you know that a tiny modification to Parvum makes the language (most likely) turing-complete? By adopting non-strict (lazy) evaluation, one can implement all sorts of control structures in nothing but Parvum, even without language support for boolean values and conditions.

Proof? Yes, please! Here you see the implementation of the factorial function in Parvum (currently hardwired to compute the factorial of 8):

(\bind.
  bind (\n. n (\x. x + 1) 0) \toInt.
  bind (\f x. x) \zero.
  bind (\f x. f x) \one.
  bind (\f x. f (f (f (f x)))) \four.
  bind (\n f x. f (n f x)) \succ.
  bind (\p a b. p a b) \ifthenelse.
  bind (\x y. x) \true.
  bind zero \false.
  bind (\n. n (\x. false) true) \isZero.
  bind (\n m. m succ n) \plus.
  bind (\n. n (\g k. isZero (g one) k (plus (g k) one) ) (\v. zero) zero) \pred.
  bind (\m n f. m (n f)) \mul .
  bind (\self n.
    ifthenelse (isZero n)
      (one)
      (mul n (self self (pred n)))
    ) \facrec.
  bind (\n. facrec facrec n) \fac.

  toInt (fac (succ (succ (succ (succ (succ four))))))
)(\var body. body var)

76 seconds and a peak working set of over 650 MiB later, the number 362880 appears on the console. It works indeed.

I spare you the compiled byte code as it consists of 47 functions. Note that Prexonite integer arithmetic is strictly only used in the lambda expression bound to `toInt`, which is applied after the factorial has been computed.

The actual computation is all done in church numerals, so yes, the number 362880 is indeed a function `c` that applies a supplied function `f` 362880 times. The necessary control structures (`ifthenelse`) are entirely implemented using functions too.

Recursion was a bit tricky as Parvum does not (yet) have let-bindings. As you can see in the code, I solved this by passing `facrec` a reference to itself. It does look a bit strange, especially the `self self` bit, but it works.

The price paid

I admit, I cheated (a little): The laziness mechanisms are actually implemented in C# as part of Prexonite (live in the SVN trunk). There is a new command called `thunk` which takes an expression (something callable, a lambda expression for instance) and an arbitrary number of parameters for that expression (optional). The return value is, surprise, a `Thunk` object. This object really has only one member: `force`

`force`, well, forces the evaluation of the supplied expression until some concrete value is obtained. That value, can of course contain further thunks. A lazy linked list for instance:

function _consT hT tT = [hT,tT];
function _headT xsT = xsT.force[0];
function _tailT xsT = xsT.force[1];
function _refT xT = xT.force.();

function repeatT(x)
{
  var xsT;
  var xsT = thunk(->_consT, x, thunk(->_refT, ->xsT));
  return xsT;
}

//Equivalent in Haskell:
//repeat x = let xs = x:xs in xs

Identifiers ending in `T` represent thunks (or functions that take and return thunks) whereas a prepended `_` identifies what I call "primitve expressions". They are the building blocks for more complex expressions and most are equivalents of actual assembler instructions: `_refT` for instance is the primitive for `indarg.0`, the instruction responsible for dereferencing variable references (among other things).

Justification

In the last post I justified the decision to compile to Prexonite byte code assembler with the lack of challenge when compiling into an actual high level language like Neko or JavaScript (or Haskell).

Why is it that I suddenly fear challenge? First of all, the laziness implementation in effect right now could also have been implemented in Prexonite script (in fact it has already been implemented: `create_lazy` in psr\misc.pxs is an example. (I will probably remove this structure now that I have a fully managed implementation. )

Actually, `thunk` was more difficult to write in C# than it would have been in Prexonite Script. As I mentioned, the factorial program used over 650 MiB of RAM and even though all Prexonite objects are stored on the Heap, the stack frame overhead is gigantic. I had to add a new mechanism to the FunctionContext (the actual byte code interpreter) to allow the injection of managed into the Prexonite stack.

That mechanism (the co-operative managed context) is similar to managed coroutines but behaves more like a function. Also managed code that was initially not intended to be run co-operatively (i.e. invoked as a member or as an indirect call) can “promote” itself onto the Prexonite stack.

Of course the Prexonite stack is not exactly known for its excellent performance, but unlike the managed stack, it is only limited by the amount of memory available.

Meta Programming in Prexonite Script?

The idea of implementing a macro system in Prexonite Script was inspired by reading an article on lambda-the-ultimate.org about Converge. I realized that with a powerful API like that of the Prexonite compiler, it might be possible to implement similar features.

The basic idea would be to have a new kind of function, the macro function, which is executed at compile time whenever it is encountered in code. Instead of evaluated arguments, it receives the AST nodes of its arguments. These can then be used to dynamically assembly a resulting AST node that is returned to compiler. That node is then inserted instead of an ordinary call to the macro function.

A further extension of this mechanism would be a feedback interface for the declaration of macro (and possibly other) functions. It would be possible to declare an argument to be something other than an expression. The implementation control structures would be one obvious use.

Apart from the invocation of macro functions and later and insertion of the resulting AST node, there are other requirements to a useful macro system: The AST manipulation API. Working with the AST in Prexonite can be painful at times as it is geared towards statically compiled code. Not only is it a very complex but also a very irregular API. It was never meant to be exposed in such a direct fashion. Moreover, it is not trivial to refactor the API as it is used from in generated code, which means that the usual refactoring tools will only find a small subset of all uses in the system.

The ast and SI features of the psr\ast.pxs library are a start but far from providing a convenient way of creating new AST nodes. Both projects mentioned in the article about meta programming employ quotation and splicing.

Quotes

An expression that represents the AST of itself. (Probably the worst definition ever).

Example for expression quote

var node = (. x.member(5) + 6 .);
//  equivalent to something like
var node = ast("BinaryOperator", BinaryOperator.Addition,
    ast("GetSetMember",ast("GetSetSymbol","x"),"member",[ast("Constant",5)]),
    ast("Constant",6)); //Simplified

Example for statement quote:

var node =
{.
  while(x < 5)
    println(x++);
.};
//   equivalent to something like:
var node = ast("WhileLoop",
  ast("BinaryOperator",BinaryOperator.LessThan,
    ast("GetSetSymbol","x"), ast("Constant", 5)),
     ast("Block", [
       ast("GetSetSymbol","println",[
         ast("UnaryOperator",UnaryOperator.PostIncrement,
           ast("GetSetSymbol","x"))
       ]);
]));

One easily sees that constructing AST nodes procedurally is no fun. The actual API is even more complicated. With a whole army of helper and wrapper functions, the most common constructs could be simplified but nothing beats expressing the code fragment you want with that same fragment. Quotes, however, are practically useless as they are static in nature. Enter

Splicing

Insertion of dynamically generated AST nodes into static ones.

Splicing would come in two forms with my approach. First of all, splicing happens when “calling” a macro function in normal code. The resulting AST node is spliced into the normal code in place of the macro call.

The second form is splicing into quotes. In a way similar to string inter- polation.

Example:

macro my_foreach(iterator, seq, block)
{
  var node =
  {.
    foreach(.(iterator). in .(seq).)
     .{ block }.
  .};
  return node;
}

The above example happens to uncover the challenge of delaying the effective construction of the foreach loop. The block passed in via the macros arguments might employ structured gotos (`break` and `continue`) which are normally resolved or at least linked when the AST is constructed. But since not even the logical target is known until the block is spliced into the quote, the task of linking the control flow of the block and the loop together might be non-trivial.

Hygiene is another topic. The compiler must generate unique names for all symbols used literally inside a quote. Otherwise the macro might accidentally use unrelated variables inside the calling function. This, however, has sever implications on either usability or complexity of quotes as the following example demonstrates:

macro for_upto(sym, max, action)
{
  var i_var = (. var i; .); //Does not work. Variables are not
  //  declared by AST nodes

  var i = create_variable("i");

  return
  {.
    for(.(i). = 0; .(i). < .(max). ; .(i).++)
      .{ action }.
  .};
}

Whereas create_variable would be some sort of helper function that generates a unique (shadow) variable within the target returns a reference to the corresponding node. One might come up with a slightly friendlier notation for splicing expressions. Something like $$i maybe.

And then there is the issue of AST node not being designed to be used in multiple places (they are not immutable). Therefor the splicing implementation would have to insert copies of the nodes inserted. Not to mention that the AST doesn’t implement copying naturally.

Conclusion

Implementing a meta programming system on top of Prexonite script similar to that of Converge is not realistic. One has to design both the language and the compiler with such a goal in mind. Adapting the existing compiler would most definitely result in a mess.

I, however, think that the implementation of macro functions is feasible and useful. Macros won’t revolutionise how the language is used, but will complement the existing dynamic features to enable concise solutions, should the programmer decide to invest enough time into the macro system.

Shadow ids in Prexonite Script

It’s time for a new release of the Prexonite Script language:

New this time around is the very small but useful addition of shadow ids, a feature not entirely unlike  the concept of shadowing in C#, thus the name.

Imagine you are working with two comparatively complex libraries, say a convenient wrapper around the Prexonite compiler and some sort of templating engine. If the respective authors of those two libraries both decide to use the identifier `load` for one of their functions, you end up in trouble.

Even though it is absolutely possible to define different symbols (i.e., names) to differentiate between those functions, Prexonite’s architecture does not allow two functions with the same physical name inside one application. And since the physical id happens to be the id the function has been defined with, a collision is inevitable.

Enter shadow ids

Have a look at the following example:

function as load, compiler_load (path, loader)
{
  if(loader is Null)
  {
    var engine = asm(ldr.eng);
    var app = asm(ldr.app);
    loader = new Prexonite::Compiler::Loader
      (engine, app);
  }
  loader.LoadFromFile(path);
}

This function is a wrapper around the Prexonite.Compiler.Loader.LoadFromFile method in some fictional compiler library file. Notice the keyword as in front of the list of names for that function. It means that the function itself is anonymous, i.e. it’s actual name is generated by the compiler. To access the function, you may use any of the alternative names specified.

In addition to that, a generalization for shadow ids is provided for easy function alias definition:

function compiler_load as load (path, loader)
{
  …
}

…which defines an ordinary function compiler_load and immediately adds an alias load for convenient access. This is really just syntactic sugar for

function compiler_load (path, loader)
{
  …
}
declare compiler_load as load;

Limitations

  • Shadow ids do not provide obfuscation, they are formed by appending a unique string to the first id in the alias list.So function as f, g(){} will be named something like f\gene05c25a4869140c599e4982665b48417. Also, f will be persisted in meta data as the functions LogicalId. This, however is an implementation detail and must not be relied upon.
  • There is currently no equivalent for global variable definitions, although such a feature is planned.

Future of Prexonite Script

Bad news first: Prexonite and Prexonite Script are both as mature as they will get. The architecture has its limits and expanding it further to improve the language/runtime is unreasonably complicated.

Prexonite Script will, however, remain my favourite sandbox for experimenting and improving my language design skills. At least until a successor is written. There are two particular areas the next releases will focus on:

Debug support

The Prexonite Script architecture was never built with debugging in mind. Although line/column/file information is (somewhat) propagated to the AST, it is not used thereafter. There is no mechanism to map instructions to code lines (or vice-versa). Nontheless I am currently prototyping a simple, byte code based, command line debugger.

It will probably come in the form of a library with a special breakpoint function, that invokes an environment not unlike the interactive mode for Prx.exe. It supports inspection of the stack, local variables, watch expressions, stepping (through byte code, not statements) as well as step into functionality.

Have a look at this example session:

Local variables of main
 var y = 2;
 var x = 1;

Stack of context of function main
 1: 
 2: 
code = function main
[
LogicalId main;import False;
\symbol_mapping {0 -> 0,
1 -> 1};
]
 does asm {
var y,x
   00: @func.0 debug_break
        ldc.int 1
        stloc x
        ldloc x
        ldc.int 1
        add
        stloc y
        ldloc y
        ldloc x
-> 09:  cgt
        jump.f 14
        ldloc x
       @cmd.1 println
        jump 16
   14:  ldloc y
       @cmd.1 println
   16:  ldc.string "Bye!"
        cmd.1 println
   18:  ret.value
   19: }

main@9>

Notice how references to local variables are shown by-name even though the underlying bytecode actually uses the faster by-index opcodes. At the top you see the list of local variables, followed by a display of the current stack. Above the prompt is an enhanced view of the bytecode with only important addresses marked (beginning, end, current and jumpt targets).

What is missing right now, is step-into. Because the debugger is invoked on a per stack context basis (aka function activation). Enabling the debugger for calls made from the current context requires intercepting all corresponding calls. The task is non-trivial because closure, coroutine and structure method calls are statically indistinguishable from calls to aribrary implementations of IIndirectCall and friends.

Macro system

Another area of development is the addition of a more user friendly macro system into Prexonite Script. Currently, one can rewrite undefined ids, a mechanism that is not supposed to be a replacement for macros but rather a means of changing the default handling of unknown ids.

Alternatively, compiler hooks can be used to transform entire functions. Not only is it very tedious to find the specific calls to replace, it’s also not guaranteed that you find every instance of a certain expression by traversing the tree. All uses of compiler hooks share a similar pattern of searching a certain dummy function reference and replacing it with custom nodes, using the transformed function itself and any of its parameters as arguments.

A possible simplification of this scheme would be the explicit support for macro functions:

macro square(x)
{
  //Define a
  var tmp = define("tmp");
  var assignment = ast("GetSetSymbol",SetCall,tmp,[x]);
  var computation = multiply(tmp,tmp);
  return block([assignment, computation]);
}

The macro square transforms calls to it like ( square(x+2) ) to an expression like ( let tmp = x+2 in (tmp)*(tmp) ). The details are not fleshed out yet. A more concrete discussion follows.

Operator overloading for Prexonite, more or less

Prexonite October 2008

While Prexonite Script does not formally provide any means of object-oriented programming (not just consuming objects but actually defining new ones) there are mechanisms in Prexonite (the runtime) that try to make up for this.

What Prexonite includes so far

  • The Structure data type
    • groups values and functions together
    • provides "natural" access via dot-notation

    Prexonite Script:
    1. function create_complex(_re, _im)
    2. {   
    3.     _re ??= 0.0;
    4.     _im ??= 0.0;
    5.     var z = new Structure;
    6.    
    7.     z.\("re") = _re;
    8.     z.\("im") = _im;
    9.    
    10.     z.\\("norm") = (this) =>
    11.         sqrt(this.re^2 + this.im^2);
    12.     z.\\("add") = (this,other) =>
    13.         create_complex(this.re + other.re,this.im + other.im);
    14.     z.\\("ToString") = (this) =>
    15.         this.re + " " + this.im + "i";
    16.    
    17.     return z;
    18. }

  • The ExtendableObject base class
    • Makes it possible for Prexonite code to "extend" .NET objects, should the developer want to allow this.
  • The struct function (in psr/struct.pxs )
    • automates construction of Structure objects via reflection

    Prexonite Script:
    1. build does require("psr\\struct.pxs");
    2.  
    3. function create_complex(_re, _im)
    4. {   
    5.     _re ??= 0.0;
    6.     _im ??= 0.0;
    7.    
    8.     function re(this, new_re)
    9.     {
    10.         if(new_re is not Null)
    11.             _re = new_re;
    12.         return _re;   
    13.     }
    14.    
    15.     function im(this, new_im)
    16.     {
    17.         if(new_im is not Null)
    18.             _im = new_im;
    19.         return _im;   
    20.     }
    21.    
    22.     function norm = sqrt(_re^2 + _im^2);
    23.    
    24.     function add(this, other) =
    25.         create_complex(re + other.re,im + other.im);
    26.    
    27.     function ToString =
    28.         re + " " + im + "i";
    29.    
    30.     return struct;
    31. }

New in this release1

  • auto-property, proxy-property, property support via psr/prop.pxs
    Prexonite Script:
    1. build does require("psr\\struct.pxs","psr\\prop.pxs");
    2.  
    3. function create_car(a_color)
    4. {     
    5.     _re ??= 0.0;
    6.     _im ??= 0.0;
    7.    
    8.     function re = struct_prop(_re);
    9.    
    10.     function im = struct_prop(_im);
    11.    
    12.     function norm = sqrt(re^2 + im^2);
    13.    
    14.     function add(this, other) =
    15.         create_complex(re + other.re,im + other.im);
    16.    
    17.     function ToString =
    18.         re + " " + im + "i";
    19.    
    20.     return struct;
    21. }

  • operator overloading via (+), (==) etc.
    Prexonite Script:
    1. //...
    2. function (+) this other =
    3.     create_complex(re + other.re,im + other.im);
    4.  
    5. function (-.) =
    6.     create_complex(-re, -im);
    7.    
    8. function (-) this other = this + (-other);
    9.    
    10. //...

1psr/prop.pxs is in the repository since August.

Let's discuss those two extensions one by one…

Properties

First of all, there still is no language support for structures and properties. As of right now, they are implemented via compile time transformations. These two transformations can be enabled via importing the Prexonite Standard Repository scripts psr/struct.pxs and psr/prop.pxs. While the former has been around for some time now, the latter has only recently been implemented in managed code.

psr/prop.pxs defines the special function prop which is expanded into get and set code depending on the way it is used. The easiest mode of operation implements a property with an anonymous backing field (defined in the surrounding scope, i.e. as a global variable in a top-level function or a shared variable in the outer function) The second mode mimics a simple get-set proxy where the expression passed as an argument is used as the backing field. This works for arbitrary expressions as long as they can be the target of an assignment (i.e. they are get-set expressions). The third mode finally is a replication of C# properties taking separate lambda expressions for its get and set implementations.

struct_prop works the same way but ignores the additional this parameter for structure methods.

Operator overloading

When resolving operator calls, generic objects (as well as structures) now also try to call special instance methods after failing to find a corresponding static operator overload. This makes the implementation of operator overloads for structures and ExtendableObjects (An abstract class that makes an object extendable in the same way a structure can be extended) possible. In order to avoid strange operator names, I have introduced operator ids, special literals (on the lexical level) that represent operator names. There is nothing special about these literals, they can be used everywhere an id can be used. You could for instance define a local variable with the name (+).