Meta Programming in Prexonite Script?

The idea of implementing a macro system in Prexonite Script was inspired by reading an article on lambda-the-ultimate.org about Converge. I realized that with a powerful API like that of the Prexonite compiler, it might be possible to implement similar features.

The basic idea would be to have a new kind of function, the macro function, which is executed at compile time whenever it is encountered in code. Instead of evaluated arguments, it receives the AST nodes of its arguments. These can then be used to dynamically assembly a resulting AST node that is returned to compiler. That node is then inserted instead of an ordinary call to the macro function.

A further extension of this mechanism would be a feedback interface for the declaration of macro (and possibly other) functions. It would be possible to declare an argument to be something other than an expression. The implementation control structures would be one obvious use.

Apart from the invocation of macro functions and later and insertion of the resulting AST node, there are other requirements to a useful macro system: The AST manipulation API. Working with the AST in Prexonite can be painful at times as it is geared towards statically compiled code. Not only is it a very complex but also a very irregular API. It was never meant to be exposed in such a direct fashion. Moreover, it is not trivial to refactor the API as it is used from in generated code, which means that the usual refactoring tools will only find a small subset of all uses in the system.

The ast and SI features of the psr\ast.pxs library are a start but far from providing a convenient way of creating new AST nodes. Both projects mentioned in the article about meta programming employ quotation and splicing.

Quotes

An expression that represents the AST of itself. (Probably the worst definition ever).

Example for expression quote

var node = (. x.member(5) + 6 .);
//  equivalent to something like
var node = ast("BinaryOperator", BinaryOperator.Addition,
    ast("GetSetMember",ast("GetSetSymbol","x"),"member",[ast("Constant",5)]),
    ast("Constant",6)); //Simplified

Example for statement quote:

var node =
{.
  while(x < 5)
    println(x++);
.};
//   equivalent to something like:
var node = ast("WhileLoop",
  ast("BinaryOperator",BinaryOperator.LessThan,
    ast("GetSetSymbol","x"), ast("Constant", 5)),
     ast("Block", [
       ast("GetSetSymbol","println",[
         ast("UnaryOperator",UnaryOperator.PostIncrement,
           ast("GetSetSymbol","x"))
       ]);
]));

One easily sees that constructing AST nodes procedurally is no fun. The actual API is even more complicated. With a whole army of helper and wrapper functions, the most common constructs could be simplified but nothing beats expressing the code fragment you want with that same fragment. Quotes, however, are practically useless as they are static in nature. Enter

Splicing

Insertion of dynamically generated AST nodes into static ones.

Splicing would come in two forms with my approach. First of all, splicing happens when “calling” a macro function in normal code. The resulting AST node is spliced into the normal code in place of the macro call.

The second form is splicing into quotes. In a way similar to string inter- polation.

Example:

macro my_foreach(iterator, seq, block)
{
  var node =
  {.
    foreach(.(iterator). in .(seq).)
     .{ block }.
  .};
  return node;
}

The above example happens to uncover the challenge of delaying the effective construction of the foreach loop. The block passed in via the macros arguments might employ structured gotos (`break` and `continue`) which are normally resolved or at least linked when the AST is constructed. But since not even the logical target is known until the block is spliced into the quote, the task of linking the control flow of the block and the loop together might be non-trivial.

Hygiene is another topic. The compiler must generate unique names for all symbols used literally inside a quote. Otherwise the macro might accidentally use unrelated variables inside the calling function. This, however, has sever implications on either usability or complexity of quotes as the following example demonstrates:

macro for_upto(sym, max, action)
{
  var i_var = (. var i; .); //Does not work. Variables are not
  //  declared by AST nodes

  var i = create_variable("i");

  return
  {.
    for(.(i). = 0; .(i). < .(max). ; .(i).++)
      .{ action }.
  .};
}

Whereas create_variable would be some sort of helper function that generates a unique (shadow) variable within the target returns a reference to the corresponding node. One might come up with a slightly friendlier notation for splicing expressions. Something like $$i maybe.

And then there is the issue of AST node not being designed to be used in multiple places (they are not immutable). Therefor the splicing implementation would have to insert copies of the nodes inserted. Not to mention that the AST doesn’t implement copying naturally.

Conclusion

Implementing a meta programming system on top of Prexonite script similar to that of Converge is not realistic. One has to design both the language and the compiler with such a goal in mind. Adapting the existing compiler would most definitely result in a mess.

I, however, think that the implementation of macro functions is feasible and useful. Macros won’t revolutionise how the language is used, but will complement the existing dynamic features to enable concise solutions, should the programmer decide to invest enough time into the macro system.

Shadow ids in Prexonite Script

It’s time for a new release of the Prexonite Script language:

New this time around is the very small but useful addition of shadow ids, a feature not entirely unlike  the concept of shadowing in C#, thus the name.

Imagine you are working with two comparatively complex libraries, say a convenient wrapper around the Prexonite compiler and some sort of templating engine. If the respective authors of those two libraries both decide to use the identifier `load` for one of their functions, you end up in trouble.

Even though it is absolutely possible to define different symbols (i.e., names) to differentiate between those functions, Prexonite’s architecture does not allow two functions with the same physical name inside one application. And since the physical id happens to be the id the function has been defined with, a collision is inevitable.

Enter shadow ids

Have a look at the following example:

function as load, compiler_load (path, loader)
{
  if(loader is Null)
  {
    var engine = asm(ldr.eng);
    var app = asm(ldr.app);
    loader = new Prexonite::Compiler::Loader
      (engine, app);
  }
  loader.LoadFromFile(path);
}

This function is a wrapper around the Prexonite.Compiler.Loader.LoadFromFile method in some fictional compiler library file. Notice the keyword as in front of the list of names for that function. It means that the function itself is anonymous, i.e. it’s actual name is generated by the compiler. To access the function, you may use any of the alternative names specified.

In addition to that, a generalization for shadow ids is provided for easy function alias definition:

function compiler_load as load (path, loader)
{
  …
}

…which defines an ordinary function compiler_load and immediately adds an alias load for convenient access. This is really just syntactic sugar for

function compiler_load (path, loader)
{
  …
}
declare compiler_load as load;

Limitations

  • Shadow ids do not provide obfuscation, they are formed by appending a unique string to the first id in the alias list.So function as f, g(){} will be named something like f\gene05c25a4869140c599e4982665b48417. Also, f will be persisted in meta data as the functions LogicalId. This, however is an implementation detail and must not be relied upon.
  • There is currently no equivalent for global variable definitions, although such a feature is planned.

Future of Prexonite Script

Bad news first: Prexonite and Prexonite Script are both as mature as they will get. The architecture has its limits and expanding it further to improve the language/runtime is unreasonably complicated.

Prexonite Script will, however, remain my favourite sandbox for experimenting and improving my language design skills. At least until a successor is written. There are two particular areas the next releases will focus on:

Debug support

The Prexonite Script architecture was never built with debugging in mind. Although line/column/file information is (somewhat) propagated to the AST, it is not used thereafter. There is no mechanism to map instructions to code lines (or vice-versa). Nontheless I am currently prototyping a simple, byte code based, command line debugger.

It will probably come in the form of a library with a special breakpoint function, that invokes an environment not unlike the interactive mode for Prx.exe. It supports inspection of the stack, local variables, watch expressions, stepping (through byte code, not statements) as well as step into functionality.

Have a look at this example session:

Local variables of main
 var y = 2;
 var x = 1;

Stack of context of function main
 1: 
 2: 
code = function main
[
LogicalId main;import False;
\symbol_mapping {0 -> 0,
1 -> 1};
]
 does asm {
var y,x
   00: @func.0 debug_break
        ldc.int 1
        stloc x
        ldloc x
        ldc.int 1
        add
        stloc y
        ldloc y
        ldloc x
-> 09:  cgt
        jump.f 14
        ldloc x
       @cmd.1 println
        jump 16
   14:  ldloc y
       @cmd.1 println
   16:  ldc.string "Bye!"
        cmd.1 println
   18:  ret.value
   19: }

main@9>

Notice how references to local variables are shown by-name even though the underlying bytecode actually uses the faster by-index opcodes. At the top you see the list of local variables, followed by a display of the current stack. Above the prompt is an enhanced view of the bytecode with only important addresses marked (beginning, end, current and jumpt targets).

What is missing right now, is step-into. Because the debugger is invoked on a per stack context basis (aka function activation). Enabling the debugger for calls made from the current context requires intercepting all corresponding calls. The task is non-trivial because closure, coroutine and structure method calls are statically indistinguishable from calls to aribrary implementations of IIndirectCall and friends.

Macro system

Another area of development is the addition of a more user friendly macro system into Prexonite Script. Currently, one can rewrite undefined ids, a mechanism that is not supposed to be a replacement for macros but rather a means of changing the default handling of unknown ids.

Alternatively, compiler hooks can be used to transform entire functions. Not only is it very tedious to find the specific calls to replace, it’s also not guaranteed that you find every instance of a certain expression by traversing the tree. All uses of compiler hooks share a similar pattern of searching a certain dummy function reference and replacing it with custom nodes, using the transformed function itself and any of its parameters as arguments.

A possible simplification of this scheme would be the explicit support for macro functions:

macro square(x)
{
  //Define a
  var tmp = define("tmp");
  var assignment = ast("GetSetSymbol",SetCall,tmp,[x]);
  var computation = multiply(tmp,tmp);
  return block([assignment, computation]);
}

The macro square transforms calls to it like ( square(x+2) ) to an expression like ( let tmp = x+2 in (tmp)*(tmp) ). The details are not fleshed out yet. A more concrete discussion follows.