Paul Bone

PZ Machine Calling Conventions

The Plasma Abstract Machine is the low-level representation of Plasma code, it’s what the compiler targets, and what the interpreter executes. You’ve probably heard of virtual machines, but might not have heard of abstract machines, well they’re almost the same thing, so mostly you can just pretend "virtual machine" here.

There are two (conflated, urgh) things I’ve been puzzling over how best to implement lately. How procedure and closure references should work between modules, and what calling conventions we want to support in the PZ machine. Let’s take a look at the common goals for both these questions, then look at each of them.

Recent history

After reviewing how closures should be implemented and learning that modules can be implemented in terms of closures, which allows for module re/hot-leading. I decided to go with this basic approach. This has made it necessary to refactor the code generator and interpreter, to describe all inter-module calls in terms of closures. It has made some things much simpler, all inter-module references are now closures, rather than either code or data, this means we no-longer need to track whether a data or code reference is from the current module or other module, which was duplicated for both code and data. Now, with a new type, an import, imports are handled in one place, and all imports are imported so the distinction for code and data can be totally removed.

Where I went wrong was to take this oversimplification too far, and not thinking through some specialised cases that are better to keep. So, this article is me thinking this through with the goal of providing a reasonable base level of flexibility and performance. Unfortunately it’s a bit of a brain dump, dear reader, if you feel you "didn’t get it", that’s my fault.

Common goals

This is not about every possible optimisation that could be performed, but of a simple default set of assumptions that would make not-inefficient code. For example, this concerns the bytecode interpreter or JIT, and is likely to be optimised more heavily for AOT code generation.

Like I said above, it’d be nice to support hot-loading of modules. Actually these aren’t modules, Plasma will have separate code units for load-time units and separate-compilation units. A compilation unit is a module (made from a single .p file), and a load-time object will be a package, made of several modules linked into a single .pz file.

Supporting hot-loading means using an indirect closure call for every inter-package call. This means that the closure can carry with it the data that code in that package may need, including references to closures in other packages.

For a quick simple explanation, if we assume that everything is a closure and can be referred to by other closures, then freeing unused code via the garbage collector becomes very simple: The only think that can refer to code is a closure, code can’t refer to anything so doesn’t need to be traced or updated, and can be moved about easily. That’s all very simple but it can be inefficient, there’s ways to improve it which I’ll discuss below.

Call types / conventions

There are three properties for each call:

  • closure/non-closure calls

  • direct/indirect calls

  • tail/non-tail calls

Which can naively combine for eight possible types of calls. However there are actually six because an indirect call can never be a non-closure call (not without more work by the compiler). So there are six possibilities:

  • indirect, closure, non-tail

  • indirect, closure, tail

  • direct, closure, non-tail

  • direct, closure, tail

  • direct, non-closure, non-tail

  • direct, non-closure, tail

Closure and non-closure calls also represent two calling conventions It’s important to count calling conventions separately because it can mean having two types of return instruction and/or affecting tail calls.

Non-closure calling convention as optimisation

The next way to simplify this is to include only a single calling convention, closure calls, and allow non-closure calls as an optimisation. This also means that any non-closure call cannot be a tail call. Reducing the number of call types to five.

We expect that non-closure tail calls would have been rare anyway, only possible when the callee and caller both have the non-closure calling convention. This calling convention would have only been possible for procedures that don’t require an environment (or use the same environment as all their callers) whose address is never taken (because we removed indirect non-closure calls above).

To allow them as an optimisation, but not in the default semantics, we would use the normal call/return instructions with them, but provide enough information that the code loader can use to switch these for optimised non-closure ones. Adding this optimisation later will be a simple matter of engineering.

Module boundaries

To change tack for a moment lets talk about how we refer to code in different contexts.

Right now on the 'master' branch each procedure gets an ID, a Proc ID, it’s assigned during compilation. All procedures of the module being compiled get an ID, and a table of ID→name mappings for imported procedures is also built. During load time IDs are looked up and they either map to a procedure in that module (whose address we know) or a foreign name to be linked in.

(The ID→name table is turned into an ID→address table while reading the module so that the name→address lookups are only done once per name.)

On the 'closures' branch I’ve added a new Import ID type, these IDs refer only to closures in other modules, and now Proc IDs refer only to procedures in the same module. So far this has been great, simplifying how procedure IDs are resolved and where they can be used. For example: it was my intention that a call instruction can refer only to procedure IDs (making it an intra-module call), and load_named instructions can refer only to a table of Import ID→name used to resolve inter-module references.

The initial implementation of load_named was incorrect. It was (so far) implemented as retrieving a closure pointer from an arbitrary structure, so far so good. However the structure it used was created by the callee’s module, it instead needs to be part of the caller’s module and created when the Import ID→name map is read during load time. So each caller-package ends up with a different table with pointers to callees. We still have Proc IDs (to be renamed to Local IDs) which shall refer to closures in the current package, and Import IDs that refer to closures in foreign packages. And the following instructions (so far):

// direct, closure, non-tail call
call LocalId

// direct, closure, tail call
tcall LocalId

// Get a pointer to a local closure (takes its address)
load_named LocalId

// Get a pointer to a foreign closure (lookup in a table)
load_named ImportId

// indirect, closure, non-tail call
ind_call

// indirect, closure, tail call
ind_tcall

Recall that local non-closure calls can be an optimisation performed at load time with enough information encoded in the .pz file.

Optimising inter-package calls

All those indirect inter-package calls can be tedious and inefficient particularly when there are lots of short calls between packages. Instead we can say this is true only for calls to packages that we might want to reload (or reload without also re-loading things that depend on them). One reason to do this is so that the GC does not need to trace code, but trace a closure’s environment. However, we can support direct calls if we require that the callee is pinned at that memory address and the caller provides a table of all its callees.

It would be possible to use direct calls for inter-package calls and still allow re-linking, but id prefer to use closures for this. Without using closures there are a few challenges, nothing insurmountable, just engineering, but they do add complexity: One problem is how the GC can find and update pointers in code. There are multiple solutions and the discussion is off topic for this article.

Therefore, we will use indirect calls by default and when we may want to re-link packages. But allow some packages to request that calls to them are direct, saying that that package probably won’t be re-loaded. This will be visible in the PZ bytecode as it will be a call or tcall instruction that takes an Import ID, the import will be looked up in the table at load time, not run time (as with load_named), and therefore restrict re-loading.

There’s another optimisation we can do here. Above we talked about allowing non-closure calls for intra-package calls when we know its safe. We may want to extend this to some very special symbols, such as some in the builtin module. Some very short procedures might be inlined during loading time also, which should also be a special annotation on the callee and be considered a more aggressive form of the non-closure optimisation, since it adds similar restrictions.

Final list of call types

indirect, closure

Most general call, used inter- and intra-package. It must be used inter-package where we wish to allow relinking. Has tail-recursive and non-tail-recursive versions. Callees are found on the stack. Instructions exist for loading the correct field of a structure which will contain our callee. These must refer to "import ids".

direct, closure

Used intra-package or when we know we don’t want to reload a package; like a standard library or prelude package, but maybe others. Has tail-recursive and non-tail-recursive versions. There are two ways to specify callee, either a closure in the current module (we have to pin it somehow). Or as a closure in another module. Which may be able to be specified via "import id" but I don’t know yet.

direct, non-closure, non-tail

Not actually part of PZ, used as an optimisation when we would have used a closure call, but we know the callee either uses the same environment or doesn’t need an environment and won’t return by tailcall.

We will also want to overload the call and tcall instructions with versions that take Import IDs.

So there’s still five call types when you include tail calls, and two ways to specify the callee (Import or Local ID).

It’s more complex than I was going for, and where I went wrong, but I think it’s the right level of complexity.