Tuesday, April 16, 2013

Building a Lisp Interpreter from Scratch -- Part 5: Compiler, Virtual Machine

(This is Part 5 of a series of posts on pLisp)

Using a compiler and a virtual machine for an interpreter may seem somewhat of an overkill, but makes sense for two reasons:
  1. Incorporating continuations in pLisp showed me that a VM-based approach (hence a compiler) is a straightforward way to do so -- virtual registers, call stack, the whole works -- rather than mucking around with code walkers and the like.
  2. Writing a compiler would also be useful when we want to run pLisp code natively.
Before we proceed further, some attributions are in order: Kent Dybvig's PhD thesis is pretty much the reference for much of this post.

The last time we spoke about the interpreter per se, we had just converted an expression_t struct into a native pLisp object (OBJECT_PTR). We are now ready to do further things with this pLisp object:
  1. Compile it
  2. Interpret it
(who could have seen that coming?)

The compiler converts a pLisp source form into a simplified Lisp form that is understood by the VM/interpreter. The following are the constructs in the simplified Lisp:

HALTHalts the virtual machine
REFERLoads a variable reference
CONSTANTLoads a constant
CLOSECreates a closure
MACROCreates a macro
TESTImplements IF/ELSE
ASSIGNStores a value in a variable
DEFINECreates a variable binding in the top level
CONTICreates a continuation
NUATEExecutes a continuation by replacing the current stack with that of the continuation
FRAMEStores the register contents in a new frame and pushes the frame on to the call stack
ARGUMENTAdds the results of the last evaluation to the value rib
APPLYClosure/Macro/Continuation application
RETURNPops a frame, restores registers
BACKQUOTEProcesses a backquote (can't be done by the compiler because we need to do some compiling at run time; see below) 

These constructs are predicated on the existence of the following, for want of a better term, ISA elements:

//register that stores the 
//results of the last computation
OBJECT_PTR reg_accumulator;

//the next expression to be 
//evaluated by the VM
OBJECT_PTR reg_next_expression;

//the environment in which the
//current expression is to be evaluated
OBJECT_PTR reg_current_env;

//holds the arguments evaluated
//till now (for closure applications)
OBJECT_PTR reg_current_value_rib;

//the call stack
OBJECT_PTR reg_current_stack;

The VM interprets the form pointed to by reg_next_expression, by suitably manipulating the registers, and sets the next expression to be evaluated, thereby perpetuating the computation, till it runs out of expressions to evaluate.

The astute reader will observe that all these elements are first-class pLisp objects. This is imperative especially for the current stack, since continuations encapsulate the call stack (although, to be fair, there are ways to reify the stack using native mechanisms and still end up with a first-class continuation object).

The call stack itself is a CONS object made up of frames, themselves pLisp arrays containing the next expression, environment, and the current value rib.

The working of the compiler is somewhat peculiar; the simplified Lisp that it produces from a pLisp source form is something like a Russian-doll-within-doll toy. An example will make this clear:

The source form

(define x 10)

gets compiled into

(CONSTANT 10 (DEFINE X (HALT)))

i.e., we have three 'instructions' to be interpreted by the VM (CONSTANT, DEFINE and HALT), and the first instruction contains the second, the second contains the third, and so on. Looks a bit disconcerting at first, but it works.

The compiler and interpreter work pretty independently of each other, except when it comes to macros and EVAL. If the compiler detects that it's dealing with a macro, it sets things up so that a call is made to the interpreter, and receives the results of the interpreting for compilation. The interpreter enlists the help of the compiler to process BACKQUOTE, COMMA, COMMA-AT and EVAL.