Monday, August 31, 2015

Full Monty Compiler for pLisp -- Part 1: Overview

Before diving into the different phases of the compiler, it would be helpful to set out our overall goals and the context which the compilation process fits in.

Our guiding principles are:
  1. Retain as much of the current code base as possible
  2. Use the same object model, core library, grammar and primitives as currently exist in pLisp
  3. All top-level forms are to be compiled to native code (i.e. even simple, ad hoc expressions like (+ 1 1))
  4. Our goal (at least for now) is correctness and completeness; some of the code transformations are not optimal, but we're willing to go with them in the interests of keeping things simple
 As the compilation process moves through the different phases, we end up with simpler and simpler dialects of Lisp:


Not every transformation results in a new dialect; for example, both the input and output of the renaming transformation are in the 'pLisp_IL' dialect.

Onto the grammars for these sundry pLisp dialects (I'm skipping pLisp in the interests of laziness, except to say that it's pLisp_k + macros):

1. pLisp_k

The 'k' stands for 'kernel', as in the core language we're left with once all the macros have been expanded/desugared.

exp ::= literal
    | var
    | (if exp exp exp?)
    | (primop exp+)
    | (lambda var* exp)
    | (error exp)
    | (call-cc exp)
    | (exp exp*)
    | (set var exp)
    | (let exp* exp)
    | (letrec exp* exp)

literal ::= integer | float | symbol | nil | t | character | string

variable ::= <check plisp.lex for the rule(s) defining variables>

primop ::= ' | atom | eq | cons | car | cdr | + | -

| * | / | progn| print | listp | symbol_value
| ` | > | gensym | setcar | setcdr | create-package
| in-pacakge | , | ,@ | expand-macro | apply
| string | make-array | array-get | array-set
| sub-array | array-length| print-string | create-image
| load-foreign-library | call-foreign-function | load-file
| consp | integerp | floatp | characterp | symbolp
| stringp| arrayp | closurep | macrop | continuationp
| format | clone | return| return-from | symbol
| symbol-name | unbind | newline | time | profile
| not | &lt; | &lt;= | >= | != | save-object | load-object
| export-package | env | eval 

2. pLisp_k_no_set

pLisp_k_no_set is identical to pLisp_k, except that there is no 'set' form:

exp ::= literal
    | var
    | (if exp exp exp?)
    | (primop exp+)
    | (lambda var* exp)
    | (error exp)
    | (call-cc exp)
    | (exp exp*)
    | (let exp* exp)
    | (letrec exp* exp)

 
(primops omitted from here on for clarity)

3. pLisp_IL

pLisp_IL ('IL' stands for Intermediate Language) differs from pLisp_k in two ways:
  1. There is no 'letrec' form
  2. A new construct 'let1' that desugars to 'let'. Functionally 'let1' is identical to Common Lisp's 'let*', but since our parser doesn't handle '*', we go with 'let1'.
exp ::= literal
    | var
    | (if exp exp exp?)
    | (primop exp+)
    | (lambda var* exp)
    | (error exp)
    | (call-cc exp)
    | (exp exp*)
    | (let exp* exp)

    | (let1 exp* exp)

4. pLisp_IL_CPS

pLisp_IL_CPS is the continuation-passing version of pLisp. I'll explain this in more detail when I talk about CPS conversion; for now, note two important constraints: a) 'let' forms will have only one binding and b) there are also restrictions on what can be 'lettable' expressions.

exp ::= (var val*)
    | (if val exp exp)
    | (let ((var le) exp))
    | (error exp)
    | (call-cc exp)

val ::= literal | var

le :: = literal | (lambda (var*) exp) | (primop val+)


5. pLisp_IL_Lift

pLisp_IL_Lift continues with the simplification further: the only forms permitted on the right hand sides of binding expressions in 'let' are variables, literals and primitive applications (also, the only permitted arguments to the primitive applications themselves are variables and literals).

exp ::= (var val*)
    | (if val exp exp)
    | (let ((var le) exp))
    | (error exp)
    | (call-cc exp)

val ::= literal | var

le :: = literal | (primop val+)


One form not included in these is 'define', used to create top-level forms. Strictly speaking, 'define' is handled outside the compilation process (though we need to call the compiler to compile and evaluate the expression whose computation should be mapped to the variable being defined).

Subsequent posts will talk in detail about the different phases.