Thursday, April 11, 2013

Building a Lisp Interpreter from Scratch -- Part 4: Memory System

(This is Part 4 of a series of posts on pLisp)

(Note: This post is quite out of sync with the code base; please see Part 13)

As I mentioned briefly in Part 3, memory in pLisp is allocated from a fixed-size heap and objects are referred via indexes (called RAW_PTRs) into this heap.

Memory is addressed in pLisp not at the byte level, but at the word (32 bit) level, as indicated by the typedef for OBJECT_PTR (unsigned int). This is inefficient when we're dealing which characters, for example, but efficiency is not the primary concern for us.

The interface to the memory system is through three calls:

RAW_PTR object_alloc(int);
void set_heap(RAW_PTR, OBJECT_PTR);
OBJECT_PTR get_heap(RAW_PTR);
  1. object_alloc() is the equivalent of C's malloc(). It takes as argument the number of four-byte words to be allocated, and returns a RAW_PTR that points to the newly allocated space in the heap.
  2. set_heap() sets the pointed-to memory location in the heap to the given OBJECT_PTR value.
  3. get_heap() is the read counterpart of set_heap().
There is no deallocation procedure because we have GC that kicks in right after a form has been evaluated and its results reported to the top level. GC is a topic for another post.

The memory system is a straightforward text book implementation. We maintain a list of available chunks/segments of memory in the form of a linked list (initially there is just one big segment as big as the full heap; this gets chopped up as allocation requests are serviced).

The first word of a segment contains its size; the next word points to the next available segment. The subsequent words constitute the space available in that segment:



We maintain two variables called 'free_list' and 'last_segment' to store the start and end of the linked list respectively. To allocate memory of a certain size, we walk the free list and stop when we encounter the first segment that is large enough for our needs. The segment is removed from the list and its space is given to pLisp; whatever is left over is packaged off into a new (smaller) segment. An obvious improvement would be to look for the segment whose size most closely matches the amount of memory requested, rather than settle for the first segment that meets our needs. When it's time to free memory -- through GC -- the segment to be deallocated is simply attached to the free list at the end.

To illustrate this with an example, here's the implementation of the CONS operator:

OBJECT_PTR cons(OBJECT_PTR car, OBJECT_PTR cdr)
{
  log_function_entry("cons");

  RAW_PTR ptr = object_alloc(2);

  set_heap(ptr, car);
  set_heap(ptr+1, cdr);

  insert_node(&white, create_node((ptr << CONS_SHIFT) + CONS_TAG));

  log_function_exit("cons");

  return (ptr << CONS_SHIFT) + CONS_TAG;
}

Ignore the call to insert_node() for now; this is for GC and doesn't have a bearing on what we've discussed so far. Creation of objects of other types in pLisp follows the same template:
  1. A call to object_alloc() to allocate the memory
  2. Call(s) to set_heap() to populate both the type-specific information (e.g. length for array objects) and the object content itself (array elements, for example)
  3. Decorating the RAW_PTR with the relevant object tag and returning it.
And we're done with memory.