Principles of Programming Languages

Blank, Fall 2014

Assignment #6: Implementing Calc in Python with Continuations

Due: Thursday, October 30, 2014 at beginning of class

This assignment is designed to:

  • Understand the limits of our language in Python
  • Write Python functions in Continuation-Passing Style
  • Eliminate the need for a call stack in evaluator

From the last couple of assignments, we have created a simple "Calc" language that actually looks (and behaves) very similar to Scheme. That can be confusing! Is there anything really special about writing Calc in Scheme? Could we easily implement Calc in another language. Well, let's see!

Some of you may not know Python, but it is easy to pick up, and may be closer to your first programming language than Scheme is. So, let's re-write what we have so far of our Calc language in Python.

You may want to search the internet for Pythonisms as we go through this notebook.

Concrete Syntax

When we implemented Calc in Scheme, we used Scheme's built-in sexp reader. Thus, we could just represent Calc expressions as quoted Scheme data. That prevented us from using curly braces and commas in our concrete syntax, but got us going quickly.

Python doesn't have such a built-in reader to read s-expressions. So, let's add just the bare minimum code needed to replicate Scheme's sexp reader. We will write our Calc expressions as a string, and turn that into AST (Abstract Syntax Tree) through a series of parsing steps.

The first thing we do to make it easy for us is to build a "lexer". A lexer just chunks the characters of the given concrete syntax string into parts. So, "12.3" will be chunked together, as will "set!".

In [ ]:
def lexer(string):
    retval = []
    current = ""
    for i in range(len(string)):
        if string[i] in ["(", "[", ")", "]"]:
            if current:
                retval.append(current)
            current = ""
            retval.append(string[i])
        elif string[i] in [" ", "\t", "\n"]:
            if current:
                retval.append(current)
            current = ""
        else:
            current += string[i]
    if current:
        retval.append(current)
    return retval
In [ ]:
lexer("(define x (1.2 +  4))")

We see that the lexer program is quite simple, chunking based on brackets and white space. Test it to make sure it can handle the Calc items we are interested. Make a note of things it does not correctly handle. We'll need to fix those to make a complete Calc.

Next, we feed the output of the lexer (a list) into a "reader" that turns the "stream" of chunks into meaningful items. So, it should turn string numbers into real numbers, and bracketed items into real lists.

In [ ]:
def reader(lexp):
    current = None
    stack = []
    for item in lexp:
        if item.isdigit():
            if current is not None:
                current.append(eval(item))
            else:
                current = eval(item)
        elif item in ["[", "("]:
            if current is not None:
                stack.append(current)
            current = []
        elif item in ["]", ")"]:
            if stack:
                stack[-1].append(current)
                current = stack.pop(-1)
            else:
                pass
        else:
            if current is not None:
                current.append(item)
            else:
                current = item
    return current
In [ ]:
reader(lexer("(define x (x + 1))"))

Reader is a simple "finite state machine" (FSM) that keeps track of being inside of a list of lists, so that it can push and pop off a stack when it encounters nested lists. It is still pretty simple, but this now can be given to a parser much like we did in Scheme. Again, test this to make sure it groups correctly. Make notes of any issues.

First, let's define some Scheme-like functions to access elements of Python's lists. Notice that the notation list[1:] represents the items in a list starting with position 1 (the second item) and going to the end. Thus, we treat Python's list data structure as if it were a linked-list. It isn't exactly, but we can pretend to make the ideas from our Scheme implementation easy to migrate over to Python.

In [ ]:
def car(exp):
    return exp[0]
def cadr(exp):
    return exp[1]
def caddr(exp):
    return exp[2]
def cadddr(exp):
    return exp[3]

def cdr(exp):
    return exp[1:]
def cddr(exp):
    return exp[2:]
def cdddr(exp):
    return exp[3:]
def cddddr(exp):
    return exp[4:]

def cons(item, lst):
    return [item] + lst

def null_q(lst):
    return len(lst) == 0

def operator_q(item):
    return item in ["+", "-", "*", "/"]

Abstract Syntax Tree (AST)

Python doesn't have a define-datatype. We could write one. But for now, let's just hard-code the grammar of Calc into these expression functions and the parser:

In [ ]:
def lit_exp(value):
    return ["lit-exp", value]

def if_exp(test_exp, then_exp, else_exp):
    return ["if-exp", test_exp, then_exp, else_exp]

def let_exp(variable, value, body):
    return ["let-exp", variable, value, body]

def var_exp(symbol):
    return ["var-exp", symbol]

def assign_exp(symbol, value):
    return ["assign-exp", symbol, value]

def define_exp(symbol, value):
    return ["define-exp", symbol, value]

def app_exp(f, args):
    return ["app-exp", f, args]

def procedure_exp(variables, body):
    return ["procedure-exp", variables, body]

def closure_exp(env, variables, body):
    return ["closure-exp", env, variables, body]

## and a utility function for defining closure?:
def closure_q(item):
    return (isinstance(item, list) and 
            len(item) > 0 and
            item[0] == "closure-exp")

The parser should look pretty much just like our Scheme version, with the exception that the parenthesis come after the function names in Python. Python doesn't have a record-case or even a case statement, so we just use a series of if/elif tests.

In [ ]:
def parser(exp):
    if isinstance(exp, int):
        return lit_exp(exp)
    elif isinstance(exp, str):
        return var_exp(exp)
    elif car(exp) == "set!":
        return assign_exp(cadr(exp), parser(caddr(exp)))
    elif car(exp) == "define":
        return define_exp(cadr(exp), parser(caddr(exp)))
    elif car(exp) == "if":
        return if_exp(parser(cadr(exp)), parser(caddr(exp)), parser(cadddr(exp)))
    elif car(exp) == "let":
        return let_exp(cadr(exp), parser(caddr(exp)), parser(cadddr(exp)))
    elif car(exp) == "func":
        return procedure_exp(cadr(exp), parser(caddr(exp)))
    elif len(exp) == 3 and operator_q(exp[1]):
        return app_exp(parser(cadr(exp)), [parser(car(exp)), parser(caddr(exp))])
    else:
        return app_exp(parser(car(exp)), map(parser, cdr(exp)))
In [ ]:
parser(reader(lexer("23")))
In [ ]:
parser(reader(lexer("(if 0 23 24)")))
In [ ]:
parser(reader(lexer("(let x 1 x)")))
In [ ]:
parser(reader(lexer("(+ 1 2)")))
In [ ]:
parser(reader(lexer("(let x 1 +)")))
In [ ]:
parser(reader(lexer("(define x 1)")))
In [ ]:
parser(reader(lexer("(let x 1 (x + x))")))
In [ ]:
parser(reader(lexer("(func (x) x)")))

This looks like it is parsing the structures of Calc pretty well. Thoroughly test out as many variations as you can to see if there are any bugs. Note that our parsing code (including lexer and reader) doesn't handle things like quoted items.

Evaluation

Like in Scheme, we need to implement booleans. Again, we use zero for false, and anything for true.

In [ ]:
def true_q(value):
    if value == 0:
        return False
    else:
        return True

Again, we use nested frames of variable/value pairs as the environment.

In [ ]:
def extend_env(vars, vals, env):
    return [map(list,zip(vars, vals)), env]
In [ ]:
def lookup_binding(symbol, env, first_frame_only=False):
    frame = car(env)
    for pair in frame:
        if pair[0] == symbol:
            return pair
    if not first_frame_only and cdr(env):
        return lookup_binding(symbol, car(cdr(env)))
    else:
        return False

Like the Scheme version, we allow functions to be either Python functions or closures:

In [ ]:
def applier(f, args):
    if closure_q(f):
        env = cadr(f)
        parameters = caddr(f)
        body = cadddr(f)
        return evaluator(body, extend_env(parameters, args, env))
    else:
        return apply(f, args)

Finally, the evaluator. It should look completely analogous to the Scheme version:

In [ ]:
def evaluator(exp, env):
    if car(exp) == "lit-exp":
        return cadr(exp)
    elif car(exp) == "var-exp":
        return cadr(lookup_binding(cadr(exp), env))
    elif car(exp) == "let-exp":
        variable = cadr(exp)
        value = evaluator(caddr(exp), env)
        return evaluator(cadddr(exp), extend_env([variable], [value], env))
    elif car(exp) == "if-exp":
        if true_q(evaluator(cadr(exp), env)):
            return evaluator(caddr(exp), env)
        else:
            return evaluator(cadddr(exp), env)
    elif car(exp) == "assign-exp":
        variable = cadr(exp)
        value = evaluator(caddr(exp), env)
        binding = lookup_binding(variable, env)
        if binding:
            binding[1] = value
        else:
            raise Exception("No such variable: '%s'" % variable)
        return value
    elif car(exp) == "define-exp":
        variable = cadr(exp)
        value = evaluator(caddr(exp), env)
        binding = lookup_binding(variable, env, True)
        if binding:
            binding[1] = value
        else:
            car(env).append([variable, value])
        return value
    elif car(exp) == "procedure-exp":
        return closure_exp(env, cadr(exp), caddr(exp))
    elif car(exp) == "app-exp":
        return applier(evaluator(cadr(exp), env), 
                       map(lambda e: evaluator(e, env), caddr(exp)))
    else:
        raise Exception("invalid abstract syntax: %s" % exp)

Python doesn't allow operators (such as + and -) to be used as "first-class objects"---they can't be passed into a function. But, Python supplies equivalent symbols in the "operator" library:

In [ ]:
import operator

And so, we define a toplevel environment just like before:

In [ ]:
toplevel_env = [[["+", operator.add], 
                 ["-", operator.sub],
                 ["*", operator.mul],
                 ["/", operator.div],
                 ["=", lambda a,b: 1 if a == b else 0]]]

Feel free to add more items in the environment.

And let's test:

In [ ]:
evaluator(parser(reader(lexer("(if 1 23 24)"))), toplevel_env)

That is a lot to type to test. Let's make a convenience function, like in Scheme:

In [ ]:
def calc(exp):
    return evaluator(parser(reader(lexer(exp))), toplevel_env)
In [ ]:
calc("(if 0 2 3)")
In [ ]:
calc("(let x 1 +)")
In [ ]:
calc("(let x 1 (x + x))")
In [ ]:
calc("(define y 1)")
In [ ]:
toplevel_env
In [ ]:
calc("(set! x 1)")
In [ ]:
toplevel_env
In [ ]:
calc("(func (x) x)")
In [ ]:
calc("((func (x) x) 5)")
In [ ]:
calc("(define f (let x 1 (func () x)))")
In [ ]:
calc("(f)")
In [ ]:
calc("(let x 2 (f))")
In [ ]:
calc("(define factorial (func (n) (if (= n 1) 1 (* (factorial (- n 1)) n))))")
In [ ]:
calc("(factorial 5)")

Excellent! Everything appears to work, just like the Scheme version. Thoroughly test the evaluator to make sure it can handle Calc expressions. Make notes of any failures.

Finally, let's try one more test, an infinite loop:

In [ ]:
calc("(define loop (func () (loop)))")
In [ ]:
calc("(loop)")

Stack overflow!

Fix stack overflow with Continuations

To fix our implementation of our Calculator language, we'll need to rewrite the evaluator in Continuation-Passing Style (CPS).

Let's see what CPS looks like in Python.

Consider factorial in Python:

In [ ]:
def factorial(n):
    if n == 1:
        return 1
    else:
        return n * factorial(n - 1)
In [ ]:
factorial(5)

But even this simple program has the stack overflow issue:

In [ ]:
factorial(1000)

Stack overflow!

We can begin to address the issue by first converting to CPS:

In [ ]:
def factorial_cps(n, k):  ## add extra param, k for continuation
    if n == 1:
        return k(1)       ## pass results to the continuation
    else:
        return factorial_cps(n - 1,   ## move computation after call
                  lambda v: k(v * n)) ## to a new continuation passed in

Yes, Python has a lambda. Unfortunately, Python's lambda is very limited: you can only have a single expression in the body of the lambda. That means no loops or complex statements, such as an if/else.

In [ ]:
factorial_cps(5, lambda v: v) ## call with initial continuation, the identity function

Now we will have factorial return the continuation (rather than executing it right there) and use a trampoline.

In [82]:
def factorial_cps(n, k):  ## add extra param, k for continuation
    if n == 1:
        return [k, 1]       ## pass results to the continuation
    else:
        return factorial_cps(n - 1,     ## move computation after call
                  lambda v: [k, v * n]) ## to a new continuation passed in
In [83]:
factorial_cps(5, lambda v: v)
Out[83]:
[<function __main__.<lambda>>, 1]

We can build a trampoline like we did in Scheme:

In [ ]:
def trampoline(result):
    while isinstance(result, list) and callable(result[0]):
        result = result[0](result[1])
    return result

And execute the code as we did in Scheme:

In [ ]:
trampoline(factorial_cps(5, lambda v: v))

However, this still doesn't allow use to compute factorial of 1000.

This won't work yet:

## DON'T DO THIS:
trampoline(factorial_cps(1000, lambda v: v))

That is because we are using recursion to build up the chain of continuations. To fix this issue, we'll need one more step. But in practice, this will work fine for our evaluator.

Why? Because we typically don't write code that will break the bounds of our recursive limits. For example, if we tried to write code like this:

(+ (+ (+ (+ (+ (+ (+ .... repeat for 1000 times

Then our evaluator won't work. We should allow code like that to work, but to do so will require one more step after CPS.

Convert Evaluator to CPS

Now, we will tackle writing the evaluator into CPS. That will involve the same steps as before:

  1. add parameter k to the list of parameters
  2. if we have a result, give it to k (eg, "apply the continuation")
  3. if there is computation after the recursive call, create a continuation, and move the computation there
  4. always pass a continuation to the evaluator

Here is the evaluator as it is:

In [ ]:
def evaluator(exp, env):
    if car(exp) == "lit-exp":
        return cadr(exp)
    elif car(exp) == "var-exp":
        return cadr(lookup_binding(cadr(exp), env))
    elif car(exp) == "let-exp":
        variable = cadr(exp)
        value = evaluator(caddr(exp), env)
        return evaluator(cadddr(exp), extend_env([variable], [value], env))
    elif car(exp) == "if-exp":
        if true_q(evaluator(cadr(exp), env)):
            return evaluator(caddr(exp), env)
        else:
            return evaluator(cadddr(exp), env)
    elif car(exp) == "assign-exp":
        variable = cadr(exp)
        value = evaluator(caddr(exp), env)
        binding = lookup_binding(variable, env)
        if binding:
            binding[1] = value
        else:
            raise Exception("No such variable: '%s'" % variable)
        return value
    elif car(exp) == "define-exp":
        variable = cadr(exp)
        value = evaluator(caddr(exp), env)
        binding = lookup_binding(variable, env, True)
        if binding:
            binding[1] = value
        else:
            car(env).append([variable, value])
        return value
    elif car(exp) == "procedure-exp":
        return closure_exp(env, cadr(exp), caddr(exp))
    elif car(exp) == "app-exp":
        return applier(evaluator(cadr(exp), env), 
                       map(lambda e: evaluator(e, env), caddr(exp)))
    else:
        raise Exception("invalid abstract syntax: %s" % exp)

Below, lit-exp, if-exp, and app-exp have been converted for you. In addition, evaluator1, evaluator2, evaluator3, and evaluator_map have been written. Use apply_cont when you have a result.

Problem 1: convert the rest of evaluator to CPS and test.

In [ ]:
def apply_cont(k, v):
    return k(v)
In [ ]:
def evaluator(exp, env):
    if car(exp) == "lit-exp":
        return apply_cont(k, cadr(exp))
    elif car(exp) == "var-exp":
        return cadr(lookup_binding(cadr(exp), env))
    elif car(exp) == "let-exp":
        variable = cadr(exp)
        value = evaluator(caddr(exp), env)
        return evaluator(cadddr(exp), extend_env([variable], [value], env))
    elif car(exp) == "if-exp":
        return evaluator(cadr(exp), env, lambda value: evaluator3(value, exp, env, k))
    elif car(exp) == "assign-exp":
        ## USE evaluator1 HERE!
        variable = cadr(exp)
        value = evaluator(caddr(exp), env)
        binding = lookup_binding(variable, env)
        if binding:
            binding[1] = value
        else:
            raise Exception("No such variable: '%s'" % variable)
        return value
    elif car(exp) == "define-exp":
        ## USE evaluator2 HERE!
        variable = cadr(exp)
        value = evaluator(caddr(exp), env)
        binding = lookup_binding(variable, env, True)
        if binding:
            binding[1] = value
        else:
            car(env).append([variable, value])
        return value
    elif car(exp) == "procedure-exp":
        return closure_exp(env, cadr(exp), caddr(exp))
    elif car(exp) == "app-exp":
        return evaluator(cadr(exp), env, 
             lambda v: evaluator_map(caddr(exp), env, 
                lambda all: applier(v, all, k)))
    else:
        raise Exception("invalid abstract syntax: %s" % exp)

def evaluator1(value, variable, env, k):
    binding = lookup_binding(variable, env)
    if binding:
        binding[1] = value
    else:
        raise Exception("No such variable: '%s'" % variable)
    return apply_cont(k, value)

def evaluator2(value, variable, env, k):
    binding = lookup_binding(variable, env, True)
    if binding:
        binding[1] = value
    else:
        car(env).append([variable, value])
    return apply_cont(k, value)        
        
def evaluator3(value, exp, env, k):
    if true_q(value):
        return evaluator(caddr(exp), env, k)
    else:
        return evaluator(cadddr(exp), env, k)

def evaluator_map(exp, env, k):
    if null_q(exp):
        return apply_cont(k, [])
    else:
        return evaluator_map(cdr(exp), env, 
                  lambda v1: evaluator(car(exp), env, 
                      lambda v2: apply_cont(k, cons(v2, v1))))

Here is a CPS applier:

In [ ]:
def applier(f, args, k):
    if closure_q(f):
        env = cadr(f)
        parameters = caddr(f)
        body = cadddr(f)
        return evaluator(body, extend_env(parameters, args, env), k)
    else:
        return apply_cont(k, apply(f, args))

And finally, a convenience method:

In [ ]:
def calc(exp):
    return evaluator(parser(reader(lexer(exp))), toplevel_env, lambda v: v)
In [ ]:
calc("(define factorial (func (n) (if (= n 1) 1 (* (factorial (- n 1)) n))))")
In [ ]:
calc("(factorial 5)")

That should now work. But it still calls the continuations from inside other continuations, getting deeper and deeper into our call stack.

We can fix that by redefining our apply_continuation function:

In [ ]:
def apply_cont(k, v):
    return [k, v]
In [ ]:
calc("(factorial 5)")

Of course, we again need a trampoline to execute the continuations:

In [ ]:
def continuation_q(exp):
    return isinstance(exp, list) and callable(exp[0])

def trampoline(result):
    print result
    while continuation_q(result):
        result = car(result)(cadr(result))
        print result
    return result
In [ ]:
trampoline(calc("(+ 1 2)"))
In [ ]:
trampoline(calc("(if (+ 1 2) 6 7)"))
In [ ]:
trampoline(calc("((func (n) (+ n 1)) 42)"))
In [ ]:
trampoline(calc("(factorial 5)"))

If everything works correctly, we can now create infinite loops in our language implemented in Python, without crashing the call stack:

In [ ]:
calc("(loop)")

Let's create a special trampoline (so we don't print out an infinite list of items):

In [ ]:
count = 0
continuation = None
def trampoline(result):
    global count, continuation
    while continuation_q(result):
        result = car(result)(cadr(result))
        count += 1
        if count % 1000 == 0:
            continuation = result
            return
    return result

We start it up, and run for 1000 steps:

In [ ]:
trampoline(calc("(loop)"))
In [ ]:
count

We keep track of the last continuation so we can continue. Because the continuation is a result, we can just pass it back to the trampoline:

In [ ]:
trampoline(continuation)
In [ ]:
count

If we wanted, we could do that forever!

Problem 2: Make calc look as much like Scheme as you can. What else do you need to add to the language? Possible ideas: add strings, add and and or, add missing functions to the environment, add other special forms from Scheme, etc.

Problem 3: Provide a summary description of what you have done in your own words. You should write as if you were explaining the process to someone who hasn't taken this course. It should explain continuations, CPS, and what problem we solved by using this technique. Also, demonstrate the additions that you made to the language. Use code to demonstrate where appropriate.