Maintaining state in functional python programs

A.M. Kuchling has written a good summary of functional programming in python in which he states Some languages are very strict about [functional] purity and don't even have assignment statements. Python is not one of these languages thankfully, and I show below how to take advantage of this. Actually in a previous draft he implied that functions should maintain no internal state at all, but since version 0.3 at least, he clarifies that to imply state should not be maintained across invocations.

One should strive to maintain no state of course and use iterators where possible as that tends to use compiled rather than interpreted logic. But when you need to maintain state, do so explicitly with assignment statements, rather than jumping through hoops to maintain state implicitly on the stack. The examples below hopefully illustrate what I mean.

uniq - compress

There was a popular article recently showing functional examples using python's iterator tools. The first (slightly buggy) example there, was the compress() function which is equivalent to the unix filter `uniq`. Incidentally unix filters encompass the same functional programming concepts, with data (usually sequences of lines) flowing in and out with no modification to the external state of the system. So let's implement the uniq utility using the concepts detailed in the above article:

#!/usr/bin/python

import sys
import itertools

def uniq(iterator):
    iter1, iter2 = itertools.tee(iterator)
    iter1=itertools.chain(iter((None,)), iter1)
    iter2=itertools.chain(iter2, iter((None,)))
    return (b for a,b in itertools.izip(iter1, iter2) if a!=b and b)

sys.stdout.writelines(uniq(sys.stdin))

There are a couple of problems with this, the main one being that it's quite complicated, to me at least. So considering the statement "complex is better than complicated" from the zen of python, can we do better?

#!/usr/bin/python

import sys

def uniq(iterator):
    state=None
    for line in iterator:
        if line != state:
            yield line
        state=line

sys.stdout.writelines(uniq(sys.stdin))

This version with it's explicit iteration and assignment statements is more understandable to me. Forcing pure functional concepts here has no advantages. You might think the explicit iteration version would be slower, but it's not, and compares very favourably to the compiled C version of uniq:

$ time uniq1.py < lines.text > /dev/null
1.773s
$ time uniq2.py < lines.text > /dev/null
1.408s
$ time LANG=C uniq < lines.text > /dev/null
1.258s

Update June 2011: Raymond Hettinger recently started tweeting and mentioned this more general, slightly slower option:

def uniq(iterator):
    return (k for k, g in itertools.groupby(iterator))

reduce - foldl

OK let's consider another example. I noticed that the foldl implementation in the thirdparty functional module uses recursion. This is the standard pure functional method of maintaining state implicitly on the stack. So lets create a utility to add numbers using this method:

import sys
import operator

def foldl(func, base, itr):
    try:
        first = itr.next()
    except StopIteration:
        return base
    return foldl(func, func(base, first), itr)

def add(iter):
    return foldl(operator.add, 0, iter)

print add(int(line) for line in sys.stdin)

This has serious scalability issues for python as can be seen in the test run below, which shows that by default, python 2.5 on linux at least, has a recursion limit of 1000. This seems weird but probably is the right thing to do as one can't achieve scalable solutions using unbounded recursion.

$ seq 997 | add.py
RuntimeError: maximum recursion depth exceeded

So for comparison we re-implement the builtin 'reduce' in python which is equivalent to foldl (which has moved to functools.reduce in python 3.0). Here we don't recurse, and instead update a single state variable on each iteration.

import sys
import operator

def reduce(func, items, start=None):
    try:
        items=iter(items)
    except:
        raise TypeError, "reduce() arg 2 must support iteration"
    if start==None:
        try:
            state=items.next()
        except StopIteration:
            raise TypeError, "reduce() of empty sequence with no initial value"
    else:
        state=start
    for i in items:
        state = func(state,i)
    return state

def add(iter):
    return reduce(operator.add, iter)

print add(int(line) for line in sys.stdin)

$ seq 10000 | add.py
50005000

Conclusion

We have seen above that explicitly maintaining state variables internal to a function in python can help with program performance and scalability while not loosing any of the advantages associated with functional programming. Your programs will be easier to write and maintain as well.

For a real world utility that uses the techniques used in this article please see my funcpy script.