Some languages are very strict about [functional] purity and don't even have assignment statements. Python is not one of these languages thankfully, and I show below how to take advantage of this. Actually in a previous draft he implied that functions should maintain no internal state at all, but since version 0.3 at least, he clarifies that to imply state should not be maintained across invocations.
One should strive to maintain no state of course and use iterators where possible as that tends to use compiled rather than interpreted logic. But when you need to maintain state, do so explicitly with assignment statements, rather than jumping through hoops to maintain state implicitly on the stack. The examples below hopefully illustrate what I mean.
uniq - compressThere was a popular article recently showing functional examples using python's iterator tools. The first (slightly buggy) example there, was the compress() function which is equivalent to the unix filter `uniq`. Incidentally unix filters encompass the same functional programming concepts, with data (usually sequences of lines) flowing in and out with no modification to the external state of the system. So let's implement the uniq utility using the concepts detailed in the above article:
#!/usr/bin/python import sys import itertools def uniq(iterator): iter1, iter2 = itertools.tee(iterator) iter1=itertools.chain(iter((None,)), iter1) iter2=itertools.chain(iter2, iter((None,))) return (b for a,b in itertools.izip(iter1, iter2) if a!=b and b) sys.stdout.writelines(uniq(sys.stdin))There are a couple of problems with this, the main one being that it's quite complicated, to me at least. So considering the statement "complex is better than complicated" from the zen of python, can we do better?
#!/usr/bin/python import sys def uniq(iterator): state=None for line in iterator: if line != state: yield line state=line sys.stdout.writelines(uniq(sys.stdin))This version with it's explicit iteration and assignment statements is more understandable to me. Forcing pure functional concepts here has no advantages. You might think the explicit iteration version would be slower, but it's not, and compares very favourably to the compiled C version of uniq:
$ time uniq1.py < lines.text > /dev/null 1.773s $ time uniq2.py < lines.text > /dev/null 1.408s $ time LANG=C uniq < lines.text > /dev/null 1.258sUpdate June 2011: Raymond Hettinger recently started tweeting and mentioned this more general, slightly slower option:
def uniq(iterator): return (k for k, g in itertools.groupby(iterator))
reduce - foldlOK let's consider another example. I noticed that the foldl implementation in the thirdparty functional module uses recursion. This is the standard pure functional method of maintaining state implicitly on the stack. So lets create a utility to add numbers using this method:
import sys import operator def foldl(func, base, itr): try: first = itr.next() except StopIteration: return base return foldl(func, func(base, first), itr) def add(iter): return foldl(operator.add, 0, iter) print add(int(line) for line in sys.stdin)This has serious scalability issues for python as can be seen in the test run below, which shows that by default, python 2.5 on linux at least, has a recursion limit of 1000. This seems weird but probably is the right thing to do as one can't achieve scalable solutions using unbounded recursion.
$ seq 997 | add.py RuntimeError: maximum recursion depth exceededSo for comparison we re-implement the builtin 'reduce' in python which is equivalent to foldl (which has moved to functools.reduce in python 3.0). Here we don't recurse, and instead update a single state variable on each iteration.
import sys import operator def reduce(func, items, start=None): try: items=iter(items) except: raise TypeError, "reduce() arg 2 must support iteration" if start==None: try: state=items.next() except StopIteration: raise TypeError, "reduce() of empty sequence with no initial value" else: state=start for i in items: state = func(state,i) return state def add(iter): return reduce(operator.add, iter) print add(int(line) for line in sys.stdin)
$ seq 10000 | add.py 50005000
ConclusionWe have seen above that explicitly maintaining state variables internal to a function in python can help with program performance and scalability while not loosing any of the advantages associated with functional programming. Your programs will be easier to write and maintain as well.
For a real world utility that uses the techniques used in this article please see my funcpy script.