Tuesday, February 1, 2011

A rant on FP in Python compared to Clojure and Lisp

In a previous post, we started investigating closures in imperative settings: we provided a short Python example where things weren't really as we expected. Essentially, the problem is that in Python you cannot rebind a variable in the enclosing scope. The problem is relatively minor.
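The problem can be shown in a few lines (a minimal sketch; `make_counter` is our own illustrative name, not from the previous post):

```python
# Without `nonlocal`, the assignment inside `inc` makes `value` local
# to `inc` itself, so reading it before assignment raises UnboundLocalError.
def make_counter():
    value = 0
    def inc():
        value += 1  # UnboundLocalError: `value` is treated as local here
        return value
    return inc

counter = make_counter()
try:
    counter()
except UnboundLocalError:
    print("cannot rebind the enclosing variable")
```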
  1. It is a problem we usually do not have in functional settings, as rebinding variables (when possible at all) is not common practice. For example, in Clojure we just can't rebind a variable in a let form; in Common Lisp, on the other hand, it is a legitimate action:
    (defun make-counter (val)
      (let ((counter val))
        (lambda ()
          (let ((old-counter counter))
            (setq counter (+ counter 1))
            old-counter))))
    
    CL-USER% (defparameter c (make-counter 0))
    C
    CL-USER% (funcall c)
    0
    CL-USER% (funcall c)
    1
    
    

  2. In Python 3 the nonlocal statement solves the problem. The following program runs just fine.
    import tkinter as tk
    
    class Window(tk.Frame):
        def __init__(self, root):
            tk.Frame.__init__(self, root)
            value = 0
            def inc():
                nonlocal value
                value += 1
                self.label.config(text=str(value))
            self.label = tk.Label(text=str(value))
            self.label.pack()
            self.button = tk.Button(text="Increment", command=inc)
            self.button.pack()
    
    root = tk.Tk()
    w = Window(root)
    w.pack()
    tk.mainloop()
    
However, this is just the tip of the iceberg. Python is not a functional language, and does not even try to be one. Python is built around the idea of mutable state, while, e.g., Clojure is built around the idea that state is immutable and that areas where state is expected to change have to be explicitly marked (dosync).

Python fully embraces object-oriented facilities. Most "functional-like" features Python has (e.g., first-class functions) are essentially byproducts of being a fully object-oriented language: if everything is an object, then functions are objects too. It would take a special rule to forbid first-class functions, and since they are also very nice to use, why forbid them explicitly?

We have lambda (which is severely limited: compare it with Lisp's lambda and Clojure's fn, both of which have the full power of their respective defun/defn forms; in fact we may say that the def* form is just syntactic sugar over lambda/fn). We have the functools module, which is nice, but of course if the language is dynamic enough we can do that kind of thing anyway: it is a library, not a language feature (as partial application is in Haskell).
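To make the contrast concrete, here is a small sketch (our own example) of the library-based approach: partial application and folding come from functools, while lambda stays confined to a single expression.

```python
from functools import partial, reduce

add = lambda x, y: x + y          # lambda: one expression only, no statements
add5 = partial(add, 5)            # partial application as a library call
print(add5(10))                   # 15
print(reduce(add, range(5), 0))   # fold: 0+0+1+2+3+4 = 10
```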

There are two more interesting functional facilities: list comprehensions and lazy evaluation. Both apparently were added to Python under the influence of Haskell.

List comprehensions are a very powerful construct, and Python has expanded them further with set/dict comprehensions. This is a somewhat unique feature: even Clojure (which enriched Lisp with syntactic literals for dictionaries and sets) does not have syntactic dict/set comprehensions.
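For illustration (our own sketch; the set/dict comprehension syntax arrived in Python 2.7/3.0):

```python
seq = range(-5, 5)

squares_set = {x * x for x in seq}       # set comprehension
squares_dict = {x: x * x for x in seq}   # dict comprehension

print(squares_set)       # e.g. {0, 1, 4, 9, 16, 25} (set order is arbitrary)
print(squares_dict[-3])  # 9
```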

To be honest, I did not really feel that something was missing when Python did not yet have set/dict comprehensions. I absolutely did not miss set comprehensions: since Python has lazy evaluation, I did not feel that something like set(x for x in seq) was a major drawback (cf. {x for x in seq}, which is probably clearer, but who cares?). So, in Clojure, if I have to write:

(set (for [x (range -5 5)] (* x x)))
#{0 1 4 9 16 25}

and if I just feel that this is unbearable:

(defmacro for-set [binds & stmts]
  `(set (for ~binds ~@stmts)))

As both Python and Clojure evaluate these streams lazily, in both cases there is no overhead (that is to say, it is not the case that a list is created first and then passed to set, which makes the set). Yes... perhaps in Lisp some reader macros could "fully" add the syntax... not sure it is worth it, though.
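The Python side of this no-overhead claim looks like the following (a small sketch of the point above): set pulls elements from the generator one at a time, so no intermediate list is ever materialized.

```python
# A generator expression: nothing is computed until something consumes it.
gen = (x * x for x in range(-5, 5))

# set() iterates the generator lazily, element by element;
# this mirrors (set (for ...)) over a lazy seq in Clojure.
result = set(gen)
print(result == {0, 1, 4, 9, 16, 25})  # True
```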

However, now it is time to discuss lazy evaluation. There are different kinds of lazy evaluation. The first is the Haskell way: everything is lazy by default, unless explicitly declared strict. This strategy poses serious constraints on the language; in particular, it makes sense only if the language has no side effects. For example, we would not want the following to print nothing because of lazy evaluation:
(defn foo [x] (println x) x)
(def bar (foo 20))
It would be awkward, to say the least. On the other hand, Haskell is designed with these ideas in mind: output (and everything smelling of side effects) is forbidden to mingle with the rest of the code. Consequently programmers do not have to worry about when things are evaluated.

This strategy would be plainly impossible in imperative languages such as Python. Moreover, in distributed settings default lazy evaluation is a serious problem (which is well known in the Haskell community; consider the Wadler interview in Tate's Seven Languages in Seven Weeks). In Lisp it is very simple to add lazy evaluation on demand (there is an example in Land of Lisp), and in Clojure we can use delay to great effect (I like batteries included). Here Python lacks the feature: we can use some tricks, but the syntax becomes quite awkward (and nobody would really want to do that).
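To give an idea of the tricks involved, here is a minimal hand-rolled analogue of Clojure's delay/force (the class and method names are ours, purely illustrative); the need to wrap every expression in a lambda is exactly the syntactic awkwardness mentioned above.

```python
class Delay:
    """A sketch of Clojure-style delay/force: compute once, on demand."""
    def __init__(self, thunk):
        self._thunk = thunk    # the caller must wrap the expression in a lambda
        self._value = None
        self._forced = False

    def force(self):
        if not self._forced:   # evaluate only on first access, then cache
            self._value = self._thunk()
            self._forced = True
        return self._value

d = Delay(lambda: print("computed") or 42)  # nothing printed yet
print(d.force())  # prints "computed", then 42
print(d.force())  # cached: prints only 42
```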

On the other hand, Python introduced the general public to iterator manipulation. Python treats finite and infinite sequences uniformly (which was natural in Haskell and is now natural in Clojure) using a bunch of functions in itertools. Essentially, they were created to work naturally with lazy sequences and are a great asset in the hands of the skilled Python programmer. Clojure embraced this philosophy (and basically has everything Python has in itertools, perhaps more). We should also remember that here we are in the familiar territory of map/reduce... anyway.
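A small example of this uniform treatment (our own sketch, using nothing beyond the standard itertools module):

```python
import itertools

# An infinite lazy stream of squares; `count()` never terminates on its own,
# exactly like (map #(* % %) (range)) over Clojure's infinite range.
squares = (x * x for x in itertools.count())

# `islice` takes a finite prefix, analogous to Clojure's `take`.
first_five = list(itertools.islice(squares, 5))
print(first_five)  # [0, 1, 4, 9, 16]
```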

The extremely clever thing Python did was introducing the yield statement. With a very imperative syntax, some of the power of continuations has been brought into the language: in fact we have a limited implementation of coroutines (coroutines are trivially implemented if you have continuations). As closures in Python are limited, so are its coroutines. On the other hand, programmers who could never have used full coroutines and closures profitably can now use their tamed (crippled?) cousins. I generally consider this a good thing... I'm not a big fan of full continuations, indeed.
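A brief sketch of yield in its "tamed coroutine" role (the accumulator is our own toy example): `send` resumes the suspended generator and passes a value back in, giving two-way communication.

```python
def accumulator():
    total = 0
    while True:
        amount = yield total   # pause here; `send` delivers `amount`
        total += amount

acc = accumulator()
next(acc)            # prime the coroutine: run up to the first yield
print(acc.send(10))  # 10
print(acc.send(5))   # 15
```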

And what about Clojure? Well, Clojure has no "yield" statement. I think that is only a minor issue. Yield is a very functional object trying (and succeeding) to look imperative through a fake beard and nose. Moreover, most uses of yield are about building plain generators rather than creating "tamed" coroutines. Using a more functional style, along with lazy sequences, it is easy to provide nearly as intuitive variants in Clojure.


6 comments:

Paul said...

I've really enjoyed your past posts about Python, Imperative languages, Clojure, and Functional Languages.

While Clojure offers no yield statement, there is great power in the lazy-cat and lazy-seq macros. Essentially these operate similarly to loop/recur, but generate a lazy sequence for you, just like yield does with a Python generator. Play around with them at the REPL and I think you'll find a new favorite pattern.

Paul deGrandis // ohpauleez on #clojure

Unknown said...

Did I miss something or how is this lazy:

def x(i):
    print i
    return i

a = set(x(i) for i in xrange(7))
0
1
2
3
4
5
6

Unknown said...

@diffusing thoughts
The first example you provide is not lazy and I did not suggest it was (at least I hope I did not give this impression).

It is a classic example of why *default* lazy evaluation (like in Haskell) would be rather awkward in a language where output instructions could be placed everywhere (like Python).

The problem occurs only when lazy evaluation is the *default*. If one explicitly requires lazy evaluation *and* places side-effects in the evaluated expression then it is only his fault. On the other hand, I believe it would be much more troubling if lazy were the default.

Haskell poses a clear distinction between "regular code" and code that runs in the IO monad (and the monad essentially guarantees that output instructions are executed in order). So these problems are not present, by design.

In Python it would be a nightmare. In Clojure it would not be nice either. Besides, I prefer to have lazy evaluation only on demand. I'm perfectly fine with the delay approach.


set(x(i) for i in xrange(7)) is not lazy because set is eager. On the other hand (x(i) for i in xrange(7)) is lazy. And

set(x(i) for i in xrange(7)) ==
set((x(i) for i in xrange(7)))

that is to say, we are passing a lazy generator to set.

The idea is that since Python has lazy generators, (and it is possible to omit parentheses in that case), the syntax is nice enough *and* it is not like we have to allocate memory for a list and then "convert" it to a set. So having set comprehensions is not a huge improvement, in my opinion (though, it is nice to have).

As a side note, I think that dict-comprehensions do have a much nicer syntax compared to using dict along with a generator. But that's an entirely different story.

Unknown said...

@Paul

Thank you very much!

I love lazy-* in clojure. I usually have no troubles in adapting to different styles of coding in different languages.

In fact lazy-seq and yield have a common subset of functionality. Yield is more like having tamed coroutines (tamed in the sense that you do not have the full power of coroutines).

In fact the relevant combo here is lazy-seq + agents. Let me explain: in real-life Python I used yield for two different reasons:

1. generators/infinite seqs
2. some kind of asynchronous programming (e.g., Twisted's inlineCallbacks and the dozens of Python libraries doing that kind of thing)

Case 1 is essentially covered by lazy-seq. Case 2 is covered by agents and the other concurrency-related features of Clojure. In this case the differences are huge: they just solve somewhat similar problems.

Unknown said...

@rik0

Thanks for your explanation. My take on a lazy set in python:

class LazySet(set):

    def __init__(self, input):
        self.input = input
        self.internal_set = set()

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            element = self.input.__next__()
            if element in self.internal_set:
                pass
            else:
                self.internal_set.add(element)
                return element

inp = iter([1,2,3,3,4,4,5])
ls = LazySet(inp)
print(list(ls))
>> [1, 2, 3, 4, 5]

Unknown said...

@diffusing_thoughts

I would not use __next__() directly. In Python we should not use the __dunder__ methods when there is an alternative; in this case it is the next builtin. Unfortunately next was introduced only in Python 2.6. Consequently many people do not know about it, many books do not deal with it, and by using it you cut out Python < 2.6 (although the fix is easy). However, you appear to be using Python 3.

Apart from that, I believe it is a good solution to the problem. However, I believe there are a couple of bugs.

The first bug is related to inheritance. There is some debate on whether it is good practice to subclass built-ins such as set and list. I usually prefer not to, as it is easy to introduce bugs.

For example, consider this:

s = LazySet(x*2 for x in range(10))
s.add(1)
print(list(iter(s)))

You would have to redefine *all* the set's methods to work as expected. In this situation I would probably not subclass: I prefer an AttributeError exception to some odd behaviour caused by my failure to redefine some method properly.

The second bug is that the object can be read just once. Lazy sequences, on the other hand, can be traversed multiple times (and in principle should not redo computations on later runs). However, in this case, you put all the elements in internal_set on the first run; as a consequence, subsequent runs do not return any items.

I think that the idea should be to return a proper iterator object. This way subsequent iteration on the same object would just return the whole sequence again.
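One possible sketch of that fix (our own illustrative code, not necessarily what the gist contains): the container is no longer its own iterator; __iter__ returns a fresh generator that replays cached elements before pulling new ones from the source, so repeated traversals work and earlier computations are not redone.

```python
class LazySet:
    def __init__(self, source):
        self._source = iter(source)
        self._seen = []          # cache, in first-seen order
        self._members = set()    # for O(1) duplicate checks

    def __iter__(self):
        # Replay the elements already pulled from the source...
        for element in self._seen:
            yield element
        # ...then keep consuming the source lazily.
        for element in self._source:
            if element not in self._members:
                self._members.add(element)
                self._seen.append(element)
                yield element

ls = LazySet([1, 2, 3, 3, 4, 4, 5])
print(list(ls))  # [1, 2, 3, 4, 5]
print(list(ls))  # [1, 2, 3, 4, 5] -- a second traversal still works
```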

As a sidenote, I would also make sure that:
1. the actual parameter is iterable (not necessarily an iterator);
2. the state of the lazy set does not change when the underlying object changes.

l = [1, 2, 3]
s = LazySet(iter(l))

l.append(4)

print(list(s))
# => [1, 2, 3, 4]


I think this is somewhat counter-intuitive. However, I don't know whether fixing this is healthier than leaving it this way.

I have started a trace here:
https://gist.github.com/813271