Tuesday, September 20, 2011

Eclipse...

Is it possible that to upgrade a major version of Eclipse there is no other way than just reinstall every bloody plugin I'm using?

Tuesday, September 6, 2011

Clojure: Quicksort in Continuation Passing Style and Trampolines

Introduction

In this post we are going to write a completely recursive stack-wary version of the classical quicksort algorithm in continuation passing style, making use of trampolines.
I already discussed quicksort in continuation passing style in this post. The idea is to refer to the post as much as possible and introduce here more information.
From Wikipedia:
Instead of "returning" values as in the more familiar direct style, a function written in continuation-passing style takes an explicit "continuation" argument, i.e. a function which is meant to receive the result of the computation performed within the original function. Similarly, when a subroutine is invoked within a CPS function, the calling function is required to supply a procedure to be invoked with the subroutine's "return" value. Expressing code in this form makes a number of things explicit which are implicit in direct style. These include: procedure returns, which become apparent as calls to a continuation; intermediate values, which are all given names; order of argument evaluation, which is made explicit; and tail calls, which simply is calling a procedure with the same continuation that was passed to the caller, unmodified.
Essentially there are two issues in CPS with Clojure. The first one is recursion on the stack. CPS is tail recursive by default; however, most of the times it is about mutually recursive calls. Which essentially means trampolines.
The second issue relates to the continuation themselves. First, they may occupy heap space (as the continuations stay in memory until evaluated), second, the continuations themselves need to be treated with trampolines, otherwise, when evaluated consume stack space.
However, with massive usage of trampolines, both issues can be solved.

Append

The first function we need to write is append (i.e., concat in Clojure Jargon). We need to write a new version because we need to have it deal with the continuation. And as we will see, append is a false friend.
(defn append-notramp [lst1 lst2 k]
  (cond
   (empty? lst1) (k lst2)
   :else (recur (rest lst1) lst2
                 (fn [rst]
                   (k (cons (first lst1) rst))))))
At first sight, his implementation looks correct. We can try it with simple inputs and convince ourselves that the semantics is right. Essentially if there are elements in the first list, the new continuation is a function taking a list rst and calling the continuation we received with a list where we consed the first element of lst1 on rst.
However, running with increasingly large values for lst1 results in a stack overflow.
user> (cps/append-notramp (range 100000) (range 4) identity)
; Evaluation aborted.
In this case, recur is a false friend. We are lead to think that since append-notramp is tail-recursive, we do not have to worry about stack overflow. The issue here are the chain of continuations, however. And they cannot use recur, because even though they *are* tail recursive, they are mutually recursive. Luckily enough, Clojure provides trampolines and in this case, they are also relatively easy to use.
(defn append [lst1 lst2 k]
  (cond
   (empty? lst1) #(k lst2)
   :else (recur (rest lst1) lst2
                 (fn [rst]
                   #(k (cons (first lst1) rst))))))
Here append plainly returns a chain of functions which returns functions and they can be resolved calling trampoline on them. I pass empty as the continuation because I do not want a list of some thousand of elements printed on standard output. The other function just checks the 0s are at the right places.
user> (cps/append (range 100000) (range 4) empty)
#[cps$append$fn__5392 cps$append$fn__5392@30ea3e3c]
user> (trampoline (cps/append (range 100000) (range 4) empty))
()
user> (trampoline (cps/append (range 100000) (range 4)
                              #(and (= (nth % 0) 0)
                                    (= (nth % 100001)))))
true

Partition

Partition suffers from a similar problem than append, in the sense that a naive version blows the stack.
(defn- partition-notramp [coll p? k]
  (loop [coll coll k k]
    (cond
     (empty? coll) (k () ())
     (p? (first coll)) (recur (rest coll)
                              (fn [p-true p-false]
                                (k (cons (first coll) p-true)
                                   p-false)))
     :else (recur (rest coll)
                  (fn [p-true p-false]
                    (k p-true
                       (cons (first coll)
                             p-false)))))))
The solution is also similar. When a continuation k is called, instead return a function which calls k, like #(k ...).
(defn- partition [coll p? k]
  (loop [coll coll k k]
    (cond
     (empty? coll) #(k () ())
     (p? (first coll)) (recur (rest coll)
                              (fn [p-true p-false]
                                #(k (cons (first coll) p-true)
                                   p-false)))
     :else (recur (rest coll)
                  (fn [p-true p-false]
                    (k p-true
                       (cons (first coll)
                             p-false)))))))

Quicksort

Now things become complicated. Quicksort is a doubly recursive function and is called inside the continuations. We basically have each continuation return a function which runs what would have been the continuation weren't we using trampolines. This works like a breeze: we just have to call trampoline before returning.
(defn quicksort
  ([coll less? k]
     (letfn [(qs [coll k]
                 (if (empty? coll) #(k ())
                     (let [[pivot & coll1] coll]
                       (partition coll1
                                  (fn [x] (less? x pivot))
                                  (fn [less-than greater-than]
                                    (fn []
                                      (qs greater-than
                                          (fn [sorted-gt]
                                            (fn []
                                              (qs less-than
                                                  (fn [sorted-lt]
                                                    (append sorted-lt (cons pivot sorted-gt) k))))))))))))]
       (trampoline (qs coll k))))
     ([coll less?]
        (quicksort coll less? identity)))

A word on performace

Of course, performance is rather terrible. First, quicksort in functional settings rather sucks. I find it so much imperative oriented that it is even hard to think about it in this context (notice here we are using rather complex stuff to make it functionally looking).
Then there are a number of issues:
  1. Typical optimizations like switching to other algorithms with lower multiplicative constants is not easy to do. Every self-respecting implementation of quicksort when the list is sufficiently small switches to something like insertion sort because it is just faster (say... sublists of k elements).
  2. Other optimizations like recursing first in the smaller half of the array are also hard to do (because we are using lists... we should essentially use a custom variant of partition which takes care of such nuisances).
  3. Quicksort performs poorly if the pivot element is chosen as the first element in the list. However, in functional settings, where the core data structure is likely to (or just may be) a list, taking an element in the middle has an inacceptable algorithmic complexity
  4. Moreover, partition is likely to use far more memory than what would use in imperative settings
Consequently, Clojure sort is plainly faster, considering that it should call Java sort (which now is the blazingly fast tim-sort, a difficult if not impossible to beat algorithm). Comparisons in this sense are embarassing. That is why I did not even try to code smarter in order to make this quicksort faster. And no, I'm definitely not going to turn tim-sort in CPS+trampolines by hand.
Technorati Tags: , , , ,

Saturday, September 3, 2011

Too many cool languages (Python, Clojure, ...)

There is plenty of languages I like. I'm naming just a bunch of them here... so a non exclusive list may be:

  1. Python
  2. Common Lisp
  3. Scheme
  4. Clojure
  5. Erlang
  6. Haskell
  7. Prolog
  8. Ruby

The list is quite incomplete... for example I left out languages from the ML family. I do not have anything against them, I just think Haskell nowadays is way cooler (cooler != better) and also the Haskell platform is a definitive advantage. I did not mention languages such as Io, because although I'd really like to do something with it, it just does do not feel enough motivation to try and do it in Io instead than in one of the languages above.

Same thing for Javascript. It's not that I do not like it... it is just that if I can, I would rather use one of the languages I mentioned (which at least in case of client side JS most of the times is a no go). I also left out things like CoffeeScript and a whole lot of languages that did not even jump to my mind right now.

The list is quite representative of my tastes in programming languages. All the languages are very high level languages (I enjoy plain old C, but I tend to use it to extend the above languages and/or for problems which really need C points of strength). Most of them are declarative languages, almost every one of them has a strong functional flavor (basically just Prolog hasn't, the others are either functional languages or received a strong influence from functional programming).

Some of the languages are better for certain tasks, because they are built for them or have widely tested libraries/frameworks. And this somewhat makes it a difficult choice which one to chose. For example, every time a network service has to be written, I consider Python (Twisted/Stackless), Erlang and recently Clojure. If the system has to be highly concurrent, probably Erlang has an edge, but also Haskell (with STM and lightweight threads) and Scheme (Gambit/Termite) are strong contenders.

This is essentially about personal projects. In fact, for "serious" stuff, I am still using Python or Java (and C/C++ if more muscle is needed). This is basically because I have far more experience with those technologies. Clojure is probably a viable choice as well, as I can easily interact with Java and I have been using that quite intensively in the past two years (I also used it to extend one of the "serious" projects, though I had to remove it because I had other experimental components and it was tiresome to find out whether bugs were due to the other components or my relative inexperience with clojure). Besides, I have still to find winning debugging strategies for clojure: probably I'm missing something obvious here. I'm also trying to increase my expertise with Erlang.

These are some random thought about developing in the environments.

Editing Tools

The first issue, is to find the right tools. It is not exactly easy to discover the right tools to develop in non mainstream languages. For example, with Java any major IDE is a good enough tool. For Python, both PyCharm and WingIDE are excellent and easy to use tools. Moreover, vim is an excellent tool with only minor tweaking. Oddly enough, I'm still struggling to find a good setup for Emacs.

On the contrary, Emacs is quite easily one of the best environments for Scheme/Common Lisp/Clojure (thanks to slime), Haskell, Erlang and Prolog (thanks to the improved modes). Still, after years of switching back and forth from Emacs to vim and vice versa, I think I'm more comfortable with vim. Unfortunately, it is not always the case that customizing vim for these languages is as easy as it is with Emacs. For example, I found an easy to use Erlang plugin, but I have still to try using advanced plugins for Haskell and Clojure (I know the plugins, still I did not finish the setup). For Clojure it's just easier to fire LaClojure or Emacs+Slime. For Haskell, I'm not serious enough to know my own needs. I wrote software for my bachelor's in Prolog with vim, and I have to say that Emacs is quite superior (especially considering Sicstus custom mode).

Back in the day I used TextMate for Ruby. Right now I guess I'd probably go with RubyMine or vim.

Platform

Now this is a serious point. Many of my "difficulties" with Scheme lie here: chose one interpreter/compiler with libraries to do most things I want to do and some tool to ease the installation of other libraries (I should probably just stick with Racket). For example with Python it is easy: pick the default CPython interpreter, use pip/easy_install for additional libraties.

Common Lisp is also going well (SBCL + Quicklisp here). I was pleasantly surprised by Haskell: the Haskell platform is wonderful. Some years ago it was not completely clear which interpreter to use and external tools (like Haskell) were not completely immediate. Now with the Haskell platform you just have everything under control.

About clojure, I feel rather satisfied. For project oriented stuff there is leiningen, which works fine. My main objection is that most of the time I'm not *starting* with project oriented stuff... which is something which so much remembers me of the Java world, where first you start a project (as opposed to unix, where you just start with a file). Luckily enough in the clojure community there are also guys like me, and so there is cljr which just fills the area of "before" something becomes a project.

Cljr is very nice with emacs/swank/slime. Unfortunately I've not yet found a way to make it interact with vim.

Erlang already comes bundled with an interesting platform and rebar does lots of nice stuff too. Erlang/OTP tends to be a bit too project oriented but it quite makes sense. Basically all of the times you are setting up systems with many processes and all and it just makes sense to do it that way.

Prolog is essentially a complete mess. I advise just using SWI-Prolog and forget the rest even exists, unless you need some of the advanced stuff Sicstus offers. In this case you pay and enjoy.