Sunday, August 28, 2011

On monads and a poor old (dead) Greek geek

After immensely enjoying the LYHFFP book, I immediately started playing with Haskell a bit. Since I have lately been implementing sieves in different languages, the Sieve of Eratosthenes seemed an excellent candidate. In fact, there are quite a few Haskell versions out there (and almost all of them are faster than the version I'm going to write, but this is about an enlightenment process, not an algorithm implementation).

Bits of unrequested and useless background, thoroughly mixed with rantings and out-of-place sarcasm

I also had to reconsider some of my assumptions: I was familiar with Turner's "version" of the sieve and with O'Neill's article, which is basically about Turner's version not being a real implementation of the sieve and about hardly anybody noticing that for something like 30 years (while teaching it to students as a good algorithm). I found this story extremely amusing, in the sense that sometimes functional programmers are a bit too full of themselves and of how cool their language is, and overlook simple things.

Turner's algorithm was essentially this:

primesT = 2 : sieve [3,5..]  where
  sieve (p:xs) = p : sieve [x | x<-xs, rem x p /= 0]
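For what it's worth, it does produce the right numbers: in GHCi, take 10 primesT gives [2,3,5,7,11,13,17,19,23,29].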
And essentially the point, to me, is that this does not really look like a sieve, but like a trial division algorithm. Quite amusingly, I found a lengthy discussion about exactly that here. The good part is that they rather easily (at least if you read Haskell the way you drink water) derive optimized versions of this algorithm with decent algorithmic performance, and then spend some lines trying to convince us that the code above should not be called naive (which I intend not as an insult but as a mere observation) and should instead be regarded as a "specification".

Now, I quite understand that declarative languages are excellent tools for writing down mathematical definitions (and Haskell is especially good at this), but as far as I know the context in which the one-liner was presented was that of an algorithm, not of a specification.

Essentially this unrequested bit of background is just my way of retaliating against the world for not being able to come up with a really fast implementation. Basically, the optimized version of Turner's algorithm is quite a bit faster than my monadic versions. Which is fine, as my goal was to get some insight into monads, and I think I did. More on this here.
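
For the record, the kind of optimized variant I mean is the "postponed" reworking of the one-liner, where each prime's filter only kicks in once its square shows up. I'm sketching it from memory here, so the exact formulation in the discussion linked above may differ in its details:

primesPostponed :: [Integer]
primesPostponed = 2 : 3 : sieve (tail primesPostponed) [5, 7 ..]
  where
    -- start filtering by p only from p*p onwards
    sieve (p:ps) xs = h ++ sieve ps [x | x <- t, rem x p /= 0]
      where (h, t) = span (< p * p) xs

It keeps the trial-division flavor, but a candidate is now only ever tested against primes whose square does not exceed it, which is essentially where the better behavior comes from.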

On the author's unmistakably twisted mind


So... back to our business. I struggled a lot to use Data.Array.ST to implement the sieve. Essentially, the point is that I find the monadic do notation quite a bit less clear than the pure functional notation with >>= and >>. This is probably a symptom of my brain having been corrupted by maths and of myself turning into a soulless robot in human form. Nevertheless, I finished the task, but I was so ashamed that I buried the code under piles of new git commits.

Essentially, I found mixing and matching different monads (like lists) excruciatingly confusing in do notation. Notice that this was just *my* problem, but I had to struggle with the types, write outer helper functions just to check their types, and so on. Basically I was resorting to trial and error, and much shame will rain on me for that. So I threw away the code and started writing it simply using >>= and >>. Now everything kind of made sense: to me it looked like a dual of CPS, which is something I'm familiar with. The essential point is that the resulting code was:
  1. Hopefully correct.
  2. Written in one go, essentially without much effort.
  3. Rather unreadable.
So the resulting code is:
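Or rather, a minimal sketch of it, since the exact version lies buried under those piles of git commits: an unboxed STUArray in which composites get marked, glued together with >>= and >> only; names like sieveBind and primesBind are just placeholders.

import Control.Monad (when)
import Data.Array.ST (newArray, readArray, writeArray, runSTUArray)
import Data.Array.Unboxed (UArray, assocs)

sieveBind :: Int -> UArray Int Bool
sieveBind n = runSTUArray
  ( newArray (2, n) True >>= \arr ->          -- True means "still considered prime"
    mapM_ (\p -> readArray arr p >>= \isPrime ->
                 when isPrime
                      (mapM_ (\m -> writeArray arr m False)   -- cross out multiples of p
                             [p * p, p * p + p .. n]))
          [2 .. n]
    >> return arr )

primesBind :: Int -> [Int]
primesBind n = [i | (i, True) <- assocs (sieveBind n)]

It works, but the explicit chaining through lambdas is exactly what makes it rather unreadable to my eyes.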

Essentially, for (3) I decided it was time to use some do notation; perhaps things could be improved. What I came up with is a rather slow implementation, but I am rather satisfied with it. I think it is extremely readable: it retains a very imperative flavor (which is good, in the sense that the original algorithm was quite imperative) and I believe it could be written and understood even by people not overly familiar with Haskell. It almost has a pseudo-code-like quality, were it not for some syntactical gimmicks like "$ \idx ->".
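
Again, this is a sketch of the shape rather than the exact snippet (names are placeholders, the bookkeeping may differ), but the do-notation rewrite reads roughly like this:

import Control.Monad (forM_, when)
import Data.Array.ST (newArray, readArray, writeArray, runSTUArray)
import Data.Array.Unboxed (UArray, assocs)

sieveDo :: Int -> UArray Int Bool
sieveDo n = runSTUArray $ do
  arr <- newArray (2, n) True              -- everything starts out as "prime"
  forM_ [2 .. n] $ \p -> do
    isPrime <- readArray arr p
    when isPrime $
      forM_ [p * p, p * p + p .. n] $ \m ->
        writeArray arr m False             -- cross out the multiples of p
  return arr

primesUpTo :: Int -> [Int]
primesUpTo n = [i | (i, True) <- assocs (sieveDo n)]

Here the "$ \idx ->" gimmick shows up as "$ \p ->" and "$ \m ->".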


Somewhere in the rewrite process I left out the small optimization of running the outer loop only up to sqrt(lastPrime) instead of all the way to lastPrime, but this is not really the point. Still... I'm quite happy, because I feel I now have a better grasp of some powerful Haskell concepts.

However, I feel that Clojure macros are greatly missed. Moreover, I found the correspondingly "difficult" concepts of Clojure or Scheme much easier to understand (apart from call/cc, which kind of eludes me; I think I will have to do some explicitly tailored exercises sooner or later).

I also feel that continuously jumping from language to language could lead to problems in the long run, in the sense that if I want to use some language in the real world I probably need more experience with that particular language; otherwise I will just stick with Python, because even when it is sub-optimal I'm so much more skilled on the Python platform than on any other.
