Tuesday, August 30, 2011

Clojure: Unfold and Anamorphisms

I found the apparent lack of an unfold function in Clojure rather unusual. Perhaps I simply failed to find it in the documentation (which I hope is the case); if so, I will be happy to have done a useful, although slightly dull, exercise. If not, here is unfold in Clojure.

Unfold is a very powerful function which abstracts the idea of generators. I tried to make it lazy (please, feel free to break it), as we expect from Clojure functions. Some examples are provided.

Here, we simply generate the list of numbers from 10 down to 1:

(unfold #(= 0 %) identity dec 10)

The following example is more interesting: here I show that unfold is at least as powerful as iterate. In fact, it is possible to regard iterate as a simpler (and easier to use) variant of unfold.

(unfold
      (fn [x] false)
      identity
      inc
      0)

The code is equivalent to (iterate inc 0). Now the idea is that:

  1. The first argument is a predicate that receives the seed (the last argument). If it returns true, the function returns the empty list; otherwise it returns a cons cell whose first element is the second argument applied to the seed, and whose rest is a recursive call to unfold. All the arguments but the last one stay the same; the last argument becomes the third argument (the next-seed generator) applied to the current seed.
  2. The second argument is a unary function which is applied to each individual "value" that is generated. In this case we just use identity, because we want the plain values. We could have used #(* %1 %1) to get the squares of the values, or whatever makes sense.
  3. The third argument is the function that generates the next seed given the current one.
  4. The last argument is just the seed.

Unfold is extremely general. For example the classic function "tails", which returns all the non-empty suffixes of a list ((1 2 3) -> ((1 2 3) (2 3) (3))), could be implemented as:

(defn unfold-tails [s]
  (unfold empty? identity rest s))
  

The standard map, becomes:

(defn unfold-map [f s]
  (unfold empty? #(f (first %)) rest s))

Now, here is the implementation:
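The original snippet was an embedded gist that has not survived the cut and paste. As a stand-in, here is a rough Python analogue of the function described above, written as a lazy generator (the names and shape are mine, not the original Clojure code):

```python
def unfold(pred, f, g, seed):
    """Yield f(seed), f(g(seed)), f(g(g(seed))), ... until pred(seed) is true."""
    while not pred(seed):
        yield f(seed)
        seed = g(seed)

# the first example above: count down from 10 to 1
print(list(unfold(lambda x: x == 0, lambda x: x, lambda x: x - 1, 10)))
# → [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]

# an unfold-based map over lists, mirroring unfold-map
def unfold_map(f, s):
    return unfold(lambda xs: not xs, lambda xs: f(xs[0]), lambda xs: xs[1:], s)
```

Since Python generators are lazy, the `(fn [x] false)` trick works here too: `unfold(lambda x: False, lambda x: x, lambda x: x + 1, 0)` behaves like `(iterate inc 0)`.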

Upgrade

With this update, I present an upgraded version with the tail-g parameter. The classic SRFI-1 Scheme version also has this argument. Originally, I thought that in Clojure I could use it to produce different sequence types. On second thought, it cannot work because of the way Clojure manages sequences (at least, not in the sense that I intended).

On further reflection, however, it is highly useful. Consider for example the tails function I defined above. It returns all the non-empty suffixes of a given sequence. This makes sense in lots of contexts; however, it is different from the definition of the Haskell function:

tails                   :: [a] -> [[a]]
tails xs                =  xs : case xs of
                                  []      -> []
                                  _ : xs' -> tails xs'

That could be roughly translated (losing laziness) as:

(defn tails [xs]
  (cons xs
        (when (seq xs)
          (tails (rest xs)))))

This function also returns the empty list. Surely, it would be possible to write (cons () (unfold-tails xs)), but that would break the property that each (finite) element in the resulting sequence is longer than the following one. Without the kludge, unfold-tails breaks the property that the resulting sequence contains the empty list when xs is empty; and tails really should be defined to include the empty list. Appending the empty list to the end of the list returned by unfold-tails would maintain both properties, but would be excessively inefficient.

On the other hand, specifying the tail-g function allows a cleaner definition:

(defn tails [xs]
  (unfold empty? identity rest xs
          #(list %)))

Essentially, without tail-g it is not possible to include in the resulting sequence elements which depend on the element that makes the sequence stop.
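In Python terms (a sketch of mine, not the original code), tail-g is exactly the extra final step below: once the stop predicate fires, the terminating seed still gets a chance to contribute to the output.

```python
def tails(xs):
    """All suffixes of xs, including the empty one, like Haskell's tails."""
    while xs:          # the stop predicate: empty?
        yield xs       # identity applied to the current seed
        xs = xs[1:]    # rest generates the next seed
    yield xs           # the role of tail-g: emit something built from the final seed
```

Dropping the last line gives the earlier, empty-list-free behavior of unfold-tails.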

Updates

  • Substituted (if ($1) (list) ($2)) with (when-not ($1) ($2)) after some tweets from @jneira and @hiredman_.
  • Added version with tail-g
  • Here are more functions and experiments with unfold.
  • Removed HTML crap from cut and paste


Sunday, August 28, 2011

On monads and a poor old (dead) Greek geek

After immensely enjoying the LYHFGG book, I immediately started playing with Haskell a bit. Since lately I have been implementing sieves in different languages, I found that the Sieve of Eratosthenes was an excellent candidate. In fact, there are quite a few Haskell versions out there (and almost all of them are faster than the version I'm going to write, but this is about an enlightenment process, not an algorithm implementation).

Bits of unrequested and useless background thoroughly mixed with rantings and out of place sarcasm

I also had to reconsider some of my assumptions: I was familiar with Turner's "version" of the sieve and with O'Neill's article, which was basically about Turner's version not being a real implementation of the sieve, and about nobody noticing that for some 30 years (while teaching it to students as a good algorithm). I found this story extremely amusing, in the sense that sometimes functional programmers are quite a bit too full of themselves and of how cool their language is, and overlook simple things.

Turner's algorithm was essentially this:

primesT = 2 : sieve [3,5..]  where
  sieve (p:xs) = p : sieve [x | x<-xs, rem x p /= 0]

And essentially the point is that to me it does not really look like a sieve, but like a trial division algorithm. Quite amusingly, I found a lengthy discussion about that here. The good part is that they rather easily (if you read Haskell as you drink water) derive optimized versions of this algorithm which have decent algorithmic performance, and then spend some lines trying to convince us that the code above should not be called naive (which I do not intend as an insult, but as a mere observation) and that it should be regarded as a "specification".
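To make the objection concrete, here is what the one-liner amounts to, in a Python caricature of mine (not a translation of the Haskell): every candidate gets divided by every prime found so far, instead of each prime crossing off its own multiples.

```python
def primes_turner_style(n):
    # trial division dressed up as a sieve: candidate x is filtered
    # through (divided by) every earlier prime
    primes = []
    for x in range(2, n + 1):
        if all(x % p != 0 for p in primes):
            primes.append(x)
    return primes
```

In a genuine Sieve of Eratosthenes there are no divisions at all, only additions while crossing off multiples; that is the heart of O'Neill's complexity argument.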

Now I quite understand that declarative languages are excellent tools to write definitions of mathematical stuff (and Haskell is especially good at this), but as far as I know the context in which the one-liner was presented is that of an algorithm, not of a specification.

Essentially this unrequested bit of background is just to retaliate against the world for my not being able to come up with a really fast implementation. Basically, the optimized version of Turner's algorithm is quite a bit faster than my monadic versions. Which is fine, as my goal was to get some insight into monads, which I think I did. More on this here.

On the author's unmistakably twisted mind


So... back to our business. I struggled a lot to use Data.Array.ST to implement the sieve. Essentially the point is that I find the monadic do notation quite a bit less clear than the pure functional notation with >>= and >>. This is probably a symptom of my brain having been corrupted by maths and of myself turning into a soulless robot in human form. Nevertheless, I finished the task, but I was so ashamed that I buried the code under piles of new git commits.

Essentially, I found mixing and matching different monads (like lists) excruciatingly confusing in do notation. Notice, that was just *my* problem, but I had to struggle with the types, write outer helper functions to check their types, and so on. Basically I was resorting to trial and error, and much shame will rain on me for that. So I threw away the code and started writing it simply using >>= and >>. Now everything kind of made sense. To me it looked like a dual of CPS, which is something I'm familiar with. The essential point is that the resulting code was:
  1. Hopefully correct.
  2. Written in one go, rather easily
  3. Rather unreadable
So the resulting code is:
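The embedded code has not survived here either. As a stand-in, here is a Python sketch of the same imperative, mutable-array algorithm that the Data.Array.ST version expresses (my sketch, not the original Haskell):

```python
def sieve(limit):
    # a mutable array of flags, updated in place -- the imperative
    # shape that Data.Array.ST captures in Haskell
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, limit + 1):  # no sqrt cutoff, as in the rewrite described below
        if is_prime[p]:
            for m in range(p * p, limit + 1, p):
                is_prime[m] = 0    # cross off the multiples of p
    return [i for i in range(limit + 1) if is_prime[i]]
```

Unlike the trial-division one-liner, each prime here only touches its own multiples, which is what makes this a sieve in the original sense.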

Essentially, because of (3) I decided it was time to use some do notation. Perhaps things could be improved. What I came up with is a rather slow implementation, but I am rather satisfied. I think it is extremely readable: it retains a very imperative flavor (which is good, in the sense that the original algorithm was quite imperative) and I believe it could be written and understood even by people not overly familiar with Haskell. It almost has a pseudo-code-like quality, were it not for some syntactical gimmicks like "$ \idx ->".


Somewhere in the rewrite process I left out the small optimization of running only up to sqrt(lastPrime) instead of all the way to lastPrime, but this is not really the point. Still... I'm quite happy because I feel I now have a better grasp of some powerful Haskell concepts.

However, I feel that Clojure macros are greatly missed. Moreover, I found it much easier to understand the correspondingly "difficult" concepts of Clojure or Scheme (apart from call/cc, which kind of eludes me; I think I have to do some explicitly tailored exercises sooner or later).

I also feel that continuously jumping from language to language could lead to problems in the long run, in the sense that if I want to use some language in the real world I probably need more experience with that particular language; otherwise I will just stick with Python, because even when it is sub-optimal I'm so much more skilled on the Python platform than on other platforms.

Saturday, August 20, 2011

Excellent Learn You a Haskell for Great Good

This is the third book explicitly about Haskell that I have bought directly (apart from things such as Purely Functional Data Structures or Pearls of Functional Algorithm Design), the others being Haskell School of Expression and Real World Haskell, plus some online tutorials and freely available books. I believe they are all excellent books, although with slightly different focuses. They come from different ages as well (with HSOE being quite a bit older).

Back then I enjoyed HSOE a lot, but I think I missed something. A large part of the book used libraries not easily available on the Mac, for example. Moreover, the book did not use large parts of the Haskell standard library which are very useful. For that and other reasons, I did not go on working with Haskell. Real World Haskell has a very practical focus and I quite enjoyed that. Unfortunately, I still remembered too much Haskell not to skip the initial parts of the book (and that is usually a bad thing, because you don't get comfortable with the author's style before jumping to more elaborate subjects). Moreover, the book is quite massive and I had other stuff to do back then (like graduating).

I did not even want to buy LYHFGG. After all, I am a bit skeptical about using Haskell for real-world stuff (I prefer more pragmatic languages like Python or Clojure), and so I tried to resist the urge to buy another Haskell book (I could finish RWH, after all). Through a combination of events I do not even remember, I put the book in an Amazon order. Understanding a couple of Haskell idioms could improve my Clojure, I thought, and I started reading the book in no time.

The first part of the book is very well done but somewhat uninteresting. By the time I started reading it, I had forgotten most of the Haskell I knew, and consequently I read it carefully; however, I made the mistake of not starting a small project just to toy with the language. That is the reason I say "somewhat uninteresting": it is very basic, very easy and very clear. Still, it only reminded me of things I already knew, without really improving my way of thinking much. Nevertheless, the writing was fun and light and I read through it quickly. I consider it a very good introduction to functional programming in Haskell and to functional programming in general, and as such the tag line "a beginner's guide" is well deserved.

Later, in chapter 11, comes the real meat. Functors, Applicative Functors, Monoids and then Monads are presented. The order is excellent: after having learned Applicative Functors, Monads require only a small further step in understanding. Moreover, repeating all the reasoning on Maybe and lists really clarifies things. The examples were also in HSOE, but connecting the dots was somewhat left to the reader. This time I did not make the mistake of seeing monads as just a tool to implement the state monad, and I reached a deeper insight into the subject.

About the book itself, I just love no starch press...

Monday, August 8, 2011

Python, Numpy, Atkin & Eratosthenes

In this post I compared some versions of the Atkin sieve in different languages. Basically it was just the same version in different languages.

Regarding Python, I had a version in basic Python and one using numpy arrays as the intermediary data structure. Essentially, the execution times were the same. Somewhere in the post I commented that this was unsurprising and that I should write a numpy version which more thoroughly used vector operations to speed things up.

Unfortunately, it seems to me that the Atkin algorithm is not well suited to this kind of optimization. At least, I did not find an easy way to do it. On the other hand, the Sieve of Eratosthenes has an extremely natural formulation in terms of vector operations. The reason why this is far faster than using element-wise operations is essentially that it pushes whole loops below the Python level, down to the C/Fortran level (also reducing the creation of Python objects in favor of primitive C types).

Essentially, the code is somewhat easier to read (given a basic numpy/matlab understanding) than the regular version.
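The snippet itself was lost in the cut and paste; a minimal NumPy version along the lines described (my reconstruction, not necessarily the original erat_matrix.py) would be:

```python
import numpy as np

def erat_matrix(limit):
    is_prime = np.ones(limit + 1, dtype=bool)
    is_prime[:2] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if is_prime[p]:
            # one vectorized slice assignment replaces a whole
            # Python-level inner loop: the crossing-off runs in C
            is_prime[p * p :: p] = False
    return np.nonzero(is_prime)
```

Note that np.nonzero returns a tuple of arrays, which matches the `(array([...]),)` shape printed in the timings below.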

The times are in the order of:

% time python erat_matrix.py 10000000
(array([      2,       3,       5, ..., 9999971, 9999973, 9999991]),)
python erat_matrix.py 10000000  0.61s user 0.12s system 96% cpu 0.750 total

(that run was meant to compare times with those in the previous post: they are better than anything but Java and the highly optimized C code) and

% time python erat_matrix.py 100000000
(array([       2,        3,        5, ..., 99999959, 99999971, 99999989]),)
python erat_matrix.py 100000000  5.90s user 0.85s system 99% cpu 6.781 total

The result is rather surprising: I have seen the code take anywhere from 36 seconds down to the time reported. Yesterday it was almost always around 10 seconds, so I decided to write "faster" versions. Today such "faster versions" are not faster, but at least they are more consistent (always around 7 seconds, instead of varying so much). I have no clue about the reason for these discrepancies: I'm not seeing any exceptional load on the machine. Benchmarks lie, that's it.

To see if the code could benefit from a list of prebuilt primes, I first tried some manual unrolling

and since the results were promising:

Essentially, this version always runs in about 7 seconds on my machine, which is roughly 10 times slower than the highly optimized Atkin sieve in C.

Tuesday, August 2, 2011

BBEdit 10 is out

Essentially, that is the news. That, and the fact that they eventually dropped the price to reasonable levels ($39, IIRC). Unfortunately, the upgrade from older versions is also $39 (so I just bought a new serial).

The real question is why I keep buying BBEdit. I suppose I'm nostalgic about the old MacOS. It is a good editor, nothing to say against that. I find it excellent with HTML, and it also has a nice PostgreSQL mode (not just SQL: it supports the PostgreSQL dialect). The syntax-highlighted grep searches (that is to say, regular expressions, in fact, not grep) are another nice feature.

Other nice features include the possibility to customize the different modes (e.g., for Python: a tab is 4 spaces, with auto-expand tab -- that is to say, tabs are not written to the source). Nothing other editors do not have; still, it is quite easy to customize. Another pair of nice features is that trailing-whitespace stripping and trailing-newline addition can be enabled. Once again, nothing vim and Emacs do not have, but easy to set up and somewhat missing in other "lesser" editors.

Among the additions of version 10, BBEdit has improved project management features and support for different color schemes (before that, a hack had to be used -- and the hack is forward compatible: color schemes made for it can be used with the official feature).

While, when I first tried TextMate, I thought the next big thing after Emacs and vim had arrived, BBEdit is just a solid MacOS X editor. I like the fact that many a program is aware of BBEdit and lets me use it to edit stuff (and that is usually an improvement over the built-in editor). I will not stop using Emacs or vim for my serious editing, but BBEdit is a nice tool to have around.


Monday, August 1, 2011

Teaching Javascript and other languages (Scheme, Python, Clojure, ...).

Recently, I started working more and more with Javascript, and it happened outside the browser with node.js.

So far so good: I love node.js and I will consider it a strong alternative to Twisted. I think the language has some extremely interesting features which I would like to see in Java as well. I'm probably biased: after all, I'm cold towards inheritance, to say the least. In fact, I'm quite used to doing encapsulation with closures in Scheme, and the fact that in Javascript I can just return objects of functions solves the classical problem of having to return a single function[0].
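The trick translates to Python almost verbatim (a toy example of mine, not from any of the languages discussed here per se): hand back a dict of closures instead of a single function, and the shared state stays encapsulated in the enclosing scope.

```python
def make_counter():
    count = 0  # private state, visible only to the closures below
    def increment():
        nonlocal count
        count += 1
        return count
    def value():
        return count
    # the analogue of returning {increment: ..., value: ...} in Javascript
    return {"increment": increment, "value": value}
```

A caller can do `c = make_counter(); c["increment"]()` but has no way to reach `count` directly, which is the whole point of closure-based encapsulation.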

So I was thinking about a discussion I had with some friends from the EuroPython (and Python Italia) organization group. Someone made a point about using Javascript as a teaching language. The idea has its merits: Javascript is everywhere. A browser and an editor are enough to get started with programming. Moreover, the language is quite nice: many interesting programming techniques can be taught in Javascript (and more easily than in, say, C++).

Then I considered what I learned from "Javascript: The Good Parts". The subset of Javascript I'm using is roughly the one suggested in the book. And that is nice: I'm basically leaving out the bad parts. However, a student would probably meet those parts soon enough, and would probably not have the wisdom to avoid them (some of them are "easier" in the short term, and discipline is something newbies often lack).

Moreover, it would be naive to assume that Javascript would be taught "well". Many languages are taught badly (and students are often not very receptive to good suggestions).

Somehow, it reminds me of pre-standard C: the language itself had many nice features [as a domain-specific language to manipulate von Neumann machines at a rather high level]; however, there were plenty of rough edges (I'm thinking about C Traps and Pitfalls: nowadays, many of its suggestions are taken for granted, but back then they weren't really that common).

Scheme (Racket)

All considered, other languages need something to be installed. Racket/Scheme ranks pretty high as a teaching language. SICP is an excellent book, and Racket offers an easy-to-install, easy-to-use (easy-to-debug, easy-to-*) environment. The language is small and orthogonal and has features and abstractions which are hard to find outside the Lisp world. However, some concepts need some mathematical understanding: it is probably fine to assume such maturity at university level, but not before. Moreover, everything which could be of interest (GUI, threads, web, whatever) comes bundled.

Python

Python, on the other hand, is easier to start with. The mathematical maturity needed is lower, and the language is probably easier. The abstraction level is good and there is lots of interesting functionality in the standard library. IDLE is a decent basic IDE, and it is probably easier to find open source projects to contribute to. I just love Python, so I may be biased. Moreover, Python is easy to use to script the system (both Unix and Windows), and I believe that using a language to automate common operations is a great way to increase programming skills [useful programs are just more fun]. Ruby has essentially the same merits and problems as Python.

Clojure

As for Clojure, it basically depends on Clojure Box. Without it, setting up a (free) easy-to-use environment is not as easy (I have seen newbies finding it hard to tell the difference between the OS-specific command prompt (cmd.exe or bash) and a REPL -- in Python; I'm not talking about Clojure). That is the kind of ease of use we must reach (something like Racket or IDLE), and the only way to get it with Clojure is Clojure Box. Still, it is a somewhat more complex -- though more powerful -- environment. The other problem I found with teaching Clojure is that it is hard to do without some prior knowledge of Java; also, the official documentation is somewhat unfriendly to newbies -- the lack of examples is a [pun intended] clear example of that.

On the other hand, it has all the advantages of Scheme (apart from the fact that the language is somewhat bigger) with some further merits (IMHO swing >> wx, plus easy access to stuff from the Java world). The dependency on Java leads me to think that Clojure is more an excellent second language to "teach programming" to people exposed only to Java (and probably to the "bad parts" of Java). In that case, requiring prior knowledge of Java is not a weakness but a strength.

----

[0] OK, there are other solutions in Scheme; still, I quite like returning {function: function, functionB: functionB}