Tuesday, August 30, 2011

Clojure: Unfold and Anamorphisms

I found rather unusual the apparent lack of an unfold function in Clojure. Perhaps I was not able to find it in the documentation (which I hope is the case), in this case, happy to have done a useful, although slightly dull, exercise. However, if this is not the case, here it is unfold in Clojure.

Unfold is a very powerful function which abstracts the idea of generators. I tried to make it lazy (please, feel free to break it) as we expect from clojure functions. Some examples are provided:

Here, we simply generate a list of elements from 1 to 11 (included):

(unfold #(= 0 %) identity dec 10)

The following example is more interesting: here I show that unfold is at least as powerful as iterate. It is possible to regard iterate as a simpler (and easier to use) variant of unfold.

(unfold
      (fn [x] false)
      identity
      inc
      0)

The code is equivalent to (iterate inc 0). Now the idea is that:

  1. The first argument is a predicate that receives the seed (last argument). If it is true, the function returns the empty list; otherwise returns the cons where the first cell if the second argument applied to the seed and the second cell is a recursive call to unfold. All the arguments but the last one are the same. The last argument becomes g applied to seed.
  2. The second argument is a unary function which is applied to each individual "value" which is generated. In this case we just use identity, because we want just the plain arguments. We could have used #(* %1 %1) to have the squares of the values or whatever makes sense.
  3. The third argument is the function that generates the next seed given the current one
  4. The last argument is just the seed

Unfold is extremely general. For example the classic function "tails" that returns all the non-empty sublists of a list ((1 2 3) -> ((1 2 3) (2 3) (3))) could be implemented as:

(defn unfold-tails [s]
  (unfold empty? identity rest s))
  

The standard map, becomes:

(defn unfold-map [f s]
  (unfold empty? #(f (first %)) rest s))

Now, here the implementation:

Upgrade

With this update, I present an upgraded version with the tail-g parameter. The classic srfi-1 Scheme version also has this argument. Originally, I thought that in Clojure I could use it to have different sequence types. On a second thought, it can't work because of the way Clojure manages sequences (at least, not in the sense that I intended it).

On a second thought, however, it is highly useful. Consider for example the tails function I defined above. In fact it returns all non-empty substrings of a given string. This makes sense in lots of contexts, however it is different from the definition of the Haskell function.

tails                   :: [a] -> [[a]]
tails xs                =  xs : case xs of
                                  []      -> []
                                  _ : xs' -> tails xs'
That could be roughly translated (losing laziness) as:
(defn tails [xs]
  (cons xs
        (when (seq xs)
          (tails (rest xs)))))

This function also returns the empty list. Surely, it would be possible to (cons () (unfold-tails xs)), but that would break the property that each (finite) element in the resulting sequence is longer than the following. Without the kludge, the unfold-tails function breaks the strictness property that the resulting list contains xs if xs is empty. Accordingly, tails should really be defined to include the empty list. Appending the empty list to the end of the list returned from unfold-tails would maintain both properties, but would be excessively inefficient.

On the other hand, specifying the tail-g function allows a cleaner definition

(defn tails [xs]
  (unfold empty? identity rest xs
          #(list %)))

Essentially, without tail-g it is not possible to include in the resulting sequence elements which depend from the element which makes the sequence stop.

Updates

  • Substituted (if ($1) (list) ($2)) with (when-not ($1) ($2)) after some tweets from @jneira and @hiredman_.
  • Added version with tail-g
  • Here more functions and experiments with unfold.
  • Removed HTML crap from cut and paste

Technorati Tags: , , , ,

7 comments:

Jens said...

Here is another version of unfold using for (hope I got it right):

(defn unfold [p f g s] (for [x (iterate g s) :while (not (p x))] (f x)))

Anonymous said...

I'm not sure what the advantage is over map * take-while * iterate. I would prefer the function composition since it makes the semantics explicit. and is easier to understand than both unfold and for.

Unknown said...

@Tordmor: it depends on what you mean with "advantage". In everyday programming we always use the more useful tool. That means that /in any case/ we would build functions tailored to the present need; this probably means take-while/map/& c and even higher level functions.

So I would not talk about "advantage". Unfold is a powerful conceptual tool (and sees also practical usage).

Besides, once you are familiar with its semantics it is also pretty easy to understand (and when it is not a plain instance of map/iterate/take-while may be as readable).

Eventually, when you have a language without lazy sequences (which is not the case in clojure, but in general it holds) it is more efficient too.

Unknown said...

@Jens looks right. Still I regard unfold as a "primitive" function rather than something you define in terms of other functions.

Marko Nervo said...

Yeah, this is clojure, but is there any scheme interpreter/compiler that has unfold by default? On gambit and racket I can't find them... Thanks in advance ;)

Unknown said...

It's defined in SRFI-1. So every platform which includes that module has it. Racket /has/ it.

So this is valid in Racket:
#lang scheme

(require srfi/1)

(display (unfold empty?
(lambda (xs) (let ((x (car xs))) (* x x)))
cdr
'(1 2 3 4 5 6 7)))

Chicken should also implement srfi-1. Its pretty common indeed.

About Gambit:
http://dynamo.iro.umontreal.ca/~gambit/wiki/index.php/SRFI:s

So you need to use blackhole or some other library management tool (or just provide the files yourself).

Marko Nervo said...

Thanks :)