Wednesday, February 23, 2011

Is laziness tiring? (part 1)

Introduction

A part from the obvious pun, I effectively consider laziness tiring. Let me explain: I'm quite used to lazy languages, I've used Haskell for some time, and I use effectively Python lazy features extensively. Moreover, it has been a while I use clojure as well.

Laziness is an extremely powerful conceptual abstraction: it is like having an optimizing compiler tailored at not perusing your sequences, so that you can write code in a natural way. I would definitely want some form of laziness (especially sequence based laziness) in every language high level enough that it does not hurt.

Moreover, lazy sequences do not have many of the usual drawbacks of default lazyness. First, you somehow have to explicitly require it; when not required is there but it does not hurt either (e.g., map). It does not play extremely well with side-effects, but in clojure the language is designed to limit the scope of expressions with side effect.

However, I decided to understand more fully how lazyness works in clojure.

lazy-seq and cons

This is a simple code I wrote to test multiple functions on a given collection and see how they fare.

(ns lazier.bench
(:use clojure.core
[clojure.contrib.str-utils :only (re-sub str-join)]
clojure.contrib.pprint))
(defn slow-computation [& more]
(Thread/sleep 500)
(if (seq more) more 1))
(defmacro benchmark
[functions lst]
(println ";" (str-join "" (repeat 72 \=)))
(pprint `~lst)
(letfn [(make-fragment [f] `(~f ~lst))]
`(do
~@(loop [functions functions timers []]
(if (seq functions)
(recur (rest functions)
(cons `(do
(print ";" (quote ~(first functions)) ": ")
(time ~(make-fragment (first functions)))
nil)
timers))
timers))
(println))))
view raw bench.clj hosted with ❤ by GitHub

Essentially benchmark is a macro taking a vector of functions. It builds code fragments which call the desired function on the list and prints the time needed. We use a slow-computation function which sleeps for 500 ms. This way is easy to count calls to slow-computation.

One of the more interesting examples I found is this one:

(benchmark [identity empty? list? seq? first rest next seq]
(lazy-seq (cons (slow-computation)
(lazy-seq (cons (slow-computation)
(lazy-seq (list (slow-computation))))))))
; ========================================================================
(lazy-seq
(cons
(slow-computation)
(lazy-seq
(cons (slow-computation) (lazy-seq (list (slow-computation)))))))
; seq : "Elapsed time: 500.609 msecs"
; next : "Elapsed time: 1000.94 msecs"
; rest : "Elapsed time: 500.536 msecs"
; first : "Elapsed time: 500.48 msecs"
; seq? : "Elapsed time: 0.177 msecs"
; list? : "Elapsed time: 0.169 msecs"
; empty? : "Elapsed time: 500.566 msecs"
; identity : "Elapsed time: 0.154 msecs"
view raw lazy-triple.clj hosted with ❤ by GitHub

Here the list is a completely lazy list of three elements. The results are expected: list?, seq?, identity do not do anything with their argument, which is not evaluated (by the way list? is false, seq? is true). first has to evaluate the first element (which is expected), empty evaluates the first element to see if there is such an element. seq should return nil for empty sequences, and as a consequence has to evaluate the first element.

rest also evaluates the first element, but not the successive. I'm not sure if it would be possible to return the cdr of the list without evaluating the car; probably not with clojure lazyness model. next is worse: it evaluates the first element (like rest), but then calls seq on the result, and that leads to the execution of another element.

lazy-seq and list

This example is slightly different, but is trivial, indeed. There is just one lazy seq: the contained list has to be evaluated altogether for every function that needs at least one element.

(benchmark [identity empty? list? seq? first rest next seq]
(lazy-seq (list (slow-computation)
(slow-computation))))
; ========================================================================
(lazy-seq (list (slow-computation) (slow-computation)))
; seq : "Elapsed time: 1000.622 msecs"
; next : "Elapsed time: 1000.278 msecs"
; rest : "Elapsed time: 1000.276 msecs"
; first : "Elapsed time: 1000.279 msecs"
; seq? : "Elapsed time: 0.474 msecs"
; list? : "Elapsed time: 0.157 msecs"
; empty? : "Elapsed time: 1000.311 msecs"
; identity : "Elapsed time: 0.153 msecs"
view raw list.clj hosted with ❤ by GitHub

lazy-seq and concat

The following example, started to surprise me.

(list (slow-computation))
(list (slow-computation) (slow-computation)))
; seq : "Elapsed time: 1500.213 msecs"
; next : "Elapsed time: 1500.26 msecs"
; rest : "Elapsed time: 1500.362 msecs"
; first : "Elapsed time: 1500.136 msecs"
; seq? : "Elapsed time: 1500.369 msecs"
; list? : "Elapsed time: 1500.233 msecs"
; empty? : "Elapsed time: 1500.242 msecs"
; identity : "Elapsed time: 1500.169 msecs"
view raw concat.clj hosted with ❤ by GitHub

As far as I could tell, concat should return a lazy sequence over the lists. It is probably my misunderstanding, but I thought that identity, seq, first, list? and empty? should evaluate one or no elements. I decided to write more benchmark functions.

(defmacro benchmark*
[functions lst]
(println ";" (str-join "" (repeat 72 \=)))
(pprint `~lst)
(letfn [(make-fragment [f] `(let [m# ~lst] (~f m#)))]
`(do
~@(loop [functions functions timers []]
(if (seq functions)
(recur (rest functions)
(cons `(do
(print ";" (quote ~(first functions)) ": ")
(time ~(make-fragment (first functions)))
nil)
timers))
timers))
(println))))
(defmacro benchmark**
[functions lst]
(println ";" (str-join "" (repeat 72 \=)))
(pprint `~lst)
(letfn [(make-fragment [f] `(let [m# ~lst] (time (~f m#))))]
`(do
~@(loop [functions functions timers []]
(if (seq functions)
(recur (rest functions)
(cons `(do
(print ";" (quote ~(first functions)) ": ")
~(make-fragment (first functions))
nil)
timers))
timers))
(println))))

The results are here:

(benchmark** [identity empty? list? seq? first rest next seq]
(concat (list (slow-computation))
(list (slow-computation) (slow-computation))))
; ========================================================================
(concat
(list (slow-computation))
(list (slow-computation) (slow-computation)))
; seq : "Elapsed time: 0.019 msecs"
; next : "Elapsed time: 0.028 msecs"
; rest : "Elapsed time: 0.02 msecs"
; first : "Elapsed time: 0.024 msecs"
; seq? : "Elapsed time: 0.012 msecs"
; list? : "Elapsed time: 0.02 msecs"
; empty? : "Elapsed time: 0.025 msecs"
; identity : "Elapsed time: 0.011 msecs"
nil
lazier.core> (benchmark* [identity empty? list? seq? first rest next seq]
(concat (list (slow-computation)) (list (slow-computation) (slow-computation))))
; ========================================================================
(concat
(list (slow-computation))
(list (slow-computation) (slow-computation)))
; seq : "Elapsed time: 1500.205 msecs"
; next : "Elapsed time: 1500.356 msecs"
; rest : "Elapsed time: 1500.212 msecs"
; first : "Elapsed time: 1500.307 msecs"
; seq? : "Elapsed time: 1500.311 msecs"
; list? : "Elapsed time: 1500.281 msecs"
; empty? : "Elapsed time: 1500.171 msecs"
; identity : "Elapsed time: 1500.275 msecs"

However, when running the first example the perceived time was akin to the second: simply the evaluation was not timed because it was done (at least) when binding in the let form. Then I got the illumination. Laziness is a bitch: the point here is that it is list that is not lazy. By the way, why should it be. So it does not matter what I do: the time is already spent when I get into concat. I understand that this is trivial: however, I find strange having tp reason about "when things happen" the point of laziness being "not reasoning about when things happen".

Anyway, using lazy-seq, things work as expected:

(benchmark [identity empty? list? seq? first rest next seq]
(concat (lazy-seq (list (slow-computation)))
(lazy-seq (list (slow-computation)
(slow-computation)))))
; ========================================================================
(concat
(lazy-seq (list (slow-computation)))
(lazy-seq (list (slow-computation) (slow-computation))))
; seq : "Elapsed time: 500.431 msecs"
; next : "Elapsed time: 1500.505 msecs"
; rest : "Elapsed time: 500.563 msecs"
; first : "Elapsed time: 500.381 msecs"
; seq? : "Elapsed time: 0.291 msecs"
; list? : "Elapsed time: 0.286 msecs"
; empty? : "Elapsed time: 500.462 msecs"
; identity : "Elapsed time: 0.301 msecs"

But what about a "warn on eager", which alerts when something that should be lazy is instead eager?

Technorati Tags:, ,


No comments: