RiK0 Tech Temple

Monday, August 30, 2010

Efficiency of the different implementations of integer->roman

In this post we explored different ways to express an algorithm in scheme, ranging from very functional to very imperative. Scheme is a hybrid language, like Python (here and here in python). The scheme implementations have been described here.

Usually we think imperative solutions are faster. About this common misconceptions, I advise reading Okasaki's "Purely functional data structures"[0][1].

I decided to time the operations. Given a single implementation, I run each function on each number from 1 to 2000 and I do that, for each value, a couple of hundred times.

I do not think that data about the difference in efficiency on specific numbers is relevant. We assume that each number is equally probable to be converted. I'm not interested in numbers greater than 2000 since thousands are dealt in the same manner by all implementations.

I sincerely expected the imperative versions to be among the fastest. I thought that racket compiler could essentially generate "C-like code". On the contrary, the functional versions are not only faster when interpreted, they also yield more efficient code.

The programs we are talking about are the ones from the last post. I recall briefly their features.

Program A
simplify-all-a uses a separate simplify-by-weight-a function. This auxiliary function uses a named let to loop.

Program B
simplify-all-b is constituted by exactly the same code than simplify-all-b. However, simplify-by-weight-b is called instead of simplify-by-weight-a. simplify-by-weight-b uses a recursive subfunction to loop.

Program C
Essentially simplify-by-weight-a is not a separate function anymore, but is defined inside simplify-all-c in a let form.

Program D
While Program A, Program B and Program C are functional in nature, Program D relies on state change. for-each is used to loop over the different weights, but n (the value yet to convert in roman) and digits (the list of roman digits) are modified with set! and are not carried over as parameters.

Program E
Program E is functional. Two nested named let perform the computation. The one with greater scope has 4 parameters: the value yet to convert, the list of digits derived from the already converted part the weight-pair to use and the remaining weight-pairs. The inner named let has only two parameters, which are the value and the digits.

Program F and G
They are imperative versions using the racket non-standard for. Program G uses the user-defined form "until", while program F uses a named let to recur in the inner loop.

Graphs

I run the programs both interpreted and compiled. As we can easily notice, Program E is fastest both when interpreted and when compiled. Imperative Program F and G are very slow: when compiled they take twice the time Program E takes. Program D (imperative) is relatively fast when interpreted. However, when compiled all the functional versions are faster.

The absolute execution time of the different programs

In order to better spot the difference I prepared a graph where the bar represent the relative times with respect to the fastest code (so, higher means worst, here).

How much each program is slower than the fastest

As Program E is faster both when compiled and when interpreted, it is the bottom line. This graphs show how the imperative versions comparatively perform worse when compiled. I suspect this means that Racket optimizer is geared towards optimizing functional code. Which is right, since scheme benefits from a very functional style (I would say it is the natural style).

Code

I slightly modified the code to fit in these scheme (pun intended). The main benchmarking function is:

(define simplify-all-versions 
  (list simplify-all-a simplify-all-b simplify-all-c
                   simplify-all-d simplify-all-e
                   simplify-all-f simplify-all-g))

(define (benchmark how-many)
  (for ([simplify-all simplify-all-versions])
    (let* ([integer->roman (integer->roman-builder simplify-all)]
          [times (build-list how-many values)]
          [numbers (build-list 2000 add1)])
      (let-values ([(results execution-time real-time gc-time)
                    (time-apply map
                                (list (lambda (n)
                                      (for-each (lambda (i)
                                                   (integer->roman n))
                                      times))
                                      numbers))])
        (printf "~a: ~a~n" simplify-all execution-time)))))

And the integer->roman has been substituted with integer->roman-builder, which takes as an argument the simplify-all variant to use and returns the integer->roman function using that simplify-all.

(define (integer->roman-builder simplify-all)
  (let ([weights (sort '((1 "I") (4 "IV") (5 "V") (9 "IX") (10 "X")
                                 (40 "XL") (50 "L") (90 "XC") (100 "C")
                                 (400 "CD") (500 "D") (900 "CM"))
                       >
                       #:key car)])
    (lambda (n)
      (let-values ([(thousands n) (quotient/remainder n 1000)])
        (foldl string-append ""
               (simplify-all weights n
                             (build-list thousands (lambda (_) "M"))))))))

References

[0] Chris Okasaki, Purely functional data structures, Cambridge University Press
[1] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.64.3080&rep=rep1&type=pdf

Saturday, August 28, 2010

Interesting DenyHosts...

http://think.random-stuff.org/posts/denyhosts-on-mac-os-x
http://ptone.com/dablog/2009/03/setting-up-denyhosts-to-block-ssh-attacks-on-leopard/

Friday, August 27, 2010

External Iterators for DFV and BFV in Python

In my post I started a brief discussion on internal interators and external iterators.
Here I show how a couple of external iterators can be implemented in Python using generators.
The code comes from the presentation I gave at Pycon Italia 4.

We are basically assuming a n-tree of nodes where each node has an attribute value where its value is held and an attribute children that is a list (but any iterable is ok) of its children.

def depth_first_visit(node):
    stack = [node, ]
    while stack:
        current_node = stack.pop()
        stack.extend(reversed(current_node.children))
        yield current_node.value
    
def breadth_first_visit(node):
    queue = collections.deque((node, ))
    while queue:
        current_node = queue.popleft()
        queue.extend(current_node.children)
        yield current_node.value

Thursday, August 26, 2010

Romans in scheme

I described (here and here) three different ways to write the code to represent an integer in "roman" format.
Essentially, I think the best is the one using closures. In fact, while writing the python version, I also wrote a scheme version.

Although scheme is mostly a functional language, I will use the very same example to move towards a more imperative style. In Python I made my code "more functional"; the experiment is now making the code "more imperative" in scheme.

I show here the python code as an immediate reference:

@apply
def int2roman():
    weights = { 900 : 'CM', 500 : 'D', 400 : 'CD', 100 : 'C',
                90  : 'XC', 50  : 'L', 40  : 'XL', 10  : 'X',
                9   : 'IX', 5   : 'V', 4   : 'IV', 1   : 'I'}
    sorted_weights_tuples = sorted(weights.iteritems(), reverse=True)
    def int2roman(n):
        thousands, n = divmod(n, 1000)
        roman_digits = ['M', ] * thousands

        for m, s in sorted_weights_tuples:
            while n >= m:
                roman_digits.append(s)
                n -= m

        return ''.join(roman_digits)
    return int2roman

I wrote two helper functions: the first one essentially is equivalent to the inner while loop in Python, the other one to the for loop.

(define simplify-by-weight
  (lambda (weight ch n digits)
    (letrec ([sbw 
              (lambda (n digits)
                (cond ((< n weight) (cons n digits))
                      (else 
                       (sbw (- n weight) (cons ch digits)))))])
      (sbw n digits))))

(define simplify-all
  (lambda (weights n digits)
    (cond 
      ((null? weights) digits)
      (else
       (let* ([ret (simplify-by-weight (caar weights) (cadar weights) n digits)]
              [n (car ret)]
              [digits (cdr ret)])
         (simplify-all (cdr weights) n digits))))))

The idea to make the code more imperative looking or more imperative tout court comes from the idea that 4 lines in Python became 18 lines in scheme. This is not a "python vs. scheme". The solution I proposed in Python is, in my opinion, far more easier to read and undertand because it's so much shorter and the scheme version is cluttered with iteration details/function calls where it should have been much easier. New versions will be proposed. In the meantime, here it is the main function:

(define integer->roman-a
  (let ([weights (sort '((1 "I") (4 "IV") (5 "V") (9 "IX") (10 "X")
                                 (40 "XL") (50 "L") (90 "XC") (100 "C")
                                 (400 "CD") (500 "D") (900 "CM"))
                       >
                       #:key car)])
    (lambda (n)
      (let-values ([(thousands n) (quotient/remainder n 1000)])
        (foldl string-append ""
               (simplify-all weights n(build-list thousands (lambda (_) "M")))))))
  )

Apparently this is not a closure. However, remember that in scheme let is essentially syntactical sugar over a lambda form. In fact, we could have been made the closure more apparent writing the function this way:

(define integer->roman-b
  ((lambda (weights)
     (lambda (n)
       (let-values ([(thousands n) (quotient/remainder n 1000)])
         (foldl string-append ""
           (simplify-all weights n(build-list thousands (lambda (_) "M")))))))
   (sort '((1 "I") (4 "IV") (5 "V") (9 "IX") (10 "X")
                   (40 "XL") (50 "L") (90 "XC") (100 "C")
                   (400 "CD") (500 "D") (900 "CM"))
         >
         #:key car)))

I have to confess that I like the foldl/string-append thing more than the python append/join.
This is, of course, just a matter of taste. I could have made the code simpler pushing the while logic in an iterator and then simply using list comprehensions. However, I believe the design would have been far more complicated.

In order to simplify the code of the first two functions, I will:

use more special forms (or different variants of special forms)
try to eliminate the functions as independent units
if necessary, use more forms provided by racket (most of them can be added to any scheme with macros)

Here is the first function; I often use a "sub-lambda" to iterate only on the parameters which change. Here I use a let form (remember... let is sugar over lambda). I believe code is simpler, here.

(define simplify-by-weight
  (lambda (weight ch n digits)
    (let loop ([n n] [digits digits])
      (cond ((< n weight) (cons n digits))
            (else 
             (loop (- n weight) (cons ch digits)))))))

It is simple enough to be removed:

(define simplify-all
  (lambda (weights n digits)
    (cond 
      ((null? weights) digits)
      (else
       (let* ([weight (caar weights)]
              [ch (cadar weights)]
              [ret (let loop ([n n] [digits digits])
                     (cond ((< n weight) (cons n digits))
                           (else 
                            (loop (- n weight) (cons ch digits)))))]
              [n (car ret)]
              [digits (cdr ret)])
         (simplify-all (cdr weights) n digits))))))

In fact, with let-values and values code could be simplified a lot.
Essentially here the problem is that we have two nested loops
and they loop on two different things and the value of n is logically modified.

Here we are simply using named lets to express loops:

(define simplify-all
  (lambda (weights n digits)
    (let next-weight ([n n]
                      [digits digits]
                      [weight-pair (car weights)]
                      [weights (cdr weights)])
      (let  ([weight (car weight-pair)]
             [ch (cadr weight-pair)])
        (let simplify-again ([n n] [digits digits])
          (cond
            ((< n weight)
             (cond 
               ((null? weights) digits)
               (else (next-weight n digits (car weights) (cdr weights)))))
            (else
             (simplify-again (- n weight) (cons ch digits)))))))))

Now the tentative is using the destructive state change:

(define simplify-all-e
  (lambda (weights n digits)
    (for-each 
     (lambda (weight-pair)
       (let  ([weight (car weight-pair)]
              [ch (cadr weight-pair)])
         (let simplify-again ()
           (cond
             ((< n weight) null)
             (else
              (set! n (- n weight))
              (set! digits (cons ch digits))
              (simplify-again))))))
     weights)
    digits))

The above code can be improved using the for special form and unless (which we should already have used):

(define simplify-all
  (lambda (weights n digits)
    (for ([weight-pair weights])
      (let  ([weight (car weight-pair)]
             [ch (cadr weight-pair)])
        (let simplify-again ()
          (unless (< n weight)
            (set! n (- n weight))
            (set! digits (cons ch digits))
            (simplify-again)))))
    digits))

We are still longer than the four python lines, but we are no longer more complicated. In fact, considering scheme macros, we can define a while/until pair of special forms which simplify the code greatly:

(define simplify-all-g
  (lambda (weights n digits)
    (for ([weight-pair weights])
      (let  ([weight (car weight-pair)]
             [ch (cadr weight-pair)])
        (until  (< n weight)
            (set! n (- n weight))
            (set! digits (cons ch digits)))))
    digits))

Some performance measures here.

Wednesday, August 25, 2010

Clojure Counter

One of the first programs I wrote in Clojure was a simple counter.
I am quite interested in the stateful/stateless mapping and that is the easiest possible example with refs. Refs are the simplest threadsafe state primitive.

This is a counter:

(defn  make-counter []
  (let [counter (ref 0)]
    (fn [] (dosync (alter counter inc)))))

Ok, this is a bit too schemish that it should have been. I encapsulated the actual counter in the local counter variable, which is accessible only from the function I return when one makes a counter. So this counter is basically a function which always return the next number.

In scheme that would have been something like:

(define make-counter
  (lambda ()
    (let ([counter 0])
      (lambda ()
        (let ([result counter])
          (set! counter (+ counter 1))
          result))))

The clojure variant (using dosync and alter) is thread-safe. Essentially modification to the ref is allowed only in a dosync context. Modification happens through the alter function (alter what update-function).

In order to try the code, I wrote this:

(defn print-some [get-next-function how-many]
  (dotimes [_ how-many]
    (print (.getName (Thread/currentThread)))
    (print ": ")
    (println (get-next-function))))

(let [counter (make-counter)]
  (dotimes [x 10]
    (.start (Thread.
             (fn [] (print-some counter 10))
             (Integer/toString x)))))

I particularly like it (even though it is obviously a mistake) because of the way it messes the output. In fact the three print functions (well, one is a println) are not atomic: the output is not locked. This means that the output is completely mixed.

This is good because it shows that threads are really running in parallel (well, I have an 8 virtual core machine, after all). On the other hand, the counter is always perfect: numbers can be swapped (but that is natural) but no number is repeated twice or skipped.

Tuesday, August 24, 2010

Perforce... goodby

It has been a nice partnership. But I'm so much happier with distributed version control systems.
It's just a PITA to move my huge Perforce archive... probably I will do it lazily.

Monday, August 23, 2010

And what about performance (closures and classes)?

In this post on classes and closures we entirely skipped the question of performance.
Which of the three solutions is "better"? In order to do this I wrote the following snippet and I put it at the end of the file.

if __name__ == "__main__":
    import timeit
    setup = '''import roman
numbers = range(1, 1000)
numbers += [3678, 3286, 5483, 8546, 1234, 5869, 97342]
'''
    statements = [
        '''int2roman = roman.int2roman%s
for n in numbers: int2roman(n)'''% ending
        for ending in ['_cl', '_dec', '']]
    for stmt in statements:
        print stmt, timeit.timeit(stmt, setup, number=1000)

Of course I also renamed the three functions:

roman.int2roman_dec is the version with decorators
roman.int2roman_cl is the version with callable classes
roman.int3roman is the version with closures

Notice that the roman.int2roman_dec needed an internal change as well, since it accesses the attribute through the "external" function name. Thus, it became (note line 15):

if __name__ == "__main__":
def with_weights(weights):
    def _aux(f):
        f.weights = weights
        f.sorted_weights_tuples = sorted(weights.iteritems(), reverse=True)
        return f
    return _aux

@with_weights({ 900 : 'CM', 500 : 'D', 400 : 'CD', 100 : 'C',
                90  : 'XC', 50  : 'L', 40  : 'XL', 10  : 'X',
                9   : 'IX', 5   : 'V', 4   : 'IV', 1   : 'I'})
def int2roman_dec(n):
    thousands, n = divmod(n, 1000)
    roman_digits = ['M', ] * thousands

    for m, s in int2roman_dec.sorted_weights_tuples:
        while n >= m:
            roman_digits.append(s)
            n -= m

    return ''.join(roman_digits)

The times are:

Python 2.7 (r27:82508, Jul  3 2010, 21:12:11) 
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin

% python roman.py
int2roman = roman.int2roman_cl
for n in numbers: int2roman(n) 4.19191408157
int2roman = roman.int2roman_dec
for n in numbers: int2roman(n) 3.88606595993
int2roman = roman.int2roman
for n in numbers: int2roman(n) 3.98288297653

So, the fastest is the class based. The version with closures is a little bit slower. And the slowest (though not very much slower) is the version with decorators. Since most of the code is cut and pasted, the difference in timings depend in the access of the sorted_weights_tuples list.

Please notice that essentially each run applied the function to millions of numbers.
Each test consists of repeating 1000 the transformation of each of the > 1000 elements in the number list.

The slowest version runs < 1.08 times slower than the fastest. With this numbers choosing one solution because of performance is clearly useless optimization. Design an algorithm using classes instead of closures because of performance, is premature optimization.

Sunday, August 22, 2010

Oh, my Clojure [Emacs]

Clojure is very well supported in Emacs, of course. The instructions are here.
The Emacs Automatic Package Manager is needed!
Stuff on vim and TextMate, here.

I followed them I found myself with no usable REPL. I tried both with slime (which works, though I don't like how) and without, which basically does not work.

In fact I have to admit I'm not extremely pleased with the current situation. I suppose I will stick with IntelliJ. There is another Clojure plugin besides LaClojure, and perhaps that one works better. I may also be in the phase "I like scheme better than clojure right now" and that could explain my dissatisfaction.

I'm probably to try NetBeans with its plugin (LaClojure is broken right now).

Saturday, August 21, 2010

for and for-permutation

When I'm prototyping scheme programs, I usually work in Racket. It's a nice environment and most things I need are ready. It's a battery included framework and I love that.

However, sometimes I think that when I used to play with gambit (pun intended) my scheme was better. I somewhat crafted minimal tools to solve the problem instead of fitting pre-made blocks. It's a bit on the design pattern as coarse grained way of thinking to engineering problems[0] line of thought.

Anyway, I used in my programs Racket for form too much and now I have troubles in porting them to other schemes. So I decided to implement the for form with syntax-rules.

(define-syntax for-permutation
  (syntax-rules ()
    [(_ () s1 s2 ...)
     (begin s1 s2 ...)]
    [(_ ([v1 l1] [v2 l2] ...) s1 s2 ...)
     (for-each
       (lambda (v1)
         (for-permutation ([v2 l2] ...)
           s1 s2 ...)) l1)]))

(define-syntax for
  (syntax-rules ()
    [(_ () s1 s2 ...)
     (begin s1 s2 ...)]
    [(_ ([v1 l1] [v2 l2] ...) s1 s2 ...)
     (for-each
       (lambda (v1 v2 ...)
         (begin s1 s2 ...))
       l1 l2 ...)]))

For-permutation works a bit like list comprehensions in Python:

[In]  [i+j for i in range(4) for j in range(3)]
[Out] [0, 1, 2, 1, 2, 3, 2, 3, 4, 3, 4, 5]

On the other hand for works just iterating on the sequences "together":

(for ([a '(1 2 3)] [b '(6 7 8)]) (display (+ a b)))
7911

Of course, since it depends on for-each, sequences must have the same length.

---
[0] more on this another day...

Friday, August 20, 2010

Internal vs External Iterators

In order to solve the problem of visiting each element of a collections, two solutions spring to mind: internal and external iterators. The main different is "who controls the iteration". In internal iterators the iteration is controlled by the iterator itself: the caller provides a callback function which is called on each element.
In this sense, the map function is essentially an internal iterator: iterates on each element of a collection and calls the function parameter on each element. Map is a primitive function in Python, scheme/lisp, Haskell and many other languages. All these languages have higher order functions and functions as first class citizens. In fact, this strategy could (and is) be employed with C/C++ as well.

On the other hand in External Iterators the caller controls the iteration. This is the case with Python for statement, for example.

In fact, object zealots usually regard the external iterators as more flexible, as they make it possible to compare two different structures or other similar stuff:

import itertools as it

for i, j in it.izip(left_collection, right_collection):
    if i != j:
        return False
else:
    return True

In fact, this could be rather easily done with higher order functions, as long as you have the right set. What is more difficult is to deal with control; you probably need call/cc and similar primitives for many things which are rather straightforward with external iterators.

They are the more widespread method for iteration in Java (Iterator class) and Python. On the other hand in Ruby and most functional languages it is very much more common to use internal iterators. Though nothing prevents from using internal iterators in Python, it is much more common to use external iterators.

In C++, using Boost, internal iterators become somewhat more common. This comes from a project of mine:

std::transform(items.begin(), items.end(), sizes.begin(),
                   bind(&relation::size,
                        bind(&iset::value_type::second, ::_1)));
    std::for_each(sizes.begin(), sizes.end(),
                  var(counter) += boost::lambda::_1);

Let alone the items.begin(), which [b]is[/b] an external iterator, although used in an internal iterator manner.