RiK0 Tech Temple: Efficiency of the different implementations of integer->roman

In this post we explored different ways to express an algorithm in scheme, ranging from very functional to very imperative. Scheme is a hybrid language, like Python (here and here in python). The scheme implementations have been described here.

Usually we think imperative solutions are faster. About this common misconceptions, I advise reading Okasaki's "Purely functional data structures"[0][1].

I decided to time the operations. Given a single implementation, I run each function on each number from 1 to 2000 and I do that, for each value, a couple of hundred times.

I do not think that data about the difference in efficiency on specific numbers is relevant. We assume that each number is equally probable to be converted. I'm not interested in numbers greater than 2000 since thousands are dealt in the same manner by all implementations.

I sincerely expected the imperative versions to be among the fastest. I thought that racket compiler could essentially generate "C-like code". On the contrary, the functional versions are not only faster when interpreted, they also yield more efficient code.

The programs we are talking about are the ones from the last post. I recall briefly their features.

Program A
simplify-all-a uses a separate simplify-by-weight-a function. This auxiliary function uses a named let to loop.

Program B
simplify-all-b is constituted by exactly the same code than simplify-all-b. However, simplify-by-weight-b is called instead of simplify-by-weight-a. simplify-by-weight-b uses a recursive subfunction to loop.

Program C
Essentially simplify-by-weight-a is not a separate function anymore, but is defined inside simplify-all-c in a let form.

Program D
While Program A, Program B and Program C are functional in nature, Program D relies on state change. for-each is used to loop over the different weights, but n (the value yet to convert in roman) and digits (the list of roman digits) are modified with set! and are not carried over as parameters.

Program E
Program E is functional. Two nested named let perform the computation. The one with greater scope has 4 parameters: the value yet to convert, the list of digits derived from the already converted part the weight-pair to use and the remaining weight-pairs. The inner named let has only two parameters, which are the value and the digits.

Program F and G
They are imperative versions using the racket non-standard for. Program G uses the user-defined form "until", while program F uses a named let to recur in the inner loop.

Graphs

I run the programs both interpreted and compiled. As we can easily notice, Program E is fastest both when interpreted and when compiled. Imperative Program F and G are very slow: when compiled they take twice the time Program E takes. Program D (imperative) is relatively fast when interpreted. However, when compiled all the functional versions are faster.

The absolute execution time of the different programs

In order to better spot the difference I prepared a graph where the bar represent the relative times with respect to the fastest code (so, higher means worst, here).

How much each program is slower than the fastest

As Program E is faster both when compiled and when interpreted, it is the bottom line. This graphs show how the imperative versions comparatively perform worse when compiled. I suspect this means that Racket optimizer is geared towards optimizing functional code. Which is right, since scheme benefits from a very functional style (I would say it is the natural style).

Code

I slightly modified the code to fit in these scheme (pun intended). The main benchmarking function is:

(define simplify-all-versions 
  (list simplify-all-a simplify-all-b simplify-all-c
                   simplify-all-d simplify-all-e
                   simplify-all-f simplify-all-g))

(define (benchmark how-many)
  (for ([simplify-all simplify-all-versions])
    (let* ([integer->roman (integer->roman-builder simplify-all)]
          [times (build-list how-many values)]
          [numbers (build-list 2000 add1)])
      (let-values ([(results execution-time real-time gc-time)
                    (time-apply map
                                (list (lambda (n)
                                      (for-each (lambda (i)
                                                   (integer->roman n))
                                      times))
                                      numbers))])
        (printf "~a: ~a~n" simplify-all execution-time)))))

And the integer->roman has been substituted with integer->roman-builder, which takes as an argument the simplify-all variant to use and returns the integer->roman function using that simplify-all.

(define (integer->roman-builder simplify-all)
  (let ([weights (sort '((1 "I") (4 "IV") (5 "V") (9 "IX") (10 "X")
                                 (40 "XL") (50 "L") (90 "XC") (100 "C")
                                 (400 "CD") (500 "D") (900 "CM"))
                       >
                       #:key car)])
    (lambda (n)
      (let-values ([(thousands n) (quotient/remainder n 1000)])
        (foldl string-append ""
               (simplify-all weights n
                             (build-list thousands (lambda (_) "M"))))))))

References

[0] Chris Okasaki, Purely functional data structures, Cambridge University Press
[1] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.64.3080&rep=rep1&type=pdf

RiK0 Tech Temple

Monday, August 30, 2010

Efficiency of the different implementations of integer->roman

Graphs

Code

References

No comments: