Sunday, September 5, 2010

Multi-scheme implementation of roman: performance

In order to confront the different implementations of simplify-all with multiple scheme variants, I removed all the racket specific code. I implemented by hand missing functions (foldl) or got their implementations otherwise (sort, from Freeley in the gambit lib directory). I also have some measures of different implementations all in Racket.

The resulting program still runs with racket, of course. In fact, I found rather annoying to perform the test as, for example, some implementaion provide an add1 function (so I have to comment out the definition) while some others miss that builtin (and they need my implementation).

Benchmarking code has become:
(define benchmark
  (let ([numbers (build-list 2000 add1)])
    (lambda ()
      (for-each
       (lambda (simplify-all)
         (let* ([integer->roman (integer->roman-builder simplify-all)])
           (time (map integer->roman numbers))))
    simplify-all-versions))))

We tried the code with Larceny (about good names...), Gambit, Racket and Chicken.

Time executions of the programs with different scheme implementations
Essentially on this benchmark  Chicken Scheme is the fastest. I compiled the sources with the following options:
csc -inline -block -local -unsafe -fixnum-arithmetic -lambda-lift -inline -disable-interrupts -disable-stack-overflow-checks

The differences in performance among the different programs are negligible. Moreover, with different runs, the relative differences tended to change.

Compiled racket is just a bit slower. Here we can spot that the "imperative" variant is significantly slower. Its slowness is amplified in the non compiled variant. On the other hand, with gambit, it seems that the imperative variant is the fastest. Unfortunately the time delta is so small that it could be a statistical variation: besides, differences are so small they are negligible.

Larceny is the slowest. I suppose I was unable to use the proper runtime options.

I also put in a graph the times spent in garbage collection.

Time spent garbage collecting by the different scheme implementations
Here we notice that most implementations spend comparatively the same time garbage collecting all the different programs. Larceny is the one spending less time (which is surprising, given the slow execution times). Compiled Racket comes second and Chicken comes third.

I believe I missed some compilation optimization with gambit, since to my knowledge it is one of the fastest scheme implementations.

I find the graph for Interpreted Racket the most interesting. In fact, we can see that execution times peaks   occur where garbage collection times peaks occur. I put in a graph the difference between the running time and the time spent garbage collecting. This means that it is the time spent in actual computations.

Running time without considering GC
Now that we removed time spent garbage collecting, Program A is on par with other programs with Racket Interpreted. Still, the "imperative version" (program D) remains slower. Nothing really changes for compiled Racket.



4 comments:

Anonymous said...

Hi Enrico,
about gambit's options, see:

http://dynamo.iro.umontreal.ca/~gambit/wiki/index.php?title=Documentation:Special_form_declare

A good starting point could be adding:

(declare
(standard-bindings)
(extended-bindings)
(block)
(not safe)
(fixnum))

on your source files.
Also don't forget runtime options:
http://www.iro.umontreal.ca/~gambit/doc/gambit-c.html#Runtime-options

I'd like to reproduce your benchmarks, there is a place where I can get the whole code? Thank you.

--marco

Unknown said...

Basically I think everything is here:
http://github.com/rik0/rk-exempla/tree/master/algorithms/scheme/

Unfortunately I was quite in a hurry and I did not create different scheme files for different scheme implemetation. Though that one is "quite" cross-implementation. You probably have to (un)comment some functions depending on the fact that they are already provided in the standard library.

I will try the gambit benchmarks again considering your suggestions soon. Have you some suggestion on particular runtime options?

BTW, you are from geek-ml, aren't you?

Anonymous said...

Hi,

thank you for code, I think I've found a couple of bugs.

foldl should be:

;(define foldl
; (lambda (f z lst)
; (let loop ((acc z) (lst lst))
; (cond
; ((null? lst) acc)
; (else
; (loop (f (car lst) acc) (cdr lst)))))))

and the function passed to 'sort' in 'integer->roman-builder' should use '>' and not '<'.

I've fixed these bugs and put in some trick to make the code more gambit friendly and to check results.

1) shebang

On unix (untested)
#!/usr/bin/env gsi-script

On windows:
@;gsi-script "%~f0" %*

2) Declarations as in my previous post.

3) Renamed 'benchmark' to 'main', so gsi-script can call it directly (see gambit's documentation)

4) Write outputs to file, and invoke garbace collection after each function benchmarked:

;(define (cout . args) (for-each display args))
;(define ln #\newline)

;(define main
; (let ((numbers (build-list 2000 add1)))
; (lambda ()
; (with-output-to-file "out.txt"
; (lambda ()
; (for-each
; (lambda (simplify-all)
; (cout simplify-all ln)
; (let* ((integer->roman (integer->roman-builder simplify-all)))
; (cout (time (map integer->roman numbers)) ln))
; (##gc))
; simplify-all-versions))))))

Compiling with:

gsc -f roman.scm

and running with:

gsi -:m1000 -f roman

I've got the following results:

;(time (map integer->roman numbers))
; 7 ms real time
; 0 ms cpu time (0 user, 0 system)
; 2 collections accounting for 1 ms real time (0 user, 0 system)
; 2550944 bytes allocated
; no minor faults
; no major faults
;(time (map integer->roman numbers))
; 6 ms real time
; 0 ms cpu time (0 user, 0 system)
; 2 collections accounting for 1 ms real time (0 user, 0 system)
; 2551112 bytes allocated
; no minor faults
; no major faults
;(time (map integer->roman numbers))
; 6 ms real time
; 16 ms cpu time (16 user, 0 system)
; 2 collections accounting for 1 ms real time (0 user, 0 system)
; 2551112 bytes allocated
; no minor faults
; no major faults
;(time (map integer->roman numbers))
; 7 ms real time
; 0 ms cpu time (0 user, 0 system)
; 2 collections accounting for 0 ms real time (0 user, 0 system)
; 2038800 bytes allocated
; no minor faults
; no major faults
;(time (map integer->roman numbers))
; 5 ms real time
; 0 ms cpu time (0 user, 0 system)
; 2 collections accounting for 0 ms real time (0 user, 0 system)
; 1974872 bytes allocated
; no minor faults
; no major faults

for comparison, with your racket version untouched:

#: 1061
#: 1107
#: 1046
#: 1185
#: 967
#: 1670
#: 1731

PS: yes, I'm from geek-ml

--marco

Unknown said...

Ok. Thank you very much.
BTW, I believe at this point I have to redo *all* the tests. Seems like I've got some work to do in the weekend. Well, a part from *actual* work.