Sunday, December 24, 2006

Citation: Concurrency in Java (Van Roy and Haridi)

Concurrency in Java is complex to use and expensive in computational resources. Because of these difficulties, Java-taught programmers conclude that concurrency is a fundamentally complex and expensive concept. Program specifications are designed around the difficulties, often in a contorted way. But these difficulties are not fundamental at all. There are forms of concurrency that are quite useful and yet as easy to program as sequential programs (e.g., stream programming as exemplified by Unix pipes). Furthermore, it is possible to implement threads, the basic unit of concurrency, almost as cheaply as procedure calls. If the programmer were taught about concurrency in the correct way, then he or she would be able to specify for and program in systems without concurrency restrictions (including improved versions of Java). (Concepts, Techniques and Models of Computer Programming, Peter Van Roy and Seif Haridi)
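Stream programming of the kind the quote mentions can be sketched in any language with generators; here is a minimal Python sketch (my own example, not from the book) of a Unix-pipe-style pipeline of lazy stages:

```python
# A pipeline of lazy generators, in the spirit of Unix pipes:
# each stage consumes a stream and produces a new one on demand.

def numbers(n):
    """Produce the stream 0, 1, ..., n-1."""
    for i in range(n):
        yield i

def squares(stream):
    """Square every element of the incoming stream."""
    for x in stream:
        yield x * x

def evens(stream):
    """Keep only the even elements."""
    for x in stream:
        if x % 2 == 0:
            yield x

# Stages compose like `numbers | squares | evens` in a shell.
pipeline = evens(squares(numbers(10)))
print(list(pipeline))  # [0, 4, 16, 36, 64]
```

No threads at all, yet the program is organized as concurrent-looking communicating stages, which is the point the authors are making.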

First of all, I completely agree with the authors. I like learning new languages/paradigms/models and find it quite restricting to stick to just One Language, One Platform, One Model. I'm fond of Python (but indeed Python is an excellent imperative/object oriented language that allows functional programming -- and logic programming is being experimented with, among other things, in PyPy). I also like Ruby (again imperative with functional capabilities). I recently discovered the power of Prolog, and I love Haskell (although I never wrote any substantial piece of software in Haskell).

I can't imagine working with Java and only with Java (and the same applies to 'C++ and only C++' or whatever). However, there are plenty of programmers out there who chose One Language, One Platform, One Model and stick with it. I'm wondering if we can generalize Dijkstra's statement

The use of COBOL cripples the mind; its teaching should, therefore, be regarded as a criminal offence.

In fact it seems that Java has the very same problem. A lot of Java programmers are only able to formulate problems (and solve them) thinking in Java. One may argue that maybe business programmers already had their mind crippled, but I don't think this is the case.

I'm afraid the problem is not limited to Java, of course (it's just more evident because there are a lot of Java programmers). There are a lot of excellent C++ (for example) programmers who simply do not believe that large programs written in dynamically typed languages can work (and of course they are wrong, since we can exhibit such programs). On the other hand some Python/Ruby/etc advocates think that static typing is fundamentally bad. In fact it is not: Java's type system is poorly designed. It's not that static typing is wrong or clumsy. Haskell and OCaml are statically typed, but types do not get in the way: they are a concrete help for the programmer (but this does not imply that a language should be statically typed).

My opinion is that more programming languages, models, paradigms should be taught to programmers. For two main reasons: first, they know more ways to solve a problem and hopefully they will be able to choose the best one; second, they will be able to adapt concepts coming from other models/paradigms/languages to the language they are currently using, thus making the design more elegant.

Elegance is not a dispensable luxury but a quality that decides between success and failure. (Dijkstra)

Saturday, December 23, 2006

Books and information technology.

This article won't be very interesting for people living in an English-speaking country: most technology books are written in (or at least translated to) English. However, things change if you live in a country where English is not the main language (like Italy) and most people can't read English comfortably (again, like Italy).

A lot of people working in the IT field here find it more difficult to read in English than in Italian, and they choose books translated into Italian, if they are available.

I prefer English books: they usually cost less and there are no translation errors (by definition). Moreover, there are fewer errors in general, since more people read the book (and thus have the opportunity to find and report errors). Famous hackers and scientists from all over the world usually speak English (and deliver lectures in English): they also tend to write books in English (even if it is not their main language).

Moreover, there are subjects where the translated books (for example Python books in Italy) are *not* the 'best' books on the subject. For example in Italy you can buy 'Programmare in Python' (that is the translation of Learning Python) or the *very* old 'Python 2.1 Tutto ed oltre' (that refers to Python 2.1 and is still being sold). Python in a Nutshell (whose author *is* Italian) is not available, and the list of 'missing' titles is much longer.

In a country like Italy, availability of books on a particular technology can be one of the main reasons why that technology is adopted. In fact a lot of Microsoft books (or books on Microsoft products) are translated (or even written from scratch in Italian by Italians). Another language that fills whole bookcases in our bookshops is Java. PHP has a lot of beginners books, too.

In my opinion none of the aforementioned programming environments excels. Java has a beautiful library (although terribly over-engineered) and a very promising VM, but the language itself sucks. PHP has a horrible library, a terrible virtual machine, and the language itself sucks even more. .Net has a wonderful VM (but unfortunately not cross-platform -- apart from Mono), but C# is not better than Java, and VB... luckily enough you can run a recent Python version in the .Net environment, but again, you don't find a lot (if any) of information about this in Italian books.

Every time a new programmer goes to a bookshop to buy a book to start learning programming, he is very likely to find a book on Java/MS/PHP. And until he 'grows up' and learns to look for information in other places, he will be bound to those technologies. In fact he could have learned sooner (and had more fun) if he had discovered Rails (for Web development) or Python. There *are* some books on those subjects (and a whole free book -- but in English -- on Django, a Python Rails-like framework); however, they are few, and someone who is starting out looks for 'big names'.

Friday, December 22, 2006

.Xmodmap hex digits

For a couple of days I have been hacking with xmodmap and keycodes to make my keyboard behave as I expect.

Recently it sprang to my mind that with SuSE 7.1 I used a utility called xkeycaps.

Unfortunately, xkeycaps generates an .Xmodmap with keycodes expressed as hex numbers, while xev shows codes as decimal numbers.

This simple Ruby one-liner converts an .Xmodmap file with hex codes to one with decimal codes:

cat .Xmodmap.hex | ruby -n -e 'if $_ =~ /keycode\s*(0x[A-Fa-f0-9]+)\s*=\s*(.+?)\n/ then puts "keycode #{$1.hex} = #{$2}" else puts $_ end' > .Xmodmap.dec
provided you called the hex .Xmodmap '.Xmodmap.hex'.
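The same conversion can also be sketched in Python (a rough equivalent of the one-liner; the exact regexp is my assumption about xkeycaps' output format):

```python
import re

def hex_to_dec(line):
    """Rewrite 'keycode 0xNN = ...' lines with a decimal keycode."""
    m = re.match(r'keycode\s*(0x[0-9A-Fa-f]+)\s*=\s*(.+)', line)
    if m:
        # int(..., 16) accepts the '0x' prefix, like Ruby's String#hex
        return "keycode %d = %s" % (int(m.group(1), 16), m.group(2))
    return line  # pass non-keycode lines through untouched

# Example: convert one line of an .Xmodmap produced by xkeycaps.
print(hex_to_dec("keycode 0x18 = q Q"))  # keycode 24 = q Q
```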

Tuesday, December 19, 2006

A Prolog XML printer in less than 100 lines.

Suppose we have an XML parser that converts an XML file into a Prolog term.
We may (quite) easily build one with DCG. I will be doing one soon, for values of 'soon' that depend on the status of my thesis.
We describe the way the Prolog term is built:
Text is simply a Prolog term. In a less naive implementation we may want to use the string datatype; however, to show the power of a declarative approach, things would not change a lot. Moreover, if we are using SWI-Prolog (a widespread open source implementation), it is internally able to use UCS (a superset of 16 bit Unicode). We only have to make the parser smart enough to convert entities into the right Unicode character and the whole thing will behave correctly.
Attributes are represented as key:value terms, where : is a binary infix functor. Again, this is quite a natural representation.
A tag is represented as a tag(Attributes, Children) term, where tag is the name of the tag, and Attributes and Children are (possibly empty) lists of attributes and 'children'. This representation is not very clever. In fact you can have more optimized programs representing tags as element(tag, Attributes, Children). That is the way things are done in SWI-Prolog's official SGML/XML parser library, but this is a simple example.
A 'child' is either a tag or a text section.
This (informal) description can be translated in a straightforward manner into a Prolog program. The last thing before presenting the source is an example term representing a very simple XHTML file.

html([], [head([], ['']), body([], [p([], ['ciao ciao', img([src:'ciao.jpg'], [])]) ])])

And now the (81-line-long) Prolog source code. You can call it with pp_tag(Term), where Term is the term representing the XML file.

pp_tag(Tag) :-
        !,
        pp_tag(Tag, 0).
pp_tag(Tag, N) :-
        Tag =.. [Tag_Name, Attributes, []],
        !,
        s(N),
        write('<'),
        write(Tag_Name),
        pp_attributes(Attributes),
        write(' />').
pp_tag(Tag, N) :-
        Tag =.. [Tag_Name, Attributes, Children],
        !,
        s(N),
        write('<'),
        write(Tag_Name),
        pp_attributes(Attributes),
        write('>'),
        N1 is N+1,
        pp_children(Children, N1),
        nl,
        tab(N),
        write('</'),
        write(Tag_Name),
        write('>').
pp_text(Text, N) :-
        name(Text, Chars),
        pp_lines(Chars, N).
pp_line(Line, N) :-
        name(Atom_Line, Line),
        s(N),
        write(Atom_Line).
pp_lines(Lines, N) :-
        ((append(Line, [10|Rest], Lines), !),
            pp_line(Line, N),
            pp_lines(Rest, N)
        ;
            Line = Lines,
            pp_line(Line, N)
        ).
pp_children([], _N) :-
        !.
pp_children([X|Xs], N) :-
        !,
        pp_child(X, N),
        pp_children(Xs, N).
pp_child(Child, N) :-
        pp_tag(Child, N)
        ;
        pp_text(Child, N).
pp_attribute(Name:Value) :-
        !,
        write(Name),
        write('='),
        pp_quoted(Value).
pp_attributes([]) :-
        !.
pp_attributes([X|Xs]) :-
        write(' '),
        pp_attribute(X),
        pp_attributes(Xs).
pp_quoted(Term) :-
        !,
        write('"'),
        write(Term),
        write('"').
s(0) :-
        !.
s(N) :-
        nl,
        tab(N).

Of course this is a 'toy' implementation. You can find the real thing here.
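For readers who don't speak Prolog, the same toy structure can be sketched in Python (my own rough analogue, using tuples (name, attributes, children) for tags; it is not a translation of the Prolog program or of the SWI-Prolog library):

```python
def pp_tag(name, attrs, children, indent=0):
    """Pretty-print a tag given as (name, attribute dict, child list)."""
    pad = "  " * indent
    alist = "".join(' %s="%s"' % (k, v) for k, v in attrs.items())
    if not children:
        return "%s<%s%s />" % (pad, name, alist)   # empty tag
    lines = ["%s<%s%s>" % (pad, name, alist)]
    for child in children:
        if isinstance(child, tuple):               # a nested tag
            lines.append(pp_tag(*child, indent=indent + 1))
        else:                                      # a text node
            lines.append("  " * (indent + 1) + child)
    lines.append("%s</%s>" % (pad, name))
    return "\n".join(lines)

# The img tag from the example term:
print(pp_tag("img", {"src": "ciao.jpg"}, []))  # <img src="ciao.jpg" />
```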

Monday, December 18, 2006

Windowmaker

How much I love windowmaker...

Now it is much less used than it once was... Most people use KDE or GNOME (or XFCE) and the good old window managers are kind of forgotten (or maybe simply unknown to the masses). However, when I started, KDE was slow and took every single byte of RAM from my machine. GNOME was somewhat lighter; however, the desktop environment was not polished and practical (it wasn't even at 2.0 yet).

In fact some window managers/desktop environments had higher usability (at least in my opinion), even though you had to give up a good file manager (and back then gmc/nautilus was not as good as today). One of those is WindowMaker.

Simple and powerful. As it is today. Well... I restored it on my Ubuntu. Without the Debian menu you have to build the application menu yourself (and that is a pain in the ass). However, since most of the time I'm only using Firefox, Emacs and a terminal, I managed to do it in a very short time.

Saturday, December 16, 2006

Compilers: Principles, Techniques & Tools (Dragonbook)

A couple of days ago my own copy of "Compilers: Principles, Techniques & Tools" by Aho, Lam, Sethi and Ullman arrived, better known as the "Dragonbook" (look at the cover to understand why). This weekend I am starting to read the 992-page-long book. Actually I find it well written and very clear, even though I'm at the beginning of the book.

I suspect things will get more complicated further on. However, I find the subject dramatically interesting. I hope to find enough time to read it quickly and eventually start implementing something. Some ideas contained in the book will, I suppose, be of great help for my thesis (in fact I'm implementing something that is closely related to a compiler).

Sunday, November 19, 2006

Adding scrolling to Emacs...

Easy...
;; mouse-wheel: scroll
(global-set-key [mouse-4] 'scroll-down)
(global-set-key [mouse-5] 'scroll-up)
I found a lot of posts about this, but none worked for me. I just took some pieces of information here and there and came up with this (which works for me).
Emacs 21.4 on Ubuntu PPC.

Wednesday, November 15, 2006

Servoy Huh? 2

Today, while checking my Gmail account, I read an ad about Servoy being an alternative to Ruby and Rails.

Actually, I wanted to check out what that was. I thought it was some crappy PHP framework that pretended to do what Rails does by just cloning its structure, but without Ruby's power.

On the homepage the first thing you learn about Servoy is that Ruby syntax is complex and that Rails has limited capabilities. Further analysis makes me think it's a Java based environment scripted in Javascript. Basically they are asserting that Javascript syntax is clearer than Ruby's (which is debatable) and that a RAD-like thing built on Java is less limited than Rails. Note that Servoy is not Java: it is something built on Java (so I don't expect it to be as powerful as Java, or it would be just as complex). I'm not against RAD or 'easy' environments. I just find it ridiculous when they say that Rails is limited (which is false) and that a closed component system is more complete.

Moreover, I don't know anyone using this Servoy. I haven't read about it in tech magazines, and my fellow developers haven't heard about it either. My opinion is that it's a small environment that uses Rails' popularity (as a comparison) to convince people to pay for something they could have done more easily in Rails.

The same conclusions have been reached by this blogger.

Difference Lists for visiting trees (Prolog)

These are some (simple) exercises in Prolog using difference lists. They should be correct, but I have to say I'm a beginner Prolog programmer... so...

inorder_dl(tree(X, L, R), Xs-Ys) :- 
        inorder_dl(L, Xs-[X|Zs]), 
        inorder_dl(R, Zs-Ys). 
inorder_dl(void, Xs-Xs). 
preorder_dl(tree(X, L, R), [X|Xs]-Ys) :- 
        preorder_dl(L, Xs-Zs), 
        preorder_dl(R, Zs-Ys). 
preorder_dl(void, Xs-Xs). 
postorder_dl(tree(X, L, R), Xs-Zs) :- 
        postorder_dl(R, Ys-[X|Zs]), 
        postorder_dl(L, Xs-Ys). 
postorder_dl(void, Xs-Xs). 
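The point of difference lists is that each clause threads the 'rest' of the output through the recursion, so the whole traversal is linear with no calls to append/3. A rough Python analogue of the same trick (my own sketch, not a translation of the Prolog) is passing one accumulator down the recursion instead of concatenating partial results:

```python
def inorder(tree, acc=None):
    """In-order traversal; `tree` is (value, left, right) or None."""
    if acc is None:
        acc = []
    if tree is not None:
        value, left, right = tree
        inorder(left, acc)
        acc.append(value)   # plays the role of the difference-list 'hole'
        inorder(right, acc)
    return acc

t = (2, (1, None, None), (3, None, None))
print(inorder(t))  # [1, 2, 3]
```

Just as in the Prolog version, no intermediate lists are ever copied: every node's value is written exactly once into the shared result.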

Sunday, November 12, 2006

Difference lists

Well.... I just discovered this new wonderful world. I really like them a lot.

I suspect I'm getting a little bit more into Prolog programming. However, I still find Haskell's lazy programming more exciting. But I have to admit that my limited love for Prolog is more a problem of (my) ignorance.

A couple of days ago my Programming Languages teacher built before my eyes a small compiler and an interpreter for arithmetic expressions in less than 10 minutes (while explaining Prolog to my fellow students). In fact he made extensive use of Prolog's unification mechanism (since the arithmetic expressions he was compiling were also Prolog terms).

However, thanks to DCG grammars, building a parser is also quite easy.
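As a taste of how little code such an interpreter needs, here is a minimal recursive-descent evaluator for arithmetic expressions in Python (my own sketch, unrelated to the teacher's Prolog program):

```python
import re

def evaluate(src):
    """Evaluate +, *, parentheses and integers, with usual precedence."""
    tokens = re.findall(r'\d+|[+*()]', src)
    pos = [0]  # current position in the token stream

    def peek():
        return tokens[pos[0]] if pos[0] < len(tokens) else None

    def next_tok():
        tok = tokens[pos[0]]
        pos[0] += 1
        return tok

    def factor():                 # factor := number | '(' expr ')'
        if peek() == '(':
            next_tok()
            value = expr()
            next_tok()            # skip ')'
            return value
        return int(next_tok())

    def term():                   # term := factor ('*' factor)*
        value = factor()
        while peek() == '*':
            next_tok()
            value *= factor()
        return value

    def expr():                   # expr := term ('+' term)*
        value = term()
        while peek() == '+':
            next_tok()
            value += term()
        return value

    return expr()

print(evaluate("1+2*(3+4)"))  # 15
```

Each grammar rule becomes one function, which is essentially what a DCG gives you for free in Prolog.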

In fact I'm falling in love with declarative programming. I definitely start liking declarative languages like Haskell (functional) or Prolog (logic) a lot more than classical programming languages.

In fact I still love Python and Ruby. I like C (and we *need* C). I also like compiled languages like Objective-C (waiting for Objective-C 2.0...) and D. After all, I like C++ too. But what is the point of Java? I definitely don't understand. It's dramatically uninteresting.

Ubuntu on Powerbook G4 (Trackpad Synaptics Two finger scroll multibutton emulation)

After such a long title (mainly for tagging reasons) I suppose I have to write a long article.
One of the things I didn't really like about Ubuntu standard configuration, was the track-pad.
In fact, although correctly recognized (using the synaptics Xorg driver), the default configuration was really poor: the mouse moved so slowly it was unusable. For the first hours you can use an external three-button mouse and live happily. However, I wanted to fix my track-pad. I found some articles on the net.
There are plenty of blog posts on this subject, and if you use synaptics as a keyword you find even more. This is a good starting point. However, I strongly advise reading the synaptics manual (man synaptics). It is synaptics with a trailing s, not synaptic, the package manager.
If you read the manual, you get a lot of information about how to configure your trackpad. You also have some GUI configurators (gsynaptics, ksynaptics, qsynaptics). They offer fewer options (but are easier to use).
While I was able to set tracking speed and acceleration correctly (mainly using the link I posted), I'm still somewhat unsatisfied with the scrolling facilities. Two-finger scrolling does not seem to work, and edge scrolling (using the edges of the trackpad to scroll) is far from perfect.
The best thing is that I can use tapping to emulate buttons. Tapping in the upper right corner is a middle click, and in the lower right corner a right click. Now I can use the PowerBook without an external mouse.
I'm going to post my xorg.conf. And remember... using the 'radeon' driver instead of 'ati' should be a good thing (I've not investigated, I only read it somewhere).
My XOrg:
# /etc/X11/xorg.conf (xorg X Window System server configuration file) 
# 
# This file was generated by dexconf, the Debian X Configuration tool, using 
# values from the debconf database. 
# 
# Edit this file with caution, and see the /etc/X11/xorg.conf manual page. 
# (Type "man /etc/X11/xorg.conf" at the shell prompt.) 
# 
# This file is automatically updated on xserver-xorg package upgrades *only* 
# if it has not been modified since the last upgrade of the xserver-xorg 
# package. 
# 
# If you have edited this file but would like it to be automatically updated 
# again, run the following command: 
#   sudo dpkg-reconfigure -phigh xserver-xorg  
Section "Files"
        FontPath        "/usr/share/X11/fonts/misc"
        FontPath        "/usr/share/X11/fonts/cyrillic"
        FontPath        "/usr/share/X11/fonts/100dpi/:unscaled"
        FontPath        "/usr/share/X11/fonts/75dpi/:unscaled"
        FontPath        "/usr/share/X11/fonts/Type1"
        FontPath        "/usr/share/X11/fonts/100dpi"
        FontPath        "/usr/share/X11/fonts/75dpi"
        FontPath        "/usr/share/fonts/X11/misc"        # path to defoma fonts
        FontPath        "/var/lib/defoma/x-ttcidfont-conf.d/dirs/TrueType"
EndSection
Section "Module"
        Load        "i2c"
        Load        "bitmap"
        Load        "ddc"
        Load        "dri"
        Load        "dbe"
        Load        "extmod"
        Load        "freetype"
        Load        "glx"
        Load        "int10"
        Load        "record"
        Load        "v4l"
        Load        "type1"
        Load        "vbe"
EndSection
Section "InputDevice"
        Identifier        "Generic Keyboard"
        Driver                "kbd"
        Option                "CoreKeyboard"
        Option                "XkbRules"        "xorg"
        Option                "XkbModel"        "pc105"
        Option                "XkbLayout"        "it"
        Option                "XkbOptions"        "lv3:ralt_switch"
        Option                "XkbVariant"        "nodeadkeys"
EndSection
Section "InputDevice"
        Identifier        "Configured Mouse"
        Driver                "mouse"
        Option                "Device"                "/dev/input/mice"
        Option                "Protocol"        "ExplorerPS/2"
        Option                "ZAxisMapping"        "4 5"
        Option                "Emulate3Buttons"        "true"
EndSection
Section "InputDevice"
        Identifier        "Synaptics Touchpad"
        Driver                "synaptics"
        Option                "CorePointer"
        Option                "SendCoreEvents"        "true"
        Option                "Device"                "/dev/psaux"
        Option                "Protocol"                "auto-dev"
        Option                "SHMConfig"                "true"
        Option                "MinSpeed"                "0.30"
        Option                "MaxSpeed"                "1.10"
        Option                "EdgeMotionMinSpeed"        "200"
        Option                "EdgeMotionMaxSpeed"        "200"
        Option                "FastTaps"                "1"
        Option                "MaxTapTime"                "100"
        Option                "AccelFactor"                "0.030"
        Option                "HorizScrollDelta"        "0"
        Option                "VertTwoFingerScroll"        "1"
        Option                "HorizTwoFingerScroll"        "1"
        Option                "CircularScrolling"        "1"
        Option                "CircScrollTrigger"        "0"
        Option                "FingerLow"                "1"
        Option                "FingerHigh"                "3"
        Option                "LeftEdge"                "80"
        Option                "TopEdge"                "80"
        Option                "RightEdge"                "850"
        Option                "BottomEdge"                "560"
        Option                "TapButton1"                "1"
EndSection
Section "Device"
        Identifier        "ATI Technologies, Inc. RV350 NP [Mobility Radeon 9600/9700 M10/M11]"
        Driver                "radeon"
        BusID                "PCI:0:16:0"
        Option                "UseFBDev"        "true"
EndSection
Section "Monitor"
        Identifier        "Generic Monitor"
        Option                "DPMS"
EndSection
Section "Screen"
        Identifier        "Default Screen"
        Device                "ATI Technologies, Inc. RV350 NP [Mobility Radeon 9600/9700 M10/M11]"
        Monitor                "Generic Monitor"
        DefaultDepth        24
        SubSection "Display"
                Depth        1
                Modes        "1280x786"
        EndSubSection
        SubSection "Display"
                Depth        4
                Modes        "1280x786"
        EndSubSection
        SubSection "Display"
                Depth        8
                Modes        "1280x786"
        EndSubSection
        SubSection "Display"
                Depth        15
                Modes        "1280x786"
        EndSubSection
        SubSection "Display"
                Depth        16
                Modes        "1280x786"
        EndSubSection
        SubSection "Display"
                Depth        24
                Modes        "1280x786"
        EndSubSection
EndSection
Section "ServerLayout"
        Identifier        "Default Layout"
        Screen                "Default Screen"
        InputDevice        "Generic Keyboard"
        InputDevice        "Synaptics Touchpad"
        InputDevice        "Configured Mouse"
EndSection
Section "DRI"
        Mode        0666
EndSection

Ubuntu Edgy 6.10 on Powerbook G4 (general)

Recently I had to install a GNU/Linux system on my notebook. My work-group is heavily Linux-based and the software which I'm going to use/extend for my thesis at the moment compiles only on Linux (and probably on BSD; however, not on MacOS X). Although it's a Prolog software, some parts are written in C++ and I haven't had time to make it work on MacOS X. Modifications are quite trivial (it's a matter of building a dylib). However, I should port the build system to libtool to make it work on both platforms.

So I decided to install a GNU/Linux distribution. My first choice was Fedora. I'm not particularly fond of RH's little sister, but that is the first choice in my team. The PPC version quite sucks. It's not that things do not work: it simply seems just a bit more than a recompilation of the x86 version. For example the package pbbuttons (that makes the multimedia keys of my pb work) wasn't even in the repositories. It's not that I particularly want to have the buttons working: it is like those who built it never used it. I mean, no one ever tried to use it on a notebook? Moreover, it's painfully slow. Six minutes to boot. Starting some applications (Emacs, the GNOME 'quick' disc burner, etc.) takes minutes. This does not happen on the x86 version.

Partitioning

So I tried Ubuntu 6.10. It's heaven. Everything worked without my intervention. Well, almost everything.

The live CD did not properly recognize my screen. It thought it was a 20000x30000 pixel screen (and of course that was *wrong*): I just had to fix that manually. After booting the live CD, I started the installation. Everything went quite smoothly. Details in the next posts.

Monday, November 6, 2006

ssh-copy-id

I develop a lot on remote Unix machines. I have CVS/SVN accounts and shell access. In fact it's quite tedious to retype the password every time.
For this reason I started using ssh keys to automate this process. You can find a lot of tutorials and information on this subject, so I won't spend more time on it. Basically you have to generate the key and then upload it with scp or similar to the remote machine. However, I find scp syntax quite boring to type... :P
There is a nifty script that does the job for you. You call it like the following (with your correct remote credentials), it asks for your password, and you're done.
ssh-copy-id -i ~/.ssh/id_rsa.pub rik0@shell.berlios.de
This script is not in my OpenSSH distribution and I had to find it in CVS. The direct download is . However, the script may be upgraded, so if you read this post some time after it has been posted, you may want to look for a newer version.

Sunday, November 5, 2006

Cazzeggio (Which file extension are you?)

Which File Extension are You?
You are .ogg Even though many people consider you cool and happening, a lot still find that you're a bit too weird to hang out with.

Agile software development

In some sense I've always done agile programming. I've been in the Python community for a while, and a lot of agile practices are very common there. I've done a lot of unit testing (even if not really before coding but just afterwards) since my very first years as a developer.

In fact, it was even before my Cocoa and Objective-C studies that I learnt about patterns and how to use them. In fact patterns are well known in the Objective-C community even among newbie programmers; for example Anguish, Buck and Yacktman describe Cocoa showing which patterns are used in the framework and how their names differ from the classical GoF ones.

Of course this is not enough to say "Agile Development": I only had some practices in common with some (most) agile development methods. I also find it quite natural to use tests and code to learn instead of documentation (which is quite common for a lot of open source libraries). What I missed was a theoretical mind-framework.

In some software engineering courses I was introduced to UP. And in fact I could not believe that some people are convinced that it could work well (I agree that you may make it work, however, I think it's a suboptimal way to develop). It is like freezing insects in order to take pictures of them. While frozen, they stand still and are easier to photograph: however, if you want to photograph insects in the wild this just doesn't happen to work.

Some of my concerns were also due to Java and to the quantity of code that is not part of the problem but that needs to be written because of the way Java works. Python leans quite naturally towards agility: the way components interact, the way you can change things and refactor (even without complex refactoring tools like those in Eclipse).

After some time I discovered Ruby and got in touch with the Ruby community and all the hype about agile programming.

Eventually, I decided to buy Kent Beck's book on Extreme Programming. However, I also wanted a technical book about agile programming, so I bought Martin's Agile Software Development.

I love them both and I will talk about Beck's book another time.

Agile Software Development is a very interesting book. There is plenty of information on agile techniques, and even if you are not a Java developer (which in some sense is my case, since I rarely work in Java) there is lots of interesting stuff.

You learn how to use test driven development and to think using TDD (there is a big difference between writing tests and thinking in a 'test oriented way'). You learn how to design interfaces while writing tests, and there are examples on software evolution driven by tests.

You definitely have to read the sources (that are really readable and easy to understand): while the explanation is great, the sources would be enough by themselves.

The book also offers a clear explanation of classical principles of object oriented software design (OCP, SRP, LSP, DIP) and packaging (ADP, SDP, SAP). So, you may want to buy this book even if you are not into agile programming and are only interested in a book about Object Oriented software development.

Saturday, November 4, 2006

Closures in Java

This is a hint about implementing closures in Java. Of course, if you want to do it in an acceptable way, you would create a custom interface that Does The Right Thing™, something like (semi-pseudocode)
interface Closure {
    public Object foo(Object bar);
}
However, you can get some ideas from the following code

public class StateModifier extends Object {
    protected String x;
    
    public StateModifier() {
        x = "foo";
    }
    
    public Thread getThread() {
        final String y = "yy";
        Thread t = new Thread() {
            public void run() {
                x = "boo" + y;
            }
        };
        return t;
    }
    
    public static void main(String[] args) {
        StateModifier sm = new StateModifier();
        Thread t1 = sm.getThread();
        System.out.println(sm.x);
        t1.start();
        try {
            t1.join();
        } catch (Exception e) {
            // ...
        }
        System.out.println(sm.x);
        System.exit(0);
    }
}

Wednesday, October 25, 2006

Niceboxes (latex)

This nice sty file comes from an example I found on the web. The code is not mine; however, I wasn't able to find the original code (and the original author) again to give credit. There may be some small modifications I made: if it does not work, it's likely my fault.
You can define things like
\usepackage{niceboxes}
\newenvironment{definition}[1][Definition]{
\par
\SpecialEnv{#1}{LightSlateGrey}{Lavender}{LightSlateGrey}{}%
}{%
\endSpecialEnv
}
Of course you need to have imported xcolor if you want to call colors by name (like LightSlateGrey).

Latex Companion

And eventually the LaTeX Companion arrived from Amazon...

I leafed through it and have to say it seems a wonderful book. There's plenty of information (in my opinion more than in other books on LaTeX like the "Guide to LaTeX" or Lamport's book -- which seems more suited to a beginner LaTeX user).

In fact I was spending too much time on the web looking for information on intermediate or advanced tasks, and I was also thinking of abandoning LaTeX in favor of something else (by the way, I didn't find a true candidate, apart from DocBook). Thanks to Andrea Bergia (I didn't find his proper blog/website) for pointing me to the listings package: by solving one of my immediate problems, it made me understand that it wasn't LaTeX that was unsuited to what I had to do; it was me who needed to learn more LaTeX.

Monday, September 11, 2006

Rails: Loading fixtures to development with a specified order

Background
My development style is heavily test driven. One of the very first things I write are fixtures, so that I can write tests against them. For this reason the very first database I use in a Rails project is the test database, rather than the development one.
When I have to populate the development db, I already have plenty of fixtures: if they aren't enough to try out the user interface (which is what is left to do), then they weren't enough to test the models and the controllers either.
Of course this should be no problem: there is the rake task db:fixtures:load that handles exactly this kind of issue. However, I tend to use db constraints such as
ALTER TABLE bars ADD FOREIGN KEY(`foo_id`)  REFERENCES foos  (`id`) ON DELETE CASCADE;
if you use MySQL or
ALTER TABLE bars ADD CONSTRAINT valid_foo FOREIGN KEY (foo_id) REFERENCES foos (id) MATCH FULL;
if you use Postgres.
Solution
Of course, in this case you can't load fixtures in arbitrary order: you must supply a correct fixture loading order.
After some searching I found this solution.
However, it does not work. It builds a Hash by inserting fixture names in the specified order, then builds an array with the other fixtures that were not specified as needing a particular order.
Eventually, it concatenates the keys of the hash with the array, and loads fixtures in this order. I don't understand how the poster got this code working, since Ruby Hashes are not ordered in any way: the documentation explicitly says that iterators and methods return values in arbitrary order, and that you cannot rely on insertion order.
This meant I had to change the code. It now uses arrays everywhere (which seems the most reasonable thing to do, since this rake task is all about order). I also added namespaces.
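Stripped of the Rails plumbing, the array-based ordering boils down to this (a minimal sketch with made-up fixture names):

```ruby
# Explicitly ordered fixtures first, every other fixture after:
# arrays preserve insertion order, so the final order is deterministic.
ordered = %w(foos bars)
discovered = %w(bazes bars quuxes foos)   # e.g. what Dir.glob would yield
others = discovered.reject { |fx| ordered.include?(fx) }
all_fixtures = ordered + others
# all_fixtures == ["foos", "bars", "bazes", "quuxes"]
```

The full task below does the same thing with Dir.glob and ENV["FIXTURE_ORDER"].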
You still have to define something like

ENV["FIXTURE_ORDER"] = 
    %w( entities interaction_sources interaction_devices 
        payments interactions accounts employments 
        enrollments payables receivables tenures wards ).join(' ')

My code is:

require File.expand_path(File.dirname(__FILE__) + "/../../config/environment")
ENV["FIXTURE_ORDER"] ||= ""

def print_separator_line(n=80)
    puts "\n" + '='*n
end

desc "Load fixtures into #{ENV['RAILS_ENV']} database"

namespace :db do
    namespace :fixtures do
        task :load_ordered => :environment do
            require 'active_record/fixtures'
            print_separator_line
            puts "Collecting specified ordered fixtures\n"
            ordered_fixtures = ENV["FIXTURE_ORDER"].split
            fixture_files = Dir.glob(
                File.join(RAILS_ROOT, 'test', 'fixtures', '*.{yml,csv}'))
            other_fixtures = fixture_files.collect { |file|
                File.basename(file, '.*')
            }.reject { |fx| ordered_fixtures.include? fx }
            ActiveRecord::Base.establish_connection(ENV['RAILS_ENV'])
            all_fixtures = ordered_fixtures + other_fixtures
            print_separator_line
            puts "Fixtures will be loaded in following order\n"
            all_fixtures.each_with_index do |fx, i|
                puts "#{i+1}. #{fx}"
            end
        
            print_separator_line
            puts "Actually loading fixtures to #{ENV['RAILS_ENV']} db..."
    
            all_fixtures.each do |fixture|
                puts "Loading #{fixture}..."
                Fixtures.create_fixtures('test/fixtures',  fixture)
            end unless ENV['RAILS_ENV'] == 'production' 
            # You really don't want to load your *fixtures* 
            # into your production database, do you?  
        end
    end
end

Thursday, August 24, 2006

RealBasic loops, arrays and MemoryBlocks

(... following text is my answer to a usenet post ..)
I wrote a test function that makes various speed checks. The numbers I report are under Rosetta emulation, but that does not matter, since they are "good enough" and the point is not to bench RB against other "native" languages, but to test RB against itself. However, running on Windows gave about a 2x improvement with respect to the Rosetta numbers.
All loops involve 1000000 (10^6) iterations (the double loop makes 1000 * 1000 iterations). I also tried with 10000000 (10^7) iterations and it took about 10 times longer than the 1000000 iteration version. That is to say, the benchmarks are linear in the number of elements (as expected). I didn't try bigger values, as that involved paging, which would make the test quite useless.
The first one is a void loop. It takes 0.09 seconds on my Intel Mac. Then I made a double void loop. It takes the same time. That is to say, using two nested for loops does not take more time.
Then I made a simple loop that copies a variable into another variable. This runs in 0.17 seconds. Good. The third loop reads array elements into the variable. 0.17 seconds. The same. So arrays are well optimized: accessing an element of an array takes the very same time as accessing a single variable.
I then tried using two-dimensional arrays: we get to 0.22 seconds. So this is slower, but not a *lot* slower (even if for large data sets this +33% can make the difference).
And then memory blocks. I was surprised: 0.3 seconds. It seems that using memory blocks is slower than using arrays (and in fact there is a rationale behind this: a lot of people use arrays to store large amounts of data, so arrays *must* be ultra optimized. I think that MemoryBlocks are more often used in a different fashion).
#pragma DisableBackgroundTasks
#pragma DisableBoundsChecking
Const LoopSize = 1000000
Const SmallerLoop = 1000
Dim void_start, void_end, arr_start, arr_end  As Double
Dim var_start, var_end, mat_start, mat_end  As Double
Dim block_start, block_end, dvoid_start, dvoid_end  As Double
Dim var As Double
Dim value As  Double = 1.0
Dim arr(LoopSize) As Double
Dim mat(SmallerLoop, SmallerLoop) As Double
Dim block As MemoryBlock

for i As Integer = 0 to UBound(arr)
    arr(i) = value
next

for i As Integer = 0 to SmallerLoop
    for j As Integer = 0 to SmallerLoop
        mat(i, j) = 1.0
    next
next

block = New MemoryBlock(LoopSize * 8)

for i As Integer = 0 To (LoopSize-1) * 8 Step 8
    block.DoubleValue(i) = 1.0
next

void_start = Microseconds()

for i As Integer = 0 to LoopSize
next
void_end = Microseconds()
MsgBox "Void loop(" + Str(LoopSize) + "): " + Str((void_end - void_start)/1000000) 

dvoid_start = Microseconds()

for i As Integer = 0 to SmallerLoop
    for j As Integer = 0 to SmallerLoop
    next
next

dvoid_end = Microseconds()
MsgBox "Double void loop(" + Str(LoopSize) + "): " + Str((dvoid_end - dvoid_start)/1000000)

var_start = Microseconds()
for i As Integer = 0 to LoopSize
    var = value
next
var_end = Microseconds()
MsgBox "Var loop(" + Str(LoopSize) + "): " + Str((var_end - var_start)/1000000)

arr_start = Microseconds()

for i As Integer = 0 to LoopSize
    var = arr(i)
next

arr_end = Microseconds()
MsgBox "Arr loop(" + Str(LoopSize) + "): " + Str((arr_end - arr_start)/1000000)

mat_start = Microseconds()
for i As Integer = 0 to SmallerLoop
    for j As Integer = 0 to SmallerLoop
        var = mat(i, j)
    next
next

mat_end = Microseconds()
MsgBox "Mat loop(" + Str(LoopSize) + "): " + Str((mat_end - mat_start)/1000000)

block_start = Microseconds()

for i As Integer = 0 To (LoopSize-1) * 8 Step 8
    var = block.DoubleValue(i)
next

block_end = Microseconds()
MsgBox "Block loop(" + Str(LoopSize) + "): " + Str((block_end - block_start)/1000000)

block_start = Microseconds()

for i As Integer = 0 To (LoopSize-1)
    var = block.DoubleValue(i*8)
next

block_end = Microseconds()
MsgBox "Block loop(" + Str(LoopSize) + "): " + Str((block_end - block_start)/1000000)

Saturday, June 24, 2006

Ruby on Rails: make LoginEngine fixtures dynamic

admin:
id: 1
login: admin
salted_password: 
salt: 
email: admin@company.com
verified: 1
This way you always generate fixtures that behave correctly according to your salt settings. Without this, I had problems when doing "rake load_fixture" against the development db (the login failed).

Thursday, June 22, 2006

Rendering a chosen RJS

Of course, to render an RJS template whose name differs from the controller method, we must use the :action specifier. For example we could have something like:
respond_to do |wants|
    wants.html {render  :partial=>"post", :object=>@post, :layout=>'posts'}
    wants.js   {render  :action=>"remote_show"}
end
Although this is pretty obvious once you reason about it, it may take some time to get there. And on the net there aren't easily accessible examples about it (or at least I found none).
It's just a dirty trick.

Wednesday, June 21, 2006

Pattern Matching in Ruby

Inspired by a discussion on the Italian Ruby mailing list, I decided to write (probably another) pattern matching implementation for ruby. In the short term I want to be able to define something like:

func :qsort, [] { [] } 
func :qsort do | list | 
    qsort( list.select {|a| a <  list[0]} )  + 
        list.select {|a| a == list[0]}    + 
        qsort( list.select {|a| a >  list[0]} ) 
end 

and if I find time it could become even some kind of "declarative like" unification algorithm... who knows. The point is I don't even know if I have enough time for the "toy" pattern matching.

[from the future: I did not find time!]
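For the record, here is a toy sketch of how such a `func` helper could dispatch on patterns (hypothetical code, not the real thing: clauses are tried in definition order; a clause with an argument pattern matches via ==, a clause without one is a catch-all):

```ruby
module PatternFunc
  # Register a clause for `name`; define (or redefine) the dispatching method.
  def func(name, *pattern, &body)
    @pf_clauses ||= Hash.new { |h, k| h[k] = [] }
    @pf_clauses[name] << [pattern, body]
    clauses = @pf_clauses[name]
    define_method(name) do |arg|
      # First clause whose pattern equals the argument wins;
      # an empty pattern list acts as a catch-all.
      _, blk = clauses.find { |p, _| p.empty? || p.first == arg }
      raise ArgumentError, "no clause matches #{arg.inspect}" unless blk
      instance_exec(arg, &blk)
    end
  end
end

class Sorter
  extend PatternFunc

  func(:qsort, []) { [] }        # base case: the empty list
  func(:qsort) do |list|         # catch-all: the recursive case
    pivot, *rest = list
    qsort(rest.select { |a| a < pivot }) + [pivot] +
      qsort(rest.select { |a| a >= pivot })
  end
end

Sorter.new.qsort([3, 1, 2])  # => [1, 2, 3]
```

A real implementation would need deep patterns and unification, but the clause-list-plus-dispatch idea is the core of it.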

Wednesday, June 14, 2006

RJS Rails If statement

If you do something like:

page.replace_html 'non_existent', 'some text'

you get a Javascript error and subsequent lines are not processed. The first idea to solve this problem would be:

if page['not_valid']
    page.replace_html 'not_valid', 'some text'
end

However, you don't get what you meant. Not at all. This is translated to

$("not_valid")
Element.update("not_valid", "some text");

The best way I found to express that meaning is

page.select('not_valid').each do |unique|
    page.replace_html unique, 'some text'
end

That works.

Thursday, June 1, 2006

On Duck Typing (Ruby, Python, etc...)

In these duck-typing times, the distinction between classes and types, found for example in the introductory chapter of Design Patterns by Gamma, Helm, Johnson and Vlissides, certainly comes back into fashion.

Most people today are used to languages like Java or C++, where classes and types essentially coincide (with the possible exception of generic programming).

A type is the name used to denote a particular interface. Incidentally, this has nothing directly to do with Java's Interfaces. The interface is the set of methods (or messages, to say it with Ruby, Smalltalk or Objective-C) an object responds to.

However, an object can implement several types, and different types can be shared by very different objects. In this respect, types behave like Java interfaces. Any Java programmer knows there are Interfaces (I use the capital letter to indicate the word in its "Javaesque" sense) supported by very different objects (think of Cloneable, for example), and that any object can implement an arbitrarily large number of Interfaces.

The difference is that Java's Interfaces are static and "bound" to classes. A given class declares that it implements a certain number of interfaces; the instances of that class will therefore have those interfaces as their types. In Ruby, however, this does not happen. An object has a given type because it responds to certain messages, but we never specify anywhere what the type is. By the mere fact of responding to certain messages, an object is of a given type (regardless of its class). Types are "informal" (using Objective-C's distinction between formal and informal protocols: formal protocols behave essentially like Java's Interfaces, informal ones are precisely "not statically typed").

The class, on the other hand, is tied directly to the implementation. Quoting the GoF's Design Patterns: "It's important to understand the difference between an object's class and its type. An object's class defines how the object is implemented: it defines the object's internal state and the implementation of its operations. An object's type, instead, only refers to its interface, the set of requests to which it can respond. An object can have many types, and objects of different classes can have the same type.

Of course, there is a close relationship between class and type. Because a class defines the operations an object can perform, it also defines the object's type. When we say that an object is an instance of a class, we imply that the object supports the interface defined by the class.

Languages like C++ and Eiffel (and Java) use classes to specify both an object's type and its implementation."

On the other hand, dynamic languages like Ruby or Python (or Smalltalk) do not declare the types of variables. They send messages to (or, in Python parlance, call methods on) the objects denoted by the variables, and if those objects support the message, everything works.

The difference between class inheritance and type inheritance is important. Class inheritance is about defining an object's implementation in terms of another object's implementation. It is without doubt a "DRY" concept: if I have already defined some things (and the objects are close enough), I can keep them the same. In Ruby this is complemented by mixins, which let you share code *without* resorting to inheritance (but that's another story); in Python you can use multiple inheritance to emulate mixins.

Type inheritance, instead, is about using one object in place of another. In this sense we are closer to Barbara Liskov's Substitution Principle. An excellent article on the LSP can be found here. In essence, we can state the Liskov Substitution Principle as:

T1 is a subtype of T2 if for every object O1 of type T1 there is an object O2 of type T2 such that, for every program defined in terms of T2, the behavior of the program is unchanged when O1 is substituted for O2.

Confusing type inheritance with class inheritance is easy. Many languages make no distinction between the two; indeed, they use classes to define types. Many programmers, for example, see the LSP only in relation to subclasses (and indeed in C++ it is important to make objects behave "well" with respect to the LSP, since it is always possible to make a pointer or reference of a supertype point to an object of its subtype).

In truth, Liskov's principle is a *very* strict one. Two objects are substitutable in Liskov's sense if they really behave the same way (as far as the common type is concerned), and it comes natural to think of them as class and subclass (even though this is *not* actually necessary). Duck typing is less strict: it does not require the program to exhibit the same behavior. Two objects are substitutable simply if they respond to the same things, even if they respond in different ways. It has *nothing* to do with Design by Contract: it is much closer to what C++ does with templates (which some indeed see as the static counterpart of duck typing).

Back to our point: if an object behaves according to a certain type (that is, it responds to the methods belonging to that type), then let's treat it as such. Hence: if it behaves like a duck, let's treat it like a duck (duck typing, indeed).
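A minimal Ruby illustration (made-up classes, just to fix ideas): both objects below satisfy the informal "quacker" type simply because they respond to #quack; no class declares any interface.

```ruby
class Duck
  def quack; "quack!"; end
end

class Robot
  def quack; "beep (pretending to quack)"; end
end

# Works for *any* object that responds to #quack, whatever its class.
def make_it_quack(thing)
  thing.quack
end

make_it_quack(Duck.new)   # => "quack!"
make_it_quack(Robot.new)  # => "beep (pretending to quack)"

# We can even give one single object the "quacker" type at runtime:
o = Object.new
def o.quack; "improvised quack"; end
make_it_quack(o)          # => "improvised quack"
```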

A language like Ruby makes it *very* problematic to consider the relation between types and classes a strict one. At any point of the program (even though it is not always good practice to do so) we can change the type of all the objects of a given class (by adding or removing methods), or of one single object individually. Symptomatic, in this sense, is the deprecation of Object#type in favor of Object#class.

Monday, May 29, 2006

Beautify Fixtures (Ruby on Rails)

After generating fixtures from my development database (you can read about it here) I decided to make them nice.

Up to now, for example, Permissions were labelled permission_001 and so on, which is quite counter-intuitive. A good convention for labelling permissions would be controller_action. For example:

user_new: 
action: new 
id: 19 
controller: user 

The same applies to roles (you can label them with their name) and permissions_roles (you can build the name from their associated role and permission names). This is what my small script does. It is a rake task.
desc "Labels Roles, Permissions and PermissionsRoles after their values so that
the fixtures become more readable. You can specify the DIR where the fixtures to modify are."

require 'yaml' 
class Hash 
    def find_item(key, value, default=nil) 
        each do |k, v| 
            return k if (v[key] == value) || (v[key] == value.to_i) 
        end 
        default 
    end 
end 

task :beautify_fixtures => :environment do 
    dir = ENV['DIR'] || "#{RAILS_ROOT}/test/fixtures" 
    db   = ENV['DB']   || 'development' 
    File.cp "#{dir}/roles.yml",             "#{dir}/roles.bkp.yml" 
    File.cp "#{dir}/permissions.yml",       "#{dir}/permissions.bkp.yml" 
    File.cp "#{dir}/permissions_roles.yml", "#{dir}/permissions_roles.bkp.yml" 
    permissions = YAML.load_file("#{dir}/permissions.bkp.yml") 
    roles = YAML.load_file("#{dir}/roles.bkp.yml") 
    File.open("#{dir}/permissions.yml", 'w') do |file| 
        permissions.each do |k, v| 
            controller = v['controller'] 
            action = v['action'] 
            file.write "#{controller}_#{action}:\n" 
            v.each {|key, value| file.write "  #{key}: #{value}\n"} 
        end 
    end 
    File.open("#{dir}/roles.yml", 'w') do |file| 
        roles.each do |k, record| 
            file.write "#{record['name'].downcase}:\n" 
            record.each {|key, value| file.write "  #{key}: #{value}\n"} 
        end 
    end 

    permissions = YAML.load_file("#{dir}/permissions.yml") 
    roles = YAML.load_file("#{dir}/roles.yml") 
    prs = YAML.load_file("#{dir}/permissions_roles.bkp.yml") 
    
    File.open("#{dir}/permissions_roles.yml", 'w') do |file| 
        prs.each do |k, record| 
            permission = permissions.find_item('id', record['permission_id']) 
            role = roles.find_item('id', record['role_id']) 
            file.write "#{role}_#{permission}:\n" 
            record.each {|key, value| file.write "  #{key}: #{value}\n"} 
        end 
    end 
end 

You can find a downloadable version here [BROKEN!]

Friday, May 26, 2006

Dump db to fixtures (generate fixtures from db)

This is a quite interesting thread on rails ml with the solution to my problem.
I wanted to generate fixtures automatically from my db. This is a rake action that does the job well. Just put it into lib/tasks/ and call it dump_fixtures.rake

desc 'Dump a database to yaml fixtures. Set environment variables DB
and DEST to specify the target database and destination path for the
fixtures. DB defaults to development and DEST defaults to RAILS_ROOT/
test/fixtures.'

task :dump_fixtures => :environment do
    path = ENV['DEST'] || "#{RAILS_ROOT}/test/fixtures"
    db   = ENV['DB']   || 'development'
    sql  = 'SELECT * FROM %s'

    ActiveRecord::Base.establish_connection(db)
    ActiveRecord::Base.connection.select_values('show tables').each do |table_name|
        i = '000'
        File.open("#{path}/#{table_name}.yml", 'wb') do |file|
            file.write ActiveRecord::Base.connection.select_all(sql % table_name).inject({}) { |hash, record|
                hash["#{table_name}_#{i.succ!}"] = record
                hash
            }.to_yaml
        end
    end
end
# ActiveRecord::Base.connection.select_values('show tables')
# is mysql specific
# SQLite:  ActiveRecord::Base.connection.select_values('.table')
# Postgres
# table_names = ActiveRecord::Base.connection.select_values(<<-end_sql)
#    SELECT c.relname
#    FROM pg_class c
#      LEFT JOIN pg_roles r     ON r.oid = c.relowner
#      LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
#    WHERE c.relkind IN ('r','')
#      AND n.nspname IN ('myappschema', 'public')
#      AND pg_table_is_visible(c.oid)
# end_sql

Wednesday, May 17, 2006

Open in iTerm from Finder

Since I've been asked to do it, I wrote the "Open in iTerm" app, which does the very same thing as "Open in Terminal" (except that it uses iTerm instead of Terminal.app).

Actually there could be bugs, but at least on my system I suppose it's not my fault: rather, it is iTerm that crashes a lot (maybe the AppleScript part has not been tested extensively on Intel Macs).

If you experience problems, let me know.

The script part that accesses iTerm is

tell application "iTerm"
        activate
        set myTerm to (make new terminal)
        tell myTerm
                set mySession to (make new session at the end of sessions)
                tell mySession
                        exec command "/bin/bash"
                        write text "cd \""& myPath &"\""
                end tell
        end tell
end tell

I bundled both apps in the very same package, and you can download it here.

I upgraded this script too with Marco's help.

on run
        tell application "Finder" to try
                set myPath to quoted form of ¬
                        POSIX path of ((target of window 1) as string)
        on error
                set myPath to "~"
        end try

        tell application "iTerm"
                activate
                set myTerm to (make new terminal)
                tell myTerm
                        set mySession to (make new session at the end of sessions)
                        tell mySession
                                exec command "/bin/bash"
                                write text "cd "& myPath
                        end tell
                end tell
        end tell
end run

Definitive?

After some discussion on icm, and with the help of tv and Marco (thank you guys), we came to this

on run
        tell app "Finder"
                try
                        set myPath to quoted form of ¬
                                POSIX path of ((target of window 1) as string)
                on error
                        set myPath to "~"
                end
                set esiste to ((count (get ¬
                        name of every application process ¬
                        whose creator type is "ITRM")) is not 0)
        end
        tell app "iTerm"
                activate
                if esiste then
                        set myTerm to (make new terminal)
                else
                        set myTerm to the first terminal
                end if
                tell myTerm
                        make new session at the end of sessions
                        tell the last session
                                exec command "/bin/bash"
                                write text "cd "& myPath
                        end
                end
        end
end

Open in Terminal from Finder

This is a trivial AppleScript
tell application "Finder"
        set targ to target of Finder window 1
        set myPath to POSIX path of (targ as string)
end tell
tell application "Terminal" to do script "cd \""& myPath &"\""
Here you can see a couple of screenshots.
If you want to download the application (with pretty icon) that uses it, you can download it from here.
Edit: Thanks to Marco Balestra for showing me his own script (which does the same thing, but is much cleaner). I have now included his script instead of mine.
on run
        tell application "Finder" to try
                set myPath to quoted form of ¬
                        POSIX path of ((target of window 1) as string)
        on error
                set myPath to "~"
        end try
       
        tell application "Terminal"
                do script "cd "& myPath
                activate
        end tell
end run

Monday, May 1, 2006

Open with BBEdit

This is a tiny Automator Workflow that opens the current Safari document with BBEdit.
Previously I wrote an AppleScript that did the very same job; however, the new document was marked as "new", so when one tried to close it, BBEdit would complain and ask whether we wanted to save it
tell application "Safari"
        set cur to document 1
        set mySource to source of cur
        set myName to name of cur
end tell
tell application "BBEdit"
        set x to make new text document with properties ¬
                {contents:mySource, name:myName}
        activate
end tell
However, if we use the Automator action "New BBEdit Document", it offers the option "Set unmodified" (that is to say, the new document won't ask to be saved before closing).
I created a two step Automator workflow. The first step is a slightly modified version of the first part of the script. It is a "Run AppleScript action" containing
on run {input, parameters}
        tell application "Safari"
                set cur to document 1
                set mySource to source of cur
                set myName to name of cur
        end tell
        return mySource
end run
The second step is the "New BBEdit Document" action.
You can download the workflow here. I suggest putting it in the script menu.

Sunday, April 23, 2006

Quicksort (C and Python)

The quick-sort algorithm

Quick-sort is a sorting algorithm developed by Hoare that on average does O(n log n) operations. The O() notation means that if the list length is n, the number of operations used to sort the list is in the order of magnitude of n log n (in this case).

To be more formal, we shall say that there exist two constants c1 and c2 such that the number of operations m satisfies c1 * n * log n <= m <= c2 * n * log n. Notice that c1 and c2 are not specified in the O notation.
We stop with the math here. If you want more precise information about the O notation, read Wikipedia.

Now I'm going to briefly explain the quick-sort algorithm, and then we will use that algorithm to exploit the expressive power of some programming languages. Keep in mind that more expressive does not necessarily mean faster to run. It means faster to write. But being faster to write means that we can try more efficient algorithms, thus improving performance in a more effective way.
Expressive languages are usually high-level. In fact we can also profile the code, track down the bottleneck and rewrite only a couple of functions in C or in ASM. This leads to fast programs with fast development cycles.

Quick-sort strategy

Quick-sort uses a divide and conquer approach. We divide a list in two sublists using a pivot element. All the elements less than the pivot come before it, those greater after. Then we recursively apply quick-sort to the two sublists.
Notice that the pivot is chosen arbitrarily. Some implementations choose the first element of the list, some choose it randomly, some use other criteria.
Moreover, there are implementations of quick-sort that sort the list in place (that is to say, the extra memory used is bounded), while others need extra storage. Guess which are the more efficient.

On Wikipedia you can find more information about the quick-sort algorithm. You can also find some implementations, and a link to a page full of implementations in different languages.

I almost did not write any code: I took code from Wikipedia or around the web, and I'm going to comment on it. That is the "new" work I did: comments and explanations, not code. I also wrote a couple of the implementations myself; however, I don't remember which ones.

Python

2.4.2 (#1, Mar 23 2006, 22:00:44)  
[GCC 4.0.1 (Apple Computer, Inc. build 5250)] 

Now let's examine a high-level Python implementation.

def qsort(L): 
    if L == []: return [] 
    return qsort([x for x in L[1:] if x< L[0]]) + L[0:1] + \ 
            qsort([x for x in L[1:] if x>=L[0]]) 

This is an exact description of the quick-sort algorithm, if you know what Python list comprehensions are. When I write [x for x in L if p(x)], Python computes the list of the elements of the container L that make p(x) true.

L[a:b] is the sublist [L[a], L[a+1], ..., L[b-1]] (the element at index b is excluded). L[a:] == L[a:len(L)] and L[:b] == L[0:b].

[x for x in L[1:] if x < L[0]] is the sublist of the elements of L, apart from the first one (which is the pivot), such that x is less than the pivot (L[0]).

[x for x in L[1:] if x >= L[0]] is the sublist of the elements of L, apart from the first one, such that x is greater than or equal to the pivot (L[0]).

Then we pass those sublists to qsort, which recursively sorts them, and eventually we concatenate the sorted sublists (with the pivot in the middle) and return the result.

This implementation, although beautiful to read and understand, is horrible from a performance point of view: it creates, concatenates and destroys a large number of lists. Moreover, it is purely recursive: for large lists we can even exhaust the stack space.

A version of quick-sort in Python that needs no extra memory is this:

def partition(array, begin, end, cmp): 
    while begin < end: 
        while begin < end: 
            if cmp(array[begin], array[end]): 
                (array[begin], array[end]) = (array[end], array[begin]) 
                break 
            end -= 1 
        while begin < end: 
            if cmp(array[begin], array[end]): 
                (array[begin], array[end]) = (array[end], array[begin]) 
                break 
            begin += 1 
    return begin 

def sort(array, cmp=lambda x, y: x > y, begin=None, end=None): 
    if begin is None: begin = 0 
    if end   is None: end   = len(array) 
    if begin < end: 
        i = partition(array, begin, end-1, cmp) 
        sort(array, cmp, begin, i) 
        sort(array, cmp, i+1, end) 

If you want to run this code, remember that in Python indentation matters: if you just copy and paste the code, you may end up with unusable code.
Anyway, I benchmarked all of them. "Elegant quicksort" is the first implementation, "Standard quicksort" is the latter, and "Builtin sort" is the standard Python sort algorithm (which is not a quicksort, but a dedicated sorting algorithm).

Elegant qsort: [21.20, 19.75, 20.08, 20.46]
Builtin qsort: [1.61, 2.29, 2.29, 2.29]
Standard qsort: [31.15, 30.60, 30.68, 30.38]

Probably due to the heavy optimization of list comprehensions, the "elegant" algorithm is faster than the standard one [from the future: I can hardly believe that]. However, both of them are more than ten times slower than the Python built-in sort algorithm.
We have learnt:

  1. High level languages are great to express algorithms in a readable way (and we can use them to test them and see how they work and all)
  2. You should not implement "basic" algorithms in high level languages: use their built-ins or write them in C (most built-ins are written in C).

C

Now it is time for a couple of pure C quick-sorts:

void qsort1(int a[], int lo, int hi ){ 
    int h, l, p, t; 
    if (lo < hi) { 
        l = lo; 
        h = hi; 
        p = a[hi]; 
        do { 
            while ((l < h) && (a[l] <= p))  
                l = l+1; 
            while ((h > l) && (a[h] >= p)) 
                h = h-1; 
            if (l < h) { 
                t = a[l]; 
                a[l] = a[h]; 
                a[h] = t; 
            } 
        } while (l < h); 
        t = a[l]; 
        a[l] = a[hi]; 
        a[hi] = t; 
        qsort1( a, lo, l-1 ); 
        qsort1( a, l+1, hi ); 
    } 
} 

Another implementation is:

void quicksort(int x[], int first, int last) { 
    int pivIndex = 0; 
    if(first < last) { 
        pivIndex = partition(x,first, last); 
        quicksort(x,first,(pivIndex-1)); 
        quicksort(x,(pivIndex+1),last); 
    } 
} 

int partition(int y[], int f, int l) { 
    int up,down,temp; 
    int piv = y[f]; 
    up = f; 
    down = l; 
    goto partLS; 
    do {  
        temp = y[up]; 
        y[up] = y[down]; 
        y[down] = temp; 
        partLS: 
        while (y[up] <= piv && up < l) { 
         up++; 
        } 
        while (y[down] > piv  && down > f ) { 
            down--; 
        } 
    } while (down > up); 
    y[f] = y[down]; 
    y[down] = piv; 
    return down; 
} 

C's standard library also provides qsort, which is much more general than these, since it lets you specify a comparison function and thus works with all kinds of data types.

Benchmarking these was hard: they are too fast to be measured on a list of a million elements with the standard POSIX clock function, and since I wanted you to be able to run the bench yourself, I could not use Mac OS X specific timing functions.
So I used a list of 100000000 elements.
This is the (ugly) code I used for benchmarking. The test is not really interesting: of course, since C integers are machine integers, it is dramatically faster than the other solutions. It would have been more interesting had we sorted other data structures, where the gap could be smaller.

#include <stdlib.h> 
#include <stdio.h> 
#include <time.h> 
#include "qsort1.h" 
#include "qsort2.h" 
#define LEN 100000000 

int intcmp(const void *a, const void *b){ 
    return *(int*)a - *(int*)b; 
} 

int main(){ 
    size_t index; 
    int *l = malloc(sizeof(int)*LEN); 
    clock_t last, current; 
    srand(time(NULL)); 
    for(index = 0; index < LEN; ++index){ 
        l[index] = rand(); 
    } 
    
    last = clock(); 
    qsort(l,LEN, sizeof(int), intcmp); 
    current = clock(); 
    printf("C qsort: %f\n", (current-last)/(float)CLOCKS_PER_SEC); 
    
    for(index = 0; index < LEN; ++index){ 
        l[index] = rand(); 
    } 
    last = clock(); 
    qsort1(l,0, LEN-1); 
    current = clock(); 
    printf("int qsort: %f\n", (current-last)/(float)CLOCKS_PER_SEC); 
    for(index = 0; index < LEN; ++index){ 
        l[index] = rand(); 
    } 
    last = clock(); 
    quicksort(l,0, LEN-1); 
    current = clock(); 
    printf("qsort 2: %f\n", (current-last)/(float)CLOCKS_PER_SEC); 
    free(l); 
    return 0; 
} 

Friday, April 21, 2006

Something about Prolog...

As I had to admit in my last Prolog post, I'm no Prolog guru; I'd rather deserve the term newbie. However, I followed some mathematical logic courses that make it all less hard. If you find something incorrect, or if you don't understand something, leave a comment.

The first thing you have to keep in mind is that you don't really tell Prolog how to do something. Rather, you ask "him" (or her?) to prove something following the rules you gave.

Basic datatypes

In Prolog we have "Strings". We can also use atoms like string or dog. However, if a term starts with a capital letter, it is a variable, not a simple term.
We have lists, that is to say [] or [a, b, c]. If we write
[H | T], H is the first element of the list and T is a list containing all the remaining elements. If the list is [a, b, c], then H is a and T is [b, c].

Computation model

Relations

Suppose you are working with a graph, and suppose you want to query it. As an example I report here a graph about elven genealogy in the LOTR world. The schema comes from this site. If it is wrong, tell them, not me :)

For example, there are different interpretations regarding Indis. Some say she's Ingwë's sister, some say she's the daughter of Ingwë's sister. We chose to say she's Ingwë's sister, since I did not want to introduce a character with no name in our database :).

The House of Finarfin:

                + - - - - - +
                :         Ingwë
             (sister)  + - - - - - +
                |      |           |
      Finwë = Indis   Olwë       Elwë = Melian
            |           |             |
          Finarfin = Ëarwen          Luthien
                   |
  +----------------+----------------+---------+
  |                |                |         |
  |                |              Aegnor      |
Finrod = Amarië Angrod = Edhellos    Galadriel = Celeborn
         :                |                         |
  (descendants        Orodreth          Elrond = Celebrian
     in Aman?)            |                    |
                  +-------+------+      +------+---------+
                  |              |      |      |         |
               Gil-galad    Finduilas Elladan Elrohir  Arwen

How would we represent this in an imperative language? We probably would have some kind of

class Elf 
- gender   (M/F) 
- name     (String) 
- children (Elven) 
- partner  (Elf) 
end 

We could also have a kind of "doubly linked" structure. This is probably wise, since otherwise querying someone's father would be dramatically expensive.

class Elf 
- gender   (M/F) 
- name     (String) 
- children (Elven) 
- partner  (Elf) 
- mother   (Elf) 
- father   (Elf) 
end 

In Prolog we do not have such a structure. We just state "facts". A fact is a predicate that is always true. You can see below how we define facts. For example, we are saying that aegnor is male.

male(aegnor). 
male(angrod). 
... 

You can imagine this like

Elf Aegnor; 
Aegnor.name = "Aegnor"; 
Aegnor.gender = M; 
... 

Note that in Prolog everything is a Term, both data and program. Above I'm saying that aegnor is male: I assert that male(aegnor) is true.
This is the complete list of facts to define the above graph. If I were smarter I would have chosen a smaller graph. The list is long, but the imperative code to put it into memory would not have been significantly shorter (yes, we could read it from a file... where we would have a list quite similar to the Prolog version :) )

female(amarie). 
female(arwen). 
female(celebrian). 
female(earwen). 
female(edhellos). 
female(finduilas). 
female(galadriel). 
female(indis). 
female(luthien). 
female(melian). 
male(aegnor). 
male(angrod). 
male(celeborn). 
male(elladan). 
male(elrohir). 
male(elrond). 
male(elwe). 
male(finarfin). 
male(finrod). 
male(finwe). 
male(gilgalad). 
male(ingwe). 
male(olwe). 
male(orodreth). 
parent(ingwe, olwe). 
parent(ingwe, elwe). 
parent(indis, finarfin). 
parent(finwe, finarfin). 
parent(olwe, earwen). 
parent(elwe, luthien). 
parent(melian, luthien). 
parent(finarfin, finrod). 
parent(earwen,   finrod). 
parent(finarfin, angrod). 
parent(earwen,   angrod). 
parent(finarfin, aegnor). 
parent(earwen,   aegnor). 
parent(finarfin, galadriel) . 
parent(earwen,   galadriel) . 
parent(angrod, orodreth). 
parent(edhellos, orodreth). 
parent(galadriel, celebrian). 
parent(celeborn, celebrian). 
parent(orodreth, gilgalad). 
parent(orodreth, finduilas). 
parent(elrond,    elladan). 
parent(celebrian, elladan). 
parent(elrond,    elrohir). 
parent(celebrian, elrohir). 
parent(elrond,    arwen). 
parent(celebrian, arwen). 

Rules

A rule is in the form:

goal(X) :- subgoal(X), subgoal2(X). 

In the above, goal(X) is true if and only if both subgoal(X) and subgoal2(X) are true. We say we reduce the goal to the two subgoals.
We will query something like:

goal(elrond). 

and it will succeed if and only if both subgoal(elrond) and subgoal2(elrond) succeed. Notice that this time we have constants, not variables.
Now it is time to define some interesting relations. A father is a male parent. A mother is a female parent. In Prolog we write:
father(X, Y) :- parent(X, Y), male(X). 
That is to say... X is the father of Y if and only if X is a parent of Y and X is male.
If we want to find Orodreth's father, we query:
father(X, orodreth). 
If it succeeds (and it does), X holds Orodreth's father: Angrod. The imperative version of this would have been... orodreth.father, if we were wise enough (or had enough space) to use a doubly linked structure. If we didn't, well... we would have to traverse the graph.
We can also do the converse.
father(finarfin, X). 
would return all the children of finarfin.

mother(X, Y) :- parent(X, Y), female(X). 
son(X, Y)    :- parent(Y, X), male(X). 
daughter(X, Y)  :- parent(Y, X), female(X). 

These are other trivial relationships. Now suppose you want to find out all the ancestors of a given character. If you had the simpler structure, it is a pain in the ass. If you have the complete one, it is boring. Basically you have to tell the system to traverse the graph following father and mother links. Simple and tedious.
In fact what you are doing is the "transitive closure" of the parent relationship. In Prolog it is written exactly as you would define it in natural language: X is an ancestor of Y if X is a parent of Y, or if X is a parent of Z and Z is an ancestor of Y.

ancestor(X, Y) :- parent(X, Y). 
ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y). 

All these relationships can be used to find the ancestors or the grandchildren, or to verify whether the relationship holds between two elves.
mother(galadriel, celebrian). returns yes, while mother(galadriel, arwen). returns no. You have to distinguish between defining facts and relationships and querying the system. This depends on your Prolog interpreter: read a book, a tutorial, something. This is just a hint (and what I need to explain quicksort later on).
If you name this file elves.pl you can load it in swi-prolog with

[elves]. 

(remember, no extension, and end with a dot). Then you can just make queries. To define new predicates, put them in elves.pl and reload the file.

Some arithmetic...

We can consider the trivial algorithm to sum two numbers, provided we have an INC and a DEC operation.

VAR A, B 
WHILE B > 0 
    INC A 
    DEC B 
END 

This is quite obvious. If you (like me) hate those pseudo code examples, this is pure C.

unsigned int sum(unsigned int a, unsigned int b){ 
    while(b) { 
        ++a; --b; 
    } 
    return a; 
} 
It is quite interesting to see an example in a functional language. This resembles the basic definition of addition when you are working with primitive recursive functions (S is the successor function, and we omitted writing the i-th projection function explicitly).

sum( x, 0   ) = x 
sum( x, y+1 ) = S(sum(x, y))

And this is Haskell (almost the same thing, apart from notation).

mysum :: Int -> Int -> Int 
mysum x 0 = x 
mysum x y = mysum x (y-1) + 1 

In Prolog, once again, you follow the formal definition of sum in set theory (which, by the way, is quite the same as above).

add(X, 0, X). 
add(0, Y, Y). 
add(X, Y, X1) :- succ(Y1, Y), add(X, Y1, X2), succ(X2, X1). 

How does this work? When I query something with add, Prolog will try to "match" the query against the arguments of the main goal. I don't want to be more precise about this "matching": it is called unification, and you are supposed to read a Prolog tutorial to understand more about it.
add(5, 0, 5). will match the first rule: the second parameter is 0, and the other two are the very same term. Now let's consider add(0, 5, X). This could match the second rule: 0 matches 0 and 5 matches Y. From this point, for add to succeed, X must be 5 too. From swi-prolog:

2 ?- plus(0, 5, X). 
X = 5  
Yes 

However, we said that we expect to use Prolog predicates in many different ways: not only to sum two numbers, but also to check, given three numbers, whether the third is the sum of the first two. We also expect trivial predicates to be invertible; that is to say, given a number and the "sum", to find the number that, summed to the first one, gives the sum.
In other words, we expect to get subtraction by defining only the sum. In fact this ain't magic: Prolog's builtin plus predicate does exactly this.
This is a copy and paste from the swi prolog terminal:

31 ?- plus(X, 4, 7). 
X = 3  
Yes 
32 ?- plus(8, 9, Z). 
Z = 17  
Yes 
33 ?- plus(3, Y, 1). 
Y = -2  
Yes 

For those of you who are interested, we can define such a predicate this way, using builtin Prolog arithmetic and some metalogical predicates:

add(X, Y, Z) :- nonvar(X), nonvar(Y), Z is X+Y . 
add(X, Y, Z) :- nonvar(X), nonvar(Z), Y is Z-X . 
add(X, Y, Z) :- nonvar(Y), nonvar(Z), X is Z-Y . 

nonvar succeeds if and only if its argument is not a variable. That is to say, if instead of plus we used add, add(X, 4, 7). would have called the "third" clause (for Y and Z are not variables, since they are "unified" -- substituted -- with 4 and 7), add(8, 9, Z). would have called the first clause, and add(3, Y, 1). the second.

But we can do something even smarter. First of all, we recognize that the three clauses above are almost mutually exclusive. When exactly two of the three parameters are bound, two of the three clauses fail. When all three are bound, all the clauses succeed in the very same way. When two or more parameters are variables, they all fail.

So we can tell Prolog: after you have determined which parameter is the variable, you may only use that rule; if none is a variable, use the first rule. We do this with "cuts". The rules become:

add(X, Y, Z) :- nonvar(X), nonvar(Y), !, Z is X+Y . 
add(X, Y, Z) :- nonvar(X), nonvar(Z), !, Y is Z-X . 
add(X, Y, Z) :- nonvar(Y), nonvar(Z), !, X is Z-Y . 

Cuts are a complex subject, and I'm not going to treat them here. Now I introduce the predicate between:


between(+Low, +High, ?Value) 
    Low and High are integers, High >= Low. If Value is an integer, 
    Low =< Value =< High. When Value is a variable it is successively 
    bound to all integers between Low and High. If High is inf 
    or infinite, between/3 is true iff Value >= Low, a feature that is 
    particularly interesting for generating integers from a certain 
    value. 

At this point we add these clauses:
add(X, Y, Z) :- nonvar(Z), !, between(0, Z, X), add(X, Y, Z). 
add(X, Y, Z) :- nonvar(Y), !, between(0, inf, X), add(X, Y, Z). 
add(X, Y, Z) :- nonvar(X), !, between(0, inf, Y), add(X, Y, Z). 

Now, given one bound argument, you are able to generate the pairs that satisfy the relationship (in fact, pairs of non-negative numbers only).
Now let's define prod.

prod(X, Y, Z) :- nonvar(X), nonvar(Y), !, Z is X*Y . 
prod(X, Y, Z) :- nonvar(X), nonvar(Z), !, Y is Z/X . 
prod(X, Y, Z) :- nonvar(Y), nonvar(Z), !, X is Z/Y . 
prod(X, Y, Z) :- nonvar(Z), !, between(1, Z, X), prod(X, Y, Z). 
prod(X, Y, Z) :- nonvar(Y), !, between(1, inf, X), prod(X, Y, Z). 
prod(X, Y, Z) :- nonvar(X), !, between(1, inf, Y), prod(X, Y, Z). 

Now it's time to define division. Do you know Euclidean division? I suppose so. However, you don't need to: all you need to know is that dividing x by y means looking for two numbers q and r such that

  • x = y * q + r
  • 0 <= r < y

And in Prolog it is:

div(X, Y, 0, X) :- abs(X) < abs(Y). 
div(X, Y, Q, R) :- prod(Y, Q, T1), T1 =< X,  add(T1, R, X) , 
                   R >= 0, abs(R) < abs(Y). 

This does not work when X and Y do not have the same sign. However, this depends on the way I defined prod. You should be able to make it work.

Monday, April 10, 2006

Texniscope Intel

I have just compiled a Texniscope version for Intel.

It works for me, tell me if it works for you too.

Download

[BROKEN LINK!] If you really need this, email me, I will search it on my hard drive.

Saturday, April 8, 2006

Autoconf automake & libtool on MacOS X

I have just finished my first autotools project on MacOS X. Until today I had deployed software only in Python or Java. I had already worked on C and C++ projects that used libtool, automake and autoconf, but either I did not have to write the configure scripts myself, or I developed them on GNU/Linux.

In fact, I'm afraid most of the problems I encountered were Mac OS X's fault plus my ignorance. That is to say, libtoolize does not exist there: it is called glibtoolize. So if I do not rename it, autoreconf simply does not work.

Probably I just had to set the LIBTOOLIZE environment variable to glibtoolize. However my solution works. The error I got was

Can't exec "libtoolize": No such file or directory at /opt/local/share/autoconf/Autom4te/FileUtils.pm line 288, <GEN3> line 3.

autoreconf: failed to run libtoolize: No such file or directory

Well... I have to fine-tune it. Off I go.

Wednesday, April 5, 2006

Multitasking on MacIntel

I noticed changing the subject also changes the link. So I repost this to redirect you to the new article.

I also have a more recent test that quite leads to other conclusions

Investigated iMac Troubles: not a faulty scheduler but something related to memory

Some time ago I stated that multitasking on Mac OS X Intel was bugged. Under some conditions (which at the time I hadn't discovered), the GUI hung (something I had never seen on the Mac OS before) and the whole system slowed down terribly.

Computations

I anticipate the conclusion here: the problem I found is real. MacIntel machines seem to have problems when large quantities of RAM are allocated. It is not a problem with the scheduler; in fact the system simply slows down as a whole.
The first thing I did was to write a simple program that stressed the CPU, made a lot of I/O and at the same time allocated and deallocated small quantities of memory in a quite inefficient way. However, the system was not slowed in any perceptible manner.

This is the post where I spoke about that program.

Here I add some benchmarking. First I have to describe the machines involved. Of course this is not a PPC vs. Intel bench: unfortunately my most powerful PPC machine is a notebook, and we can't expect it to compete with the iMac. What I want to show are the relative values between them.

Machines


Model           CPU                Clock     RAM       Bus       Hard Disk 
PowerBook G4    G4 (Single Core)   1.5 GHz   512 MB    167 MHz   5400 rpm 
iMac CoreDuo    Intel CoreDuo      2.0 GHz   1.5 GB    667 MHz   7200 rpm 


big_matrix

This is the test I described here.
I compiled the test with no optimizations; this is probably a mistake.
The full test on the iMac took more than twenty minutes (500x500 matrix). The Mac was usable and had no slowdowns:
time ./big_matrix  
real    20m39.110s 
user    12m10.943s 
sys     7m46.112s 

Reducing the matrix size to 100x100, still with no optimization, the result is
time ./big_matrix 
real    0m9.683s 
user    0m5.805s 
sys     0m3.688s 

Compiling with the -fast option did not change things much, nor did -O3 or -Os (as I said, the code was intended to be quite inefficient; I'm not surprised the compilers weren't really able to optimize it). However, explicitly activating -mmmx -msse -msse2 -msse3 gave a little improvement (about 5%, which could even be a statistical variation).

As I said before the most important thing is however achieved: the mac remains perfectly usable.

For those who are interested in this sort of thing, the PowerBook took about an hour and a half. Optimizations did improve its speed by a full 10% (which is quite acceptable, indeed). However, I'm sad it performed so badly; I should investigate why AltiVec did not work properly (if it did, I would not expect the PowerBook to be more than four times slower than the Intel).

Keep in mind that my software wasn't designed to work on multiple threads (this could be an interesting addition, though). However, the system kept swapping it between the two cores, preventing many possible optimizations.

Wonderings...

Now only very large allocations remained to do. So I wrote this small (idiotic) software.

Basically it takes a filename as a command line argument, finds out the size of the file with a stat syscall, allocates enough space to hold it and then fills the buffer. If the file is big enough, this (apart from being terribly inefficient) allocates a lot of RAM.
I called it on a 985 MB file (that means the software allocated nearly 1 GB of real memory, since the buffer is not only allocated, but filled too).

$ ls -lh ../../Desktop/Ubuntu_510.vpc7.sit  
-rw-r--r--   1 riko  staff        985M 12 Feb 03:13 ../../Desktop/Ubuntu_510.vpc7.sit 

The file is loaded correctly and this is the time bench.

$ time ./load_file ../../Desktop/Ubuntu_510.vpc7.sit  
real    3m31.010s 
user    0m0.001s 
sys     0m4.062s 

This value is quite variable: another time it took only 1m42s.
And... the Mac slowed down. I know that such a program is idiotic; however, it was one of the quickest ways to understand how the iMac behaves when someone needs a lot of RAM (this could be a memory leak, for example).
In fact, in some cases the Mac remains slowed down for a while, until the RAM is truly released and other processes are paged back in.

#include <stdlib.h> 
#include <stdio.h> 
#include <fcntl.h> 
#include <unistd.h> 
#include <sys/types.h> 
#include <sys/stat.h> 
#define BUFFER 2<<22 

int main(int argc, char *argv[]){ 
    char *mem; 
    int fd; 
    size_t pos = 0; 
    ssize_t res = 0; /* read returns ssize_t */ 
    off_t sz; 
    struct stat st; 
    stat(argv[1], &st); 
    sz = st.st_size; 
    
    mem = (char*)malloc(sz); 
    fd = open(argv[1], O_RDONLY, 0); 
    
    while( (res = read(fd, mem + pos, BUFFER) ) != 0){ 
        pos+=res; 
    } 
    
    close(fd); 
    free(mem); 
    return 0; 
} 

As you may notice, this makes no sanity check on the buffer returned by malloc. Don't use it on a 4 GB file: it will probably crash.
When I ran this very test on the PowerBook, I was prepared for terrible results. In fact the PowerBook does not have 1 GB of free RAM; it does not even have 1 GB of RAM, only 512 MB. That means that allocating and filling nearly 1 GB relies heavily on paging (and makes a lot of disk accesses to swap pages of memory in and out). Keeping this in mind, the results have been quite good (and more stable; in fact sometimes the iMac performs worse than the PowerBook, which has a third of the RAM). I would like someone with 1.5 or 2 GB of RAM to try this.

$ time ./load_file ../aks_old/nr.bkp  
real    3m31.526s 
user    0m0.002s 
sys     0m7.728s 

Moreover, the file used was slightly bigger. So the PowerBook took about double the time (against the best iMac run) or quite the same time (against the worst), but with a very big hardware handicap. Astonishing. This can also be interpreted as saying that something slowed down the iMac considerably.
I didn't mention it before: although slightly slowed, the PowerBook was quite responsive and usable during the test, while the iMac was not.

I/O Only

I rewrote the software above to read the file into a small buffer instead of keeping it all in memory. This is the source code:

#include <stdlib.h> 
#include <stdio.h> 
#include <fcntl.h> 
#include <unistd.h> 
#include <sys/types.h> 
#include <sys/stat.h> 
#define BUFFER 2<<22 
int main(int argc, char *argv[]){ 
        char *mem; 
        int fd; 
        size_t pos = 0; 
        ssize_t res = 0; /* read returns ssize_t */ 
        off_t sz; 
        struct stat st; 
        stat(argv[1], &st); 
        sz = st.st_size; 
        mem = (char*)malloc(BUFFER); 
        fd = open(argv[1], O_RDONLY, 0); 
        while( (res = read(fd, mem, BUFFER) ) != 0); 
        close(fd); 
        free(mem); 
        return 0; 
} 

The speedup is amazing.

$ time ./read_file ../../Desktop/Ubuntu_510.vpc7.sit  
real    0m28.007s 
user    0m0.001s 
sys     0m1.472s 

Some other times I got about 17s; I should investigate this variance. However, the system did not slow down at all and remained perfectly usable. That makes me think the problem does not concern I/O, but memory.
The powerbook performed like this:

$ time ./read_file ../aks_old/nr.bkp  
real    0m47.194s 
user    0m0.002s 
sys     0m3.833s 

Memory only...

The last step is writing a stupid program that only allocates large chunks of memory. I made it allocate (and release) progressively larger chunks. First of all, this demonstrates that the issue does not regard memory leaks only:
applications that allocate big quantities of RAM in large chunks are slowed down too. You can also see that the Mac slows down (and the allocation time increases) as the blocks get bigger.

#include <stdlib.h> 
#include <stdio.h> 
int main (int argc, const char * argv[]) { 
    unsigned long size = 2; 
    unsigned long i; 
    int *mem; 
    
    while(size * sizeof(int) > 0){ /* stop when the byte count overflows to 0 */ 
        mem = (int*) malloc(size * sizeof(int)); 
        if (mem==NULL) break; 
        printf("Allocated %lu bytes\n", size * sizeof(int)); 
        for(i=0; i < size; ++i) {
            mem[i]=i; 
        } 
        free(mem); 
        mem=NULL; 
        printf("Deallocated %lu bytes\n", size * sizeof(int)); 
        size*=2; 
    } 
    return 0; 
} 
I also wrote a version that only cycles through the variables without allocating. It took less than half a second to run, so it's not the cycling that affects performance.

The first time I ran it with not-so-large chunks, and the computer remained quite responsive. Then I ran it with full chunks, and it was hell. During the 1 GB allocation the computer was plainly unusable, not to speak of the 2 GB one. However, the machine was still much more usable than in the I/O + memory test.
time ./memory_allocator  
Allocated 2 bytes 
Deallocated 2 bytes 
[SNIP] 
Allocated 536870912 bytes 
Deallocated 536870912 bytes 
real    0m43.940s 
user    0m9.196s 
sys     0m9.137s 
time ./memory_allocator  
Allocated 8 bytes 
Deallocated 8 bytes 
[SNIP] 
Allocated 1073741824 bytes 
Deallocated 1073741824 bytes 
Allocated 2147483648 bytes 
Deallocated 2147483648 bytes 
real    0m36.538s 
user    0m9.181s 
sys     0m8.851s 

Small allocations

At this point I wrote a program that did smaller allocations. You can see that what matters is the total quantity of RAM allocated: the very same task, when the process has already allocated more than 1 GB, is significantly slower.
[Starting software] 
utime: 566              stime: 4198 
[Allocated first chunk] 
utime: 20               stime: 30 
[Populated first chunk] 
utime: 117010           stime: 558634 
[Allocated second chunk] 
utime: 27               stime: 50 
[Populated second chunk] 
utime: 132365           stime: 12 
[Allocated third chunk] 
utime: 38               stime: 487 
[Populated third chunk] 
utime: 229719           stime: 10 
[Allocated fourth chunk] 
utime: 22               stime: 41 
[Populated fourth chunk] 
utime: 228182           stime: 880172 
* Freed first chunk. 
* Freed second chunk. 
* Freed third chunk. 
* Freed fourth chunk. 
 
utime: 79               stime: 2 
and the software was:
#include <stdio.h> 
#include <stdlib.h> 
#include <unistd.h> 
#include <sys/types.h> 
#include <sys/time.h> 
#include <sys/resource.h> 
void puts_rusage(){ 
        struct rusage ru; 
        static struct timeval slast = {0, 0}; 
        struct timeval scurrent; 
        static struct timeval ulast = {0, 0}; 
        struct timeval ucurrent; 
        getrusage(RUSAGE_SELF, &ru); 
        ucurrent = ru.ru_utime; 
        scurrent = ru.ru_stime; 
        /* print the deltas in microseconds */ 
        printf("utime: %ld\t\tstime: %ld\n",  
                        (ucurrent.tv_sec - ulast.tv_sec) * 1000000L 
                            + (ucurrent.tv_usec - ulast.tv_usec),  
                        (scurrent.tv_sec - slast.tv_sec) * 1000000L 
                            + (scurrent.tv_usec - slast.tv_usec) 
                        ); 
        ulast = ucurrent; 
        slast = scurrent; 
} 
int main (int argc, const char * argv[]) { 
    unsigned long size = 2<<26; 
    unsigned long i; 
    int *mem1; 
    int *mem2; 
    int *mem3; 
    int *mem4; 
    puts("[Starting software]"); 
    puts_rusage(); 
    mem1 = (int*) malloc(size*sizeof(int)); 
    puts("\n[Allocated first chunk]"); 
    puts_rusage(); 
    for(i=0; i < size; ++i) { 
        mem1[i]=i; 
    } 
    puts("\n[Populated first chunk]"); 
    puts_rusage(); 
    mem2 = (int*) malloc(size*sizeof(int)); 
    puts("\n[Allocated second chunk]"); 
    puts_rusage(); 
    for(i=0; i < size; ++i) { 
        mem2[i]=i; 
    } 
    puts("\n[Populated second chunk]"); 
    puts_rusage(); 
    mem3 = (int*) malloc(size*sizeof(int)); 
    puts("\n[Allocated third chunk]"); 
    puts_rusage(); 
    for(i=0; i < size; ++i) { 
        mem3[i]=i; 
    } 
    puts("\n[Populated third chunk]"); 
    puts_rusage(); 
    mem4 = (int*) malloc(size*sizeof(int)); 
    puts("\n[Allocated fourth chunk]"); 
    puts_rusage(); 
    for(i=0; i < size; ++i) { 
        mem4[i]=i; 
    } 
    puts("\n[Populated fourth chunk]"); 
    puts_rusage(); 
    free(mem1); 
    puts("\n\n* Freed first chunk."); 
    free(mem2); 
    puts("* Freed second chunk."); 
    free(mem3); 
    puts("* Freed third chunk."); 
    free(mem4); 
    puts("* Freed fourth chunk."); 
    puts_rusage(); 
    return 0; 
} 
The last test would be launching several processes that each allocate a fairly large chunk of memory, and seeing how much they slow down the system (if they do; I suppose that if you don't keep them doing something, they will simply be paged out).

Conclusion

Definitely, I think there is something out of order in the memory management; the scheduler seems OK. The same tests left the PowerBook usable while the iMac wasn't (although the iMac took significantly less time in almost every task).