Sunday, April 23, 2006

Quicksort (C and Python)

The quick-sort algorithm

Quick-sort is a sorting algorithm developed by Hoare that in average does O(n log n). The O() notation means that if the list length is n, the number of operations used to sort the list is in the order of magnitudo of n log n (in this case).

To be more formal, we shall say that there exist two constants c1 and c2 such that the number of operations m isc1 * n * log n <= m <= c2 * n * log n. Notice that c1 and c2 are not specified in the O notation
We stop with math here. If you want more precise informations about the O notation read the wikipedia.

Now I'm going to briefly explain the quick-sort algorithm, and then we will use that algorithm to exploit the expressive power of some programming languages. Keep in mind that more expressive does not necessarily mean faster to run. It means faster to write. But being faster to write means that we can try more efficient algorithms, thus improving performance in a more effective way.
Expressive languages are usually high-level. In fact we can also profile the code, track the bottleneck and rewrite only a couple of functions in C or in ASM. This lead to fast programs with fast development cycles.

Quick-sort strategy

Quick-sort uses a divide and conquer approach. We divide a list in two sublists using a pivot element. All the elements less than the pivot come before it, those greater after. Then we recursively apply quick-sort to the two sublists.
Notice that the pivot is chosen arbitrarily. Some implementations chose the first element of the list. Some chose it randomly, some use other criteria.
Moreover there are implementations of quick-sort that sort the list in place (that is to say memory used is bounded), some need extra storage. Guess which are the more efficient.

On wikipedia you can find more informations about the quick-sort algorithm. You can find also some implementations, and a link to a page full of implementations in different languages.

I almost did not write code. I took code from the wikipedia or around the web and I'm going to comment it. This is the "new" work I did. Comments, explanations, not code. I also wrote a couple of implementations, however I don't remember which ones.

Python

2.4.2 (#1, Mar 23 2006, 22:00:44)  
[GCC 4.0.1 (Apple Computer, Inc. build 5250)] 

Now lets examine a Python high level implementation.

def qsort(L): 
    if L == []: return [] 
    return qsort([x for x in L[1:] if x< L[0]]) + L[0:1] + \ 
            qsort([x for x in L[1:] if x>=L[0]]) 

This is the exact description of the quick-sort algorithm, if you know what Python list comprehension are. When I write [x for x in L if p(x)], Python computer the list of elements of the L container that make p(x) true.

L[a:b] is the sublist of L[L[a], L[a+1], ..., L[b-1], L[b]]. L[a:]==L[a:len(L)], L[:b]==L[0:b]

[x for x in L[1:] if x< L[0]] is the sublist of the elements of L a part from the first one (that is the pivot) such that x is less than the pivot (L[0]).

[x for x in L[1:] if x>=L[0]] is the sublist of the elements of L a part from the first one (that is the pivot) such that x is greater or equal than the pivot (L[0]).

Then we pass those sublists to qsort that recursively sorts them, and eventually we return a append all those sublists and return them.

This implementation, although beautiful to read and understand is horrible from a performance point of view. It creates, concatenates and destroys a large number of lists. Moreover it is purely recursive. For large lists, we can even exhaust the stack space.

A version of quick-sort in Python that has no need for more memory is this:

def partition(array, begin, end, cmp): 
    while begin < end: 
        if cmp(array[begin], array[end]): 
            (array[begin], array[end]) = (array[end], array[begin]) 
            break 
        end -= 1 
    while begin < end: 
        if cmp(array[begin], array[end]): 
            (array[begin], array[end]) = (array[end], array[begin]) 
        break 
        begin += 1 
    return begin 
    
    def sort(array, cmp=lambda x, y: x > y, begin=None, end=None): 
        if begin is None: begin = 0 
        if end   is None: end   = len(array) 
        if begin < end: 
            i = partition(array, begin, end-1, cmp) 
        sort(array, cmp, begin, i) 
        sort(array, cmp, i+1, end) 

[NOTE: code has been reformatted without running it again: it is likely
to be wrong!]

If you want to run this code, remember that in Python indentation matters. If you just copy and paste the code, you may end with unusable code.
Anyway, I benchmarked all of them. "Elegant quicksort" is the first implementation. "Standard quicksort" is the latter. "Bultin sort" is standard python sort algorithm (that is not a quicksort, but a dedicated sorting algorithm).

Elegant qsort: [21.20, 19.75, 20.08, 20.46]
Builtin qsort: [1.61, 2.29, 2.29, 2.29]
Standard qsort: [31.15, 30.60, 30.68, 30.38]

Probably due to heavy optimization performed in list comprehension the "elegant" algorithm is faster than the standard one. [from the future: I hardly believe that]. However both of them are more than ten times slower than the python built-in sort algorithm.
We have learnt:

  1. High level languages are great to express algorithms in a readable way (and we can use them to test them and see how they work and all)
  2. You should not implement "basic" algorithms in high level languages: use their built-ins or write them in C (most built-ins are written in C).

C

Now it is time for a couple of pure C quick-sort:

void qsort1(int a[], int lo, int hi ){ 
    int h, l, p, t; 
    if (lo < hi) { 
        l = lo; 
        h = hi; 
        p = a[hi]; 
        do { 
            while ((l < h) && (a[l] <= p))  
                l = l+1; 
            while ((h > l) && (a[h] >= p)) 
                h = h-1; 
            if (l < h) { 
                t = a[l]; 
                a[l] = a[h]; 
                a[h] = t; 
            } 
        } while (l < h); 
        t = a[l]; 
        a[l] = a[hi]; 
        a[hi] = t; 
        qsort( a, lo, l-1 ); 
        qsort( a, l+1, hi ); 
    } 
} 

Another implementation is:

void quicksort(int x[], int first, int last) { 
    int pivIndex = 0; 
    if(first < last) { 
        pivIndex = partition(x,first, last); 
        quicksort(x,first,(pivIndex-1)); 
        quicksort(x,(pivIndex+1),last); 
    } 
} 

int partition(int y[], int f, int l) { 
    int up,down,temp; 
    int piv = y[f]; 
    up = f; 
    down = l; 
    goto partLS; 
    do {  
        temp = y[up]; 
        y[up] = y[down]; 
        y[down] = temp; 
        partLS: 
        while (y[up] <= piv && up < l) { 
         up++; 
        } 
        while (y[down] > piv  && down > f ) { 
            down--; 
        } 
    } while (down > up); 
    y[f] = y[down]; 
    y[down] = piv; 
    return down; 
} 

C also have a builtin quick-sort, that is much more advanced than this ones, since it allows to specify a comparison function, thus working with all kinds of data types.

Benchmarking these was hard. They too fast to be benchmarked with a list of a million elements and the standard posix clock function. However if you want to run the bench yourself, I could not use MacOS X specific bench functions.
So I used a list of 100000000 elements.
This is the (ugly) code I used for benchmarking. The test is not really interesting. Of course since C integers are machine integers it is dramatically faster than other solutions. It could have been more interesting if we sorted other data structures, where the gap could be reduced.

#include <stdlib.h> 
#include <stdio.h> 
#include <time.h> 
#include "qsort1.h" 
#include "qsort2.h" 
#define LEN 100000000 

int intcmp(const void *a, const void *b){ 
    return *(int*)a - *(int*)b; 
} 

int main(){ 
    size_t index; 
    int *l = malloc(sizeof(int)*LEN); 
    clock_t last, current; 
    srand(time(NULL)); 
    for(index = 0; index < LEN; ++index){ 
        l[index] = rand(); 
    } 
    
    last = clock(); 
    qsort(l,LEN, sizeof(int), intcmp); 
    current = clock(); 
    printf("C qsort: %f\n", (current-last)/(float)CLOCKS_PER_SEC); 
    
    for(index = 0; index < LEN; ++index){ 
        l[index] = rand(); 
    } 
    last = clock(); 
    qsort1(l,0, LEN-1); 
    current = clock(); 
    printf("int qsort: %f\n", (current-last)/(float)CLOCKS_PER_SEC); 
    for(index = 0; index < LEN; ++index){ 
        l[index] = rand(); 
    } 
    last = clock(); 
    quicksort(l,0, LEN-1); 
    current = clock(); 
    printf("qsort 2: %f\n", (current-last)/(float)CLOCKS_PER_SEC); 
    free(l); 
    return 0; 
} 

No comments: