Sunday, December 28, 2008

Missionaries, Cannibals and Prolog

Apparently, many generations of computer scientists have been concerned with the problem of letting cannibals and missionaries cross a river with the very same boat.

Before them, logicians were looking for a way to have married couples do the very same thing, with slightly different inherent problems.

Centuries ago, the problem was that no woman should be with another man unless her husband was present (this is now called a threesome). However, since women are now emancipated and nobody cares if a woman is with another man (with the possible exception of her own husband, especially when there are no football matches on TV), the cannibal problem seems more interesting.

It is widely accepted that if missionaries outnumber cannibals, they convert the cannibals. Since we believe in freedom of worship, no cannibal shall be converted. Some readers may be familiar with another version of this problem, from the days when having people converted was considered a good thing, while having missionaries eaten was bad.

However, cannibals have recently become vegans, and until Churches develop soy missionaries, the problem is how to let cannibals retain their own religion.

Basically, this is a job for a Prolog interpreter. Though humans solved this problem long ago, a human can only solve it for small numbers of cannibals and missionaries (and though we hope both populations remain small, cannibals can always mate and missionaries can convert new people).

This is a short Prolog program that, given an initial state (all cannibals and all missionaries on the same side of the river) and the desired final state (all of them on the other side), computes the strategy our poor boatman has to pursue in order to carry the whole lot across.

initial_state(mc, mc(left, bank(3,3), bank(0,0))).
final_state(mc(right, bank(0,0), bank(3,3))).

Above we say that the initial state has the boat on the left, three cannibals and three missionaries are on the left bank and no one is on the other one. The final state encodes what we have already said.

Now we have to describe possible moves:

move(mc(left, L, _R), Boat) :-
    choose_passengers(L, Boat).

move(mc(right, _L, R), Boat) :-
    choose_passengers(R, Boat).

choose_passengers(bank(C, _M), boat(2, 0)) :- 
    C > 1.
choose_passengers(bank(_C, M), boat(0, 2)) :- 
    M > 1.
choose_passengers(bank(C, M), boat(1, 1))  :- 
    C > 0, M  > 0.
choose_passengers(bank(C, _M), boat(1, 0)) :- 
    C > 0.
choose_passengers(bank(_C, M), boat(0, 1)) :- 
    M > 0.

Basically, when the boat is on the left bank, we choose passengers from the left bank; when it is on the right bank, we choose them from the right bank. The boat can only take 2 people. That means that, provided there are enough people on the bank, we can carry 2 cannibals, 2 missionaries, 1 cannibal, 1 missionary, or 1 cannibal and 1 missionary. Since the boatman is usually agnostic, the cannibal aboard with one missionary is not going to be converted.

The Prolog system chooses one of the possible moves and proceeds. If the move is wrong, it backtracks and tries another one. As long as the number of choices is small, this is rather efficient.

The predicates that describe how the system changes when one move is performed are:

update(mc(B, L, R), Boat, mc(B1, L1, R1)) :-
    update_boat(B, B1),
    update_banks(Boat, B, L, R, L1, R1).

update_boat(left, right).
update_boat(right, left).

update_banks(alone, _B, L, R, L, R).
update_banks(boat(C, M), left, 
             bank(LC, LM), bank(RC, RM), 
             bank(LC1, LM1), bank(RC1, RM1)) :-
    LM1 is LM - M,
    RM1 is RM + M,
    LC1 is LC - C,
    RC1 is RC + C.

update_banks(boat(C, M), right,
             bank(LC, LM), bank(RC, RM), 
             bank(LC1, LM1), bank(RC1, RM1)) :-
    LM1 is LM + M,
    RM1 is RM - M,
    LC1 is LC + C,
    RC1 is RC - C.

Finally, we have to tell legal and illegal configurations apart. Basically, if the bank we are leaving has more missionaries than cannibals, we are breaking the law. Otherwise, everything is fine (the boatman works for free, so he does not even have to pay taxes).

legal(mc(left, _L, R)) :-
    \+ illegal(R).

legal(mc(right, L, _R)) :-
    \+ illegal(L).

illegal(bank(C, M)) :-
    M > C.

The solution algorithm is trivial:

solve_dfs(State, _History, []) :-
    final_state(State).

solve_dfs(State, History, [Move|Moves]) :-
    move(State, Move),
    update(State, Move, State1),
    legal(State1),
    \+ member(State1, History),
    solve_dfs(State1, [State1|History], Moves).

print_moves([]).
print_moves([M|Ms]) :-
    writeln(M),
    print_moves(Ms).

go :-
    initial_state(mc, State),
    solve_dfs(State, [State], Moves),
    print_moves(Moves).

We generate new moves and check that the updated state is legal and that we have not reached the same state before (if we have, we are looping, the boatman becomes hungry and eats whoever is on his boat -- that's the main reason why he does not travel with an empty boat).
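For readers more at home with imperative languages, here is a minimal C++ sketch of the same depth-first search. The State struct, the function names and the overall layout are mine, not part of the original Prolog program; the move order, the legality rule and the history check mirror the clauses above.

```cpp
#include <cassert>
#include <set>
#include <tuple>
#include <utility>
#include <vector>

// boat: 0 = left, 1 = right; lc/lm: cannibals/missionaries on the left
// bank. The right bank holds the rest of the 3 cannibals and 3 missionaries.
struct State {
    int boat, lc, lm;
    bool operator<(const State& o) const {
        return std::tie(boat, lc, lm) < std::tie(o.boat, o.lc, o.lm);
    }
};

// A bank is illegal when missionaries outnumber cannibals on it,
// mirroring illegal(bank(C, M)) :- M > C.
bool illegal(int c, int m) { return m > c; }

// Depth-first search with backtracking. `seen` holds the states on the
// current path (the History list); `moves` collects boat loads as
// (cannibals, missionaries) pairs, one per crossing.
bool dfs(State s, std::set<State>& seen,
         std::vector<std::pair<int, int>>& moves) {
    if (s.boat == 1 && s.lc == 0 && s.lm == 0) return true;  // final state
    // Candidate loads, in the same order as choose_passengers/2.
    const int load[5][2] = {{2, 0}, {0, 2}, {1, 1}, {1, 0}, {0, 1}};
    for (const auto& b : load) {
        int sign = (s.boat == 0) ? -1 : +1;  // leaving left drains the left bank
        State t{1 - s.boat, s.lc + sign * b[0], s.lm + sign * b[1]};
        if (t.lc < 0 || t.lm < 0 || t.lc > 3 || t.lm > 3) continue;
        // As in the Prolog program, only the bank the boat just left is checked.
        bool bad = (t.boat == 1) ? illegal(t.lc, t.lm)
                                 : illegal(3 - t.lc, 3 - t.lm);
        if (bad || seen.count(t)) continue;
        seen.insert(t);
        moves.push_back({b[0], b[1]});
        if (dfs(t, seen, moves)) return true;
        moves.pop_back();   // dead end: undo, exactly like Prolog backtracking
        seen.erase(t);
    }
    return false;
}
```

Calling dfs from the initial state {0, 3, 3} (after inserting it into seen, just as go/0 starts with [State]) fills moves with one boat load per crossing; undoing seen on backtracking is what makes it play the role of the History list rather than a global visited set.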


Tuesday, December 23, 2008

Using template programming for efficiency reasons

Suppose we want to write a

unsigned int pow(unsigned int n, unsigned int exp) 

function that raises a given number n to the power exp.

A C implementation is rather trivial:

unsigned int
traditional_pow(unsigned int n, unsigned int exp) {

        unsigned int value = 1;

        for(unsigned int i = 0; i < exp; ++i) {

                value *= n;

        }

        return value;
}

Basically, we could write the very same thing in C++. We could also perform some minor optimizations to make the code look more efficient. However, compilers are quite good at optimizing code; indeed, they are often better than humans.

Nonetheless, that function contains jumps. Notice that even though the exp parameter is known at compile time (which is a pretty common use case), there is no way for the compiler to eliminate the jumps short of a pretty expensive and thorough inter-procedural analysis, which gcc does not perform even at -O3.

The only result is that traditional_pow gets inlined into main. This is rather unsatisfactory: branching is among the most costly operations on modern CPUs.

We could try a recursive implementation:

unsigned int
traditional_rec_pow(
        unsigned int n,
        unsigned int exp,
        unsigned int acc) {
        switch(exp) {
                case 0: return 1;
                case 1: return acc;
                default:
                return traditional_rec_pow(n, exp-1, acc*n);
        }
}

unsigned int
traditional_rec_pow(unsigned int n, unsigned int exp) {
        return traditional_rec_pow(n, exp, n);
}

Here we are using an accumulator (a standard functional programmer's trick) in order to make the function tail recursive. With mild optimizations turned on, the code generated for traditional_rec_pow looks like a less optimized version of traditional_pow (tail recursion is eliminated, though). Full optimizations yield a 74-line version of the function, where some special cases have been optimized.
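As a quick sanity check of the accumulator variant (the two overloads are repeated here so the snippet compiles on its own), traditional_rec_pow(2, 10) should indeed evaluate to 1024:

```cpp
#include <cassert>

// Accumulator-based tail-recursive power, as above: acc carries the
// partial product, so the recursive call is the last thing the function does.
unsigned int
traditional_rec_pow(unsigned int n, unsigned int exp, unsigned int acc) {
        switch (exp) {
                case 0: return 1;
                case 1: return acc;
                default: return traditional_rec_pow(n, exp - 1, acc * n);
        }
}

// Public entry point: seeds the accumulator with n itself.
unsigned int
traditional_rec_pow(unsigned int n, unsigned int exp) {
        return traditional_rec_pow(n, exp, n);
}
```

Note that overloading on arity is what keeps the accumulator out of the public interface; in plain C, which has no overloading, the three-argument helper would need a different name.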

We are very far from the intuitive idea of fully optimizing pow when exp is known at compile time. We think the compiler should be able to produce the very same code it would produce for the expression:
n * n * n * n * n

in case exp==5.

I promise you: we can do that.
I wrote the recursive functional version as an introduction to the template-based variant. The idea is that if exp is known at compile time, it can become a template parameter.

The structure remains similar, with template specialization used instead of explicit switch/if. This is even more readable for a functional programmer!



template<unsigned int exp> inline unsigned int
pow(unsigned int n, unsigned int acc) {

        return pow<exp-1>(n, acc*n);

}

template<> inline unsigned int
pow<1>(unsigned int n, unsigned int acc) {
        return acc;
}

template<> inline unsigned int
pow<0>(unsigned int n, unsigned int acc) {
        return 1;
}

template<unsigned int exp> unsigned int
pow(unsigned int n) {
        return pow<exp>(n, n);
}

Note that we provide specializations for exp==1 and exp==0. They are needed because otherwise the recursive instantiation would never terminate (well, in practice it terminates with an error once the compiler hits its template instantiation depth limit).
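A quick way to convince ourselves the template version is correct is to compare a few instantiations against values computed by hand (the definitions are repeated so this snippet stands alone):

```cpp
#include <cassert>

// The template power function from above, repeated verbatim: the
// recursion happens across template instantiations, not at run time.
template<unsigned int exp> inline unsigned int
pow(unsigned int n, unsigned int acc) {
        return pow<exp - 1>(n, acc * n);
}

// Base cases as full specializations: these stop the instantiation chain.
template<> inline unsigned int
pow<1>(unsigned int, unsigned int acc) {
        return acc;
}

template<> inline unsigned int
pow<0>(unsigned int, unsigned int) {
        return 1;
}

// Public entry point: seeds the accumulator with n itself.
template<unsigned int exp> unsigned int
pow(unsigned int n) {
        return pow<exp>(n, n);
}
```

Each distinct exponent instantiates its own little chain of functions, so pow<5> and pow<11> coexist in the same binary, each optimized separately.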

Here some computation is implicitly performed at compile time. Suppose that in the body of main we call

pow<3>(3);

With no optimizations, the compiler generates several functions. There is the pow<1> specialization (which simply returns acc). acc is the second parameter and, according to the OS X Intel calling conventions, it lives at 12(%ebp), while the return value goes in %eax:

.globl __Z3powILj1EEjjj
                .weak_definition __Z3powILj1EEjjj
        __Z3powILj1EEjjj:
        LFB13:
                pushl        %ebp
        LCFI0:
                movl        %esp, %ebp
        LCFI1:
                subl        $8, %esp
        LCFI2:
                movl        12(%ebp), %eax
                leave
                ret
        LFE13:
                .align 1

For each exp value greater than one, a function is generated. Each of these functions calls the one "before" it:

.globl __Z3powILj2EEjjj
                .weak_definition __Z3powILj2EEjjj
        __Z3powILj2EEjjj:
        LFB19:
                pushl        %ebp
        LCFI3:
                movl        %esp, %ebp
        LCFI4:
                subl        $24, %esp
        LCFI5:
                movl        12(%ebp), %eax
                imull        8(%ebp), %eax
                movl        %eax, 4(%esp)
                movl        8(%ebp), %eax
                movl        %eax, (%esp)
                call        __Z3powILj1EEjjj
                leave
                ret
        LFE19:
                .align 1
        .globl __Z3powILj3EEjjj
                .weak_definition __Z3powILj3EEjjj
        __Z3powILj3EEjjj:
        LFB18:
                pushl        %ebp
        LCFI6:
                movl        %esp, %ebp
        LCFI7:
                subl        $24, %esp
        LCFI8:
                movl        12(%ebp), %eax
                imull        8(%ebp), %eax
                movl        %eax, 4(%esp)
                movl        8(%ebp), %eax
                movl        %eax, (%esp)
                call        __Z3powILj2EEjjj
                leave
                ret
        LFE18:
                .align 1

Notice that __Z3powILj3EEjjj (pow<3>) calls __Z3powILj2EEjjj (pow<2>), which in turn calls __Z3powILj1EEjjj (pow<1>). Apart from this, most instructions deal with the stack (creating and destroying the current function's stack frame) and multiply the first parameter by the second, passing the result on to the subsequent call.

Notice that no function contains conditional operations, tests or jumps, apart from the call to the next function in the chain. This is the kind of thing optimizers shine at.

Indeed, even -O does the right thing. All the functions are inlined and the generated code is (stack operations aside):
movl        8(%ebp), %eax
movl        %eax, %edx
imull        %eax, %edx
imull        %edx, %eax

Wow! This is exactly what we would have written by hand. Notice that if there are multiple pow<n> calls, specialized (and optimized) code is generated for each variant. For example, this is pow<11>:


.globl __Z3powILj11EEjj
                .weak_definition __Z3powILj11EEjj
        __Z3powILj11EEjj:
        LFB17:
                pushl        %ebp
        LCFI0:
                movl        %esp, %ebp
        LCFI1:
                movl        8(%ebp), %eax
                movl        %eax, %edx
                imull        %eax, %edx
                imull        %eax, %edx
                imull        %eax, %edx
                imull        %eax, %edx
                imull        %eax, %edx
                imull        %eax, %edx
                imull        %eax, %edx
                imull        %eax, %edx
                imull        %eax, %edx
                imull        %edx, %eax
                leave
                ret