Sunday, November 9, 2008

From foldl to difference lists

Most declarative languages have lists. Usually these lists are either a nil atom (the empty list) or a cons cell (a pair whose second element is a list). It is easy to see how we can add things at the head of such a list.

In my previous article we explored lists and foldl. Indeed, it is clear that we need an accumulator because appending at the end of a list is basically not possible (with decent performance). Appending at the end of a list means copying the whole list. Why? Consider how lists are built.

You have the head (say X1) and a pointer to another list. And this other list is built in the very same way. The last element has a pointer to an empty list.
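
To make this concrete, here is the same structure spelled out in Prolog's list notation; a list literal is just sugar for nested cons cells ending in the empty list:

?- [1, 2, 3] = [1 | [2 | [3 | []]]].
true.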

[Figure: a list as a chain of cons cells; the last cell points to the empty list]

If we want to add something at the end (even if we had a magical last :: [E] -> E
function which got to the last element of a list in O(1) time) we would need to "modify" the last element in the process. Since modifying is not allowed, we need to create another last element whose car (first element) is 3 (following our example) and whose cdr (second element) is cons(4, []). Of course, at this point we would have to modify the second-to-last element of the list to point at the new cell, and so on. Basically we need to create a copy of the whole list. That's O(N).

However, if we bound the cdr of the last element not to an empty list but to an "unbound variable" Unbound1, we could simply append at the end by binding Unbound1 to cons(4, Unbound2), where Unbound2 is a fresh unbound variable. Amazing... but there are two major drawbacks.
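
Here is the trick at the Prolog top level (an illustrative session; the variable names are mine):

?- Xs = [1, 2, 3 | Tail], Tail = [4 | Tail2].
Xs = [1, 2, 3, 4|Tail2],
Tail = [4|Tail2].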

First, we need logical variables. That requirement is satisfied if we are using Prolog or Oz (sorry, no Erlang this time). In Erlang or Haskell you just can't do much of anything with unbound variables. In Prolog you can.

The second problem is our magical last function. We can build such a function, though it runs in O(N). We can't do better. That means that appending at the end of a list, even though we left the tail of the list unbound, costs a lot. Well, usually O(N) is good; the point is that this operation is very low level, and if we pile up O(N) operations in inner "loops" the whole algorithm will suck.
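
For the record, the magical last could be sketched like this (open_tail is a name I made up; it is not a library predicate):

% Walk a partial list down to its unbound tail. O(N); fails on closed lists.
open_tail(Tail, Tail) :- var(Tail), !.
open_tail([_|Xs], Tail) :- open_tail(Xs, Tail).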

Now, let us get back to the reverse with accumulator.

standard_reverse(Xs, Ys) :- standard_reverse(Xs, [], Ys).
standard_reverse([X|Xs], Acc, Ys) :-
    standard_reverse(Xs, [X| Acc], Ys).
standard_reverse([], Ys, Ys).

Let us change the argument order:

standard_reverse2([X|Xs], Ys, Acc) :-
    standard_reverse2(Xs, Ys, [X| Acc]).
standard_reverse2([], Ys, Ys).

Ok, this change seems unimportant, to say the least. Let us do a further modification: instead of using two arguments for the reversed list and for the accumulator, let us use one single argument which includes both, for example Reversed-Acc.

diff_reverse([X|Xs], Ys-Acc) :-
    diff_reverse(Xs, Ys-[X|Acc]).
diff_reverse([], Ys-Ys).
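
Calling it is just a matter of asking for a difference list whose tail is the empty list (a sample SWI-Prolog session):

?- diff_reverse([1, 2, 3], Ys-[]).
Ys = [3, 2, 1].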

Extremely nice! Here we have difference lists. Difference lists are a subject difficult to master: their declarative meaning is quite obvious (and I will explain it in the next paragraphs), but their procedural/operational meaning is quite messy.

Basically a difference list is a way to represent an ordinary list. For example the list [1, 2, 3] can be represented by the difference list [1, 2, 3, a]-[a], or by the difference list [1, 2, 3, a, 4]-[a, 4], or by [1, 2, 3]-[], or in a thousand other ways, the most interesting of which is [1, 2, 3|X]-X. Notice that the latter unifies with any of the others. The careful reader may have noticed that now we may have a way to get to the last element of the list in order to add stuff in O(1). But let's not jump to conclusions.
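
The unification claim is easy to check at the top level:

?- [1, 2, 3|X]-X = [1, 2, 3, a]-[a].
X = [a].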

Any ordinary list L can be represented as L-[], so it's easy to build a difference list representing an ordinary list. Unfortunately this difference list is fully specified (if L was): its tail is closed, so we gain nothing.
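
If we want the open representation instead, we must copy the spine once. A small helper could look like this (list_to_dl is my own name for it):

% Turn a proper list into an open difference list. Copies the spine: O(N).
list_to_dl([], Xs-Xs).
list_to_dl([H|T], [H|Ys]-Zs) :-
    list_to_dl(T, Ys-Zs).

?- list_to_dl([1, 2, 3], DL).
DL = [1, 2, 3|_G123]-_G123.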

A classical problem in Prolog is appending two lists. One could hope for an O(M) operation, where M is the length of the shorter list; in Prolog it is O(N), where N is the length of the first list:

myappend([], Xs, Xs).
myappend([H|T], Xs, [H|Ys]) :-
    myappend(T, Xs, Ys).

Now, let's benchmark the thing:

?- time(myappend([1, 2, 3], [2, 5], L)).
% 4 inferences, 0.00 CPU in 0.00 seconds (0% CPU, Infinite Lips)
L = [1, 2, 3, 2, 5].

?- time(myappend([1, 2, 3, a, b, c], [2, 5], L)).
% 7 inferences, 0.00 CPU in 0.00 seconds (0% CPU, Infinite Lips)
L = [1, 2, 3, a, b, c, 2, 5].

However, we can do that in O(1) using difference lists:

append_dl(Xs-Ys, Ys-Zs, Xs-Zs).
?- time(append_dl([1, 2, 3, a, b, c|Xs]-Xs, [2, 5]-[], Z)).
% 1 inferences, 0.00 CPU in 0.00 seconds (0% CPU, Infinite Lips)
Xs = [2, 5],
Z = [1, 2, 3, a, b, c, 2, 5]-[].

When applicable, this is very nice. There is not much more to difference lists than this; still, having the ability to add elements at the end of a structure is great. How? Think about it:

push_back(X, L-[X|Xs]).
?- X = [1, 2|Xs]-Xs, time(push_back(3, X)).
% 1 inferences, 0.00 CPU in 0.00 seconds (0% CPU, Infinite Lips)
X = [1, 2, 3|_G305]-[3|_G305],
Xs = [3|_G305].

Now we know how to push new elements to the front or to the back of a difference list. That means we can build a deque.
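
Here is a minimal sketch of such a deque (all the predicate names are mine, and it assumes we never pop from an empty deque):

% A deque represented as an open difference list Front-Back.
empty_deque(Xs-Xs).
push_front(X, Ys-Zs, [X|Ys]-Zs).
push_back(X, Ys-[X|Zs], Ys-Zs).
pop_front(X, [X|Ys]-Zs, Ys-Zs).
% Caveat: pop_front on an empty deque would silently bind the open tail
% instead of failing; a serious implementation should also track the size.

For example, empty_deque(D0), push_back(1, D0, D1), push_front(0, D1, D2) leaves D2 bound to [0, 1|T]-T, with both operations in O(1).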

Saturday, November 8, 2008

Accumulators and Foldl in declarative languages

Using accumulators in declarative programming is common. Basically it is possible to convert non-tail-recursive functions into tail recursive ones using an accumulator.
Consider a very simple procedure: summing the numbers in a list. The naive implementation is:

naive_sum([]) -> 0;
naive_sum([H|T]) ->
    H + naive_sum(T).

This one is not tail recursive. Every time the second clause matches, naive_sum is called and the return value of the recursive call is needed in order to compute the result. The standard way to write this one is:

standard_sum(L) ->
    standard_sum(L, 0).
standard_sum([], Acc) -> Acc;
standard_sum([H|T], Acc) ->
    standard_sum(T, Acc+H).

Notice that this way the last operation of each recursive clause is the recursive call itself. This means that the new call can take the place of the old one. Simple. It's so simple (and the pattern is so common) that it has a name: foldl.

In Erlang, using the library foldl, the sum becomes:
sum(L) ->
    lists:foldl(fun(X, Y) -> X + Y end, 0, L).

Basically, it's one of the primitives of functional languages. For example, in Haskell, it would be:

mysum lst = foldl (+) 0 lst

However, a Haskell programmer would rather write the following; this transformation is known as eta-reduction:

mysum = foldl (+) 0

In Prolog, the naive and the "standard" sums become:

naive_sum([], 0).
naive_sum([H|T], Res) :-
    naive_sum(T, Res1),
    Res is Res1 + H.
standard_sum(L, Res) :-
    standard_sum(L, 0, Res).
standard_sum([], Acc, Acc).
standard_sum([H|T], Acc, Res) :-
    Acc1 is H + Acc,
    standard_sum(T, Acc1, Res).
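
A quick sanity check at the top level:

?- standard_sum([1, 2, 3, 4], S).
S = 10.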

Since Prolog clauses can't return a value (because they are not functions, of course!), we need an extra argument which unifies with the result. Besides, in the Prolog version it is quite explicit that naive_sum can't be tail recursive. The difference between the naive and the tail recursive version is that the first one uses linear (O(N)) space and the latter uses constant (O(1)) space. Both are O(N) in terms of running time.
Another typical example of accumulator use is reversing a list in Prolog. This is the naive version:

naive_revert([], []).
naive_revert([H|T], L1) :-
    naive_revert(T, T1),
    append(T1, [H], L1).

This is both not tail recursive and has a worse than expected running time: indeed it is O(N^2) while it should be O(N), because append is itself O(N) and is called once per element.
The first solution is using the built-in reverse predicate. It is even the right thing to do in production, but it is not useful for learning. :)
Let’s examine the following code:

standard_reverse(Xs, Ys) :- standard_reverse(Xs, [], Ys).
standard_reverse([X|Xs], Acc, Ys) :-
    standard_reverse(Xs, [X| Acc], Ys).
standard_reverse([], Ys, Ys).

Once again, we got tail recursion (using an accumulator). Basically we read the first element of the input list and push it onto the accumulator. When we take another element, this one is pushed as well (in front of the former). Then, when we finish the list, we unify the accumulator with the result. This version is O(N) in time and space.
The Erlang version is extremely similar (and slightly more readable):

standard_revert(L) ->
    standard_revert(L, []).
standard_revert([X|Xs], Acc) ->
    standard_revert(Xs, [X|Acc]);
standard_revert([], Acc) -> Acc.

We can explore this definition further:

rev(_Op, [], Acc) -> Acc;
rev(Op, [H|T], Acc) ->
    rev(Op, T, Op(Acc, H)).
strange_revert(L) ->
    rev(fun(A, B) -> [B|A] end, L, []).

Here, instead of using the [X|Xs] constructor explicitly, we moved it into an anonymous function. Notice that we also flipped the parameters. That is to say, we don't have a cons function:

Cons = fun(H, T) -> [H|T] end

but a:

Flipped_Cons = fun(T, H) -> [H|T] end

This is not unexpected. Let’s consider the definition of the foldl function itself:

foldl(_F, Acc, []) ->
    Acc;
foldl(F, Acc, [H|T]) ->
    foldl(F, F(Acc, H), T).

They look very similar. Indeed they have almost the very same shape; we just have to flip a couple of parameters (foldl takes the accumulator as its second parameter, not as its third like our rev). We must notice that the foldl above is not akin to Erlang's lists:foldl. In Haskell the type of foldl is:

Hugs> :type foldl
foldl :: (a -> b -> a) -> a -> [b] -> a

The binary operation has the accumulator as the first parameter (here it has type a), which is the same convention we adopted for the handmade foldl written above (from now on, our:foldl). Erlang's standard foldl follows the opposite convention: the binary function takes the accumulator as its second parameter.

foldl(Fun, Acc0, List) -> Acc1
  Types:
    Fun = fun(Elem, AccIn) -> AccOut
    Elem = term()
    Acc0 = Acc1 = AccIn = AccOut = term()
    List = [term()]

If we want to use foldl to build the reverse function we must choose which foldl to use: our:foldl needs the flipped operation (Flipped_Cons), while the standard Erlang foldl is fine with a plain Cons.

foldl_revert(L) ->
    lists:foldl(fun(H,T) -> [H|T] end, [], L).
foldl_revert2(L) ->
    our:foldl(fun(T,H) -> [H|T] end, [], L).

And here we are, folding in Prolog:

foldl(Functor, Acc, [H|T], Foldled) :-
    call(Functor, Acc, H, Acc1),
    foldl(Functor, Acc1, T, Foldled).
foldl(_Functor, Acc, [], Acc).
cons(T, H, [H|T]) :- !.
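
With the cons above (which already takes the accumulator first), reversing becomes a one-liner; foldl_reverse is just a name I picked:

foldl_reverse(L, R) :-
    foldl(cons, [], L, R).

?- foldl_reverse([1, 2, 3], R).
R = [3, 2, 1].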

Thursday, November 6, 2008

Extol of Declarativeness

Far too many programmers underestimate the beauty of declarative programming. This is something which comes from the industry: functional languages are traditionally under-employed and perceived as academic choices. In fact a lot of research has been done with Lisp and Prolog (mostly in the U.S.A. and Europe respectively) in the AI and formal languages fields.

However, declarative languages are well suited to the industry and to solving practical problems. Indeed, one of the most used domain specific languages (SQL) is a declarative language (even though not a Turing-complete one). Nobody thinks that SQL is academic. It's just a good language for the task (thanks also to the underlying mathematical model).

Indeed, Prolog would be suited for the task as well (and Datalog is proof of that), were it not Turing-complete (termination is a very good property to have in a query and data manipulation language).

Programming without explicit state is a nice property. A typical example is transactional semantics: in imperative languages implementing transactional behaviour is quite a hard task. Every operation must be logged (or check-pointed, or both). Another option is some kind of shadow-copy behaviour: basically you access data through a pointer which points to some structure; if you want to edit the structure, you make a copy of it, edit the copy, and set the pointer to the copy if and only if every operation in the transaction succeeds.

[Figure: shadow-copy transactional semantics; the pointer is switched to the edited copy only on commit]

All these operations are costly. Basically this kind of behaviour is free in functional languages. You return from a function and, if you had no explicit side effects (side effects are not possible in Haskell and are easily avoidable in other functional languages), you automatically roll back to the state before the function call. This behaviour has some runtime cost, and this is the reason why functional languages are usually quite fast but not extremely fast (especially in certain tasks). However, it's both easy to code (in fact you do not have to do anything explicitly) and rather optimized, since it's done by the system (which should be rather efficient).

On the other hand, some other operations are quite more costly in functional environments. This happens when you often modify structures, that is to say when the algorithm relies on destructive write operations. Sometimes, declarative languages have limited support for "classical" variables: think about Oz cells, or Lisp and OCaml; Erlang has ets, dets and mnesia tables. However, destructive algorithms are usually rather tricky to parallelize. In the luckiest cases some data decomposition comes naturally; unfortunately this is not always the case. On the other hand, in declarative environments parallelism is almost free.

In Erlang parallelism is built into the very language core. In Haskell writing concurrent software is extremely easy (or at least not any more difficult than writing non-parallel software). Oz, with its dataflow variables, is also an interesting choice (and presents a concurrency model which is extremely educative and quite different from the ugly thread/synchronized Java vision). Basically in Oz the variables themselves can be used to let threads communicate.

In many declarative languages variables can be bound or unbound. Unbound variables must be bound before being used; that is to say, reading an unbound variable is a programming error (however, it does not go unnoticed, which is quite different from reading an uninitialized variable/pointer). In Oz, if you have many threads and a thread tries to read an unbound variable, it hangs until someone binds that variable.

This is a nice way to do concurrency: the variables themselves do the work. They can be used to build barriers, for example, and they are a good IPC primitive as well. Since in Oz variables are logical variables (read-only once bound), there is no need to worry about multiple writes. Nice.

Saturday, November 1, 2008

Do not program defensively. (Sure?)

How true it is that pragmatics matters in programming! That is to say, best practices in a given environment/platform may be bad practices in another. For example the following lines are quoted from the Erlang Programming Rules as recommended by Ericsson:

A defensive program is one where the programmer does not “trust” the input data to the part of the system they are programming. In general one should not test input data to functions for correctness. Most of the code in the system should be written with the assumption that the input data to the function in question is correct. Only a small part of the code should actually perform any checking of the data. This is usually done when data “enters” the system for the first time, once data has been checked as it enters the system it should thereafter be assumed correct. 

There is a whole generation (more likely a couple or more) of C programmers who grew up with the idea of defensive programming. Which may be a good idea, indeed, especially considering the quite large number of bugs in C programs. Java programmers more or less share the same idea (further misled by a wrong idea of typing, but that's entirely another story).

Here, we find that defensive programming is not only not considered a good practice: it's even considered a bad practice. Indeed, the Unix operating system, about 30 years ago, used a similar idea: less code to manage erroneous conditions (in comparison with Multics). Kernel panic and reboot. Of course there is quite a difference between "not recovering from errors" and "not doing defensive programming".

The point is that well written code can't be put in a "wrong" state by wrong input. At least it should not. A typical example is a buffer overflow: as far as I'm concerned, a legitimate behaviour is a core dump (as long as the dump is not user accessible, of course!). The wrong behaviour is corrupting memory (which is C's behaviour out of the box).

Back to Erlang, the example code of the section is:


%% Args: Option is all|normal
get_server_usage_info(Option, AsciiPid) ->
    Pid = list_to_pid(AsciiPid),
    case Option of
        all -> get_all_info(Pid);
        normal -> get_normal_info(Pid)
    end.

Which is quite self-explanatory: if Option is neither all nor normal, Erlang "crashes"; it does not do something implementation dependent, nor does it let the user crack the process. Besides, Erlang was created from real needs of the industry, thus it is a practical language suited to solve practical problems (and is also a beautiful functional language).

I want to stress the industrial nature of Erlang to show that the advice does not come from academics or inexperienced programmers unaware of "real world" needs. Maybe there is still hope.