Thursday, July 21, 2011

Is state really bad?

Prologue

I actually started writing this post in March, when some of the facts referred to in the body of the article took place. I am resuming it today, after months of extensive (and rather unbloggable) work.

Intro

Stateful programming is pretty much the default strategy in the industry nowadays, and it has been for the last 60 years. If we consider the 10 most widespread languages in the TIOBE index (one statistic is worth another), they are all imperative languages strongly based on stateful programming. This month Lisp has improved enormously (Land of Lisp, anyone?), but even if we counted Common Lisp and Scheme together, they would not enter the first 10 positions (even if only by a few points).

The Haskell case

In fact, in recent years we have witnessed an increase of interest in functional languages. As far as I remember, about 5 years ago there was a lot of hype around Haskell (which was in its 30s or something like that). It was related, as far as I remember, to a famous user-friendly Linux distro having written some of its tools in Haskell, and to the Darcs project. Darcs has almost disappeared, with Git (Perl+C) and Mercurial/Bazaar (Python, Python+C) sharing the market of distributed version control tools.

The case is rather interesting: Darcs was built on a very strong "theory", the algebra of patches. The problem is that the merge process is still exponential in the worst case and that it is rather buggy. It is interesting that the software is both slow and bug-ridden. It is tempting to infer something about Haskell, at least to counter the hypothesis that pure functional programming makes it relatively easy to write bug-free programs. Perhaps this should be taken into account the next time someone looks down on setf. Well, I do like Haskell, just not as much as I love Lisp.

The last thing I would like to mention is that I like the idea of insulating imperative code from functional/algorithmic code (but this is something I suggest with every language). However, being forced to do that is not necessarily a good idea. In fact, it makes Haskell quite a bit harder to pick up, and it has the paradoxical effect that if you are a developer good enough to code in Haskell, you probably already know how to separate side-effecting code from side-effect-free code. Besides, although I like purity (to some extent), I'm not sure whether state monads should be considered stateless. I mean, a state monad implements state in a nice functional construct, mathematically sound and pure. In fact it is almost an executable translation of the declarative semantics of a stateful program. Consequently, I'm not sure whether frequent use of state monads is really an improvement over carefully used stateful programming.
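To make the point concrete, here is a minimal sketch (in Python, with helper names of my own invention, not any real library) of the state-threading idea that a state monad packages up: every "stateful" step is a pure function from the current state to a (result, new state) pair, and a driver threads the state through.

```python
# Each step is pure: it takes the current state (a stack) and returns
# (result, new_state) without mutating anything.

def push(x):
    def step(stack):
        return None, [x] + stack
    return step

def pop():
    def step(stack):
        return stack[0], stack[1:]
    return step

def run(steps, state):
    """Thread the state through a sequence of steps, purely."""
    results = []
    for step in steps:
        result, state = step(state)
        results.append(result)
    return results, state

results, final = run([push(1), push(2), pop()], [])
# final == [1]; the original empty stack was never mutated
```

This is exactly the plumbing a state monad hides behind bind, which is why code written this way still reads as a sequence of state changes.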

In other words, although I admit my limited experience with Haskell in real-world projects, I believe that if we think functionally the code is going to be clear (but it will be as clear in Haskell as in Lisp). On the other hand, if we think in a heavily stateful way, our code may be less clear in Lisp and quite messy in Haskell, where we would implement our (wrong) thinking with state monads popping out everywhere. Feel free to correct me, possibly with real experience and examples.

Erlang and Clojure

Erlang is different from Haskell in that functional, stateless programming is required for practical purposes. In order to allow a shared-nothing, process-based architecture (and the actor model), state is essentially banned. Of course, there are solutions which bring state back (ETS, DETS, plus tools built on them such as Mnesia). It is worth noting that this kind of state is generally different from the usual imperative state.

In imperative languages everything is stateful by default. Here, on the other hand, we basically have only "database-like" solutions. This does not encourage real stateful programming, and in fact usual Erlang programs are pretty much state-free. The essential difference is that they are not side-effect-free.

Being state-free and being side-effect-free are two very different things. Having no side effects at all is pure functional programming, and it is extremely hard to do. In fact, it would not even make sense in a programming language built around the idea of passing messages between processes: if you send a message, you are not side-effect-free anymore.
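A tiny sketch of the distinction, in Python (the mailbox stand-in is mine, loosely in the spirit of an Erlang process): the function below keeps no mutable state of its own, yet sending the reply is a side effect that cannot be taken back.

```python
from queue import Queue

# Stand-in for message passing between processes.
mailbox = Queue()

def respond(request):
    # State-free: the reply depends only on the argument; no variable
    # is read or updated between calls...
    reply = request.upper()
    # ...but not side-effect-free: sending the message changes the world
    # outside this function, and there is no way to "unsend" it.
    mailbox.put(reply)
    return reply

respond("ping")
```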

Somewhat in the same sense, Clojure is stateless but not side-effect-free. Most Clojure coding is performed without explicit state change, and there are a few explicit ways to deal with state. This is done in a transactional way, so that state change is essentially allowed only if explicitly requested. State change should not be mingled with stuff that has effects outside the Clojure process: you can "unset" a variable and recover the old value, but you cannot unsend a message.
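For illustration, here is a rough Python imitation of Clojure's atom idiom (a hypothetical class of mine, not Clojure's actual implementation): state change is explicit, localized, and expressed as the application of a pure function to the current value.

```python
import threading

class Atom:
    """Toy version of Clojure's atom: one explicit, synchronized cell."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def deref(self):
        return self._value

    def swap(self, f, *args):
        # The update function should be pure: in real Clojure it may be
        # retried, so it must not have side effects of its own.
        with self._lock:
            self._value = f(self._value, *args)
            return self._value

counter = Atom(0)
counter.swap(lambda n: n + 1)
counter.swap(lambda n, k: n + k, 10)
# counter.deref() == 11
```

The point is not the locking but the discipline: all state change funnels through one named construct, instead of being the ambient default.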

At the moment I feel that I can easily tolerate (and like) the kind of state-change insulation provided by Clojure, and the pragmatics, typical of all functional languages, of avoiding side-effecting stuff for as long as possible. On the other hand, I find it really hard to accept absolute statelessness.

Lisp and Languages with First-Class Functions

On the other end of the spectrum of "functional languages" we have Common Lisp and the various freedom languages where functions are first-class objects (Python, Ruby, ...), with Scheme leaning a bit more towards the functional side. Technically speaking, Python is not even a functional language. Surely it is not stateless (but neither is Lisp). In fact, in Python recursion is frowned upon in favor of iteration (but iteration is often built around generators, which are a user-friendly/dumbed-down[0] instance of anamorphism -- unfold --, which is strongly functionally flavored). On the other hand, some call Ruby the Matz-Lisp.
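As a sketch of the generators-as-anamorphism remark, here is a generic unfold written as a Python generator (the name and signature are mine, not from any standard library):

```python
def unfold(step, seed):
    """Yield values by repeatedly applying `step` to a seed.

    `step(seed)` returns (value, next_seed), or None to stop.
    """
    while (result := step(seed)) is not None:
        value, seed = result
        yield value

# Counting down from 5: the "state" lives in the seed passed along,
# not in a mutable loop variable.
countdown = list(unfold(lambda n: (n, n - 1) if n > 0 else None, 5))
# countdown == [5, 4, 3, 2, 1]
```

Every generator is essentially this shape with the seed hidden in the function's local frame, which is what makes it a friendlier packaging of unfold.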
In these languages there is no explicit restriction on stateful programming. This means that the usual algorithms and ways of thinking are easy to implement (especially in Lisp, where both very functional and very stateful algorithms are easy to express). On the other hand, restricting state change is left to the good sense of the programmer (which is not always a good thing, but I don't really like languages which tell me exactly what to do... they just have to make the better way obvious -- I'm not Dutch, sorry --).
To be honest, there are things that are just so much easier in stateful programming than in stateless programming, and for other stuff the converse applies. Consequently, the language should not force me too much towards one style or the other. Moreover, stateless-only languages tend to be harder for beginners to pick up (on the other hand, Python and Ruby are extremely easy to start with, and I believe that Scheme is a wonderful first language for computer scientists). Finally, stuff like GUIs is really hard to implement in a fully stateless setting (Clojure essentially relies on the underlying Java classes, and I have actually never used Erlang for GUIs -- not that I am a strong GUI developer --).

On performance

One of the consequences is that lots of algorithms, data structures and ideas have been developed with stateful programming in mind. This is the reason books such as Okasaki's Purely Functional Data Structures have been written: to bridge the gap between mainstream stateful programming and functional reasoning.

Unfortunately, many algorithms have been created with imperative languages in mind, and the data structures supporting those algorithms strongly reflect the imperative setting. Implementing such algorithms in a functional setting is usually inefficient, to say the least. Much of the theory (and not only the practice) is built around state change. The Art of Computer Programming uses a pseudo-assembly language (MIX) which essentially manipulates a von Neumann machine.

Something as easy as "modify an array" may be a source of inefficiency in many languages. E.g., Clojure provides transients just to make it easier to express such operations. On the other hand, functional languages usually have some trouble providing a data structure that is immutable, offers O(1) random element access, and allows element modification without copying the whole data structure.
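A toy Python sketch of the trade-off (my own miniature, not Clojure's actual persistent vector, which is a 32-way trie): a naive immutable update copies the whole sequence, O(n), while a tree with structural sharing copies only the path to the changed leaf, O(log n).

```python
def copy_update(tup, i, x):
    # Naive immutable update: rebuild the whole tuple, O(n).
    return tup[:i] + (x,) + tup[i + 1:]

def tree_update(node, i, x, size):
    # `node` is a leaf value when size == 1, else a (left, right) pair
    # over a perfect binary tree of `size` elements.
    if size == 1:
        return x
    half = size // 2
    left, right = node
    if i < half:
        # The untouched right subtree is shared, not copied.
        return (tree_update(left, i, x, half), right)
    return (left, tree_update(right, i - half, x, size - half))

t = ((0, 1), (2, 3))           # a perfect tree holding [0, 1, 2, 3]
u = tree_update(t, 2, 99, 4)   # ((0, 1), (99, 3))
# t is untouched, and u shares t's left subtree: u[0] is t[0]
```

Widening the branching factor from 2 to 32, as Clojure does, makes the depth small enough that access and update are effectively constant time.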

Books


"Purely Functional Data Structures" (Chris Okasaki)


"Art of Computer Programming, The, Volumes 1-3 Boxed Set (3rd Edition) (Vol 1-3)" (Donald E. Knuth)



Notes

[0] Depends on whether you are a Python or a Scheme programmer: Python programmers usually say user-friendly.

2 comments:

Joost said...


Something as easy as "modify an array" may be a source of inefficiency in many languages. E.g., clojure provides transients just to make it easier to express such operations.

Just a nitpick, but Clojure's transients cannot reliably be used as mutable data structures. Correct algorithms using transients look like restricted versions of their purely functional counterparts.

As the link above says: "[...] transients are not designed to be bashed in-place. You must capture and use the return value in the next call."

Clojure does provide mutable Java-style arrays if you really need them, though.

Unknown said...

I don't believe it's nitpicking. My sentence reads differently from what I actually meant, and you're right to point that out.

I don't want to change a structure *in place* (at least, this is not necessary for many algorithms).

For example, consider Dijkstra's algorithm. I have an N-node graph and an NxN matrix. I'm perfectly comfortable with functions which return the modified data (instead of just modifying the thing in place), as long as such operations do not involve having too many copies of the NxN matrix.

If set!: M x index x value -> M
instead of set!: M x index x value -> void

I really don't care (I prefer that, in fact). What matters is that the thing is fast.

Unfortunately, from my post it could be inferred that transients were something whose interface was of the second kind.