Friday, December 23, 2011

Does Clojure fix "Fundamental problems with the Common Lisp language (citation)"?

I was looking for some Common Lisp libraries to implement an idea of mine. For lots of reasons I was considering not using Python or Clojure and going directly with Common Lisp (in fact, I think it is going to be Scheme... but it is hard to tell).

As it often happens when following semi-random links on Google, I stumbled on something quite interesting:
Fundamental problems with the Common Lisp language

Nothing extremely new, indeed. The only point that really surprised me was the "hard to compile efficiently" thing. I would have said that in general SBCL is a pretty fast environment. Not as fast as C++, perhaps not even as Java (but I believe that this depends from the specific benchmarks used), but still fast.

However I was mostly interested in the other claimed problems:

  1. Too many concepts; irregular
  2. Hard to compile efficiently
  3. Too much memory
  4. Unimportant features that are hard to implement
  5. Not portable
  6. Archaic naming
  7. Not uniformly object-oriented
May seem like a lame argument... but I think that clojure actually addresses all the problems but number 2 and 3. Please notice that I'm not claiming that clojure is a memory hog or that it is slow. Simply put, right now my impression is that common lisp is still faster than clojure. I believe that this is due to Clojure being an additional layer over Java. Java is itself probably marginally faster than SBCL (though your mileage may vary). With marginally faster I mean that really depends on what you are doing and one or the other may result faster.

Alioth benchmarks are, like all benchmarks, not extremely relevant. Still, this is my general impression. SBCL does a wonderful job, in that CL is much higher level than Java and there are many more engineers optimizing the JVM. However, Clojure takes its toll, in that, according to my tests (and to alioth as well) there is quite a lot of work to do to make it catch up with Java.

Probably for high level stuff or specific problems (where in Java there would be essentially some part of clojure runtime/libraries to be re-implemented) they are on par.

About the memory, my feeling is that JVM is a rather memory intensive business, and Clojure can't do much about it. E.g., Python programs usually run using much less memory when confronted with similar data-sets.

I'm not saying that we should drop CL and switch to clojure. However, I believe that clojure addressed some of the problem that many (some?) in the CL community feel CL has.

Thursday, December 15, 2011

Better toys for us programmers

Today PyCharm 2.0 is out. A couple of days ago, the last version of IntelliJ (11) was released as well.

I'm somewhat reluctant to discuss the matter here; I'm not a free software integralist by any means (like for example posting from a beautiful MacBook Air, with OS X), still when discussing commercial software I feel a vague sense of guilt because I feel like I am doing advertisement. This is especially true as most of the times there it does not involve only reporting facts (which would be acceptable, as the truth tends to be true, even if in favor of a commercial entity) but impressions, which could make me seem biased.

These are the most interesting improvements I found in PyCharm/IntelliJ (it should be clear when stuff applies only to one of the two -- and amazingly enough, the what's new page on PyCharm is more detailed):

  • Support for pypy (this is going to be immensely important for me, as I'm planning to move part of my development environment on pypy)
  • Support for ipython (more on this later)
  • Cython support (which may be something I'll be using soon enough)
  • Git graphs (been a bit of a PITA lately to remember the proper log options to have them in the console, my memory ain't what it used to be)
  • Gist support (I love that)
There is more. I did not even realize that before it was not there... but for example now PyCharm completes the keyword arguments in Python stuff. Which is extremely nice, in my opinion. And also the refactorings seem to work more accurately.

Eventually, I want to point out a feature I was sorely missing in all environments I tried, i.e., choosing the method to step into when debugging. First, I'm not the kind of guy that spends lots of time in the debugger (see., that would be a clear smell on the quality of my unit tests). But when I do, I often feel rather boring having to walk step by step irrelevant code.

Consider this:

my_object.that_is_some_method(
    ILuvDIP(...), self.that_is_interesting(), self.may_be_a_property)

and suppose that I feel the bug is in that_is_interesting. Now, it may well be that my Python debugging skills are not excellent. Afterall only recently I stopped instrumenting my code with prints and always use a proper debugger. Before that I relied almost only on unittests and prints.

Before PyCharm 2, it was hard to step into that_is_interesting and not in ILuvDIP. I believe that the same logic also applies to Java and IntelliJ and hopefully to Clojure. Then, back to us.

I really ain't lots of problems with IntelliJ. I feel that an IDE is a very valuable asset when developing Java. In fact, I would say it is a PITA to do without. Perhaps with Emacs and some modules like JDEE. Still, I don't know... I try to avoid Emacs these days (as I'm finding vi more and more natural to me).

The question is more interesting regarding Clojure and Python (and Ruby...). Probably if I would use Django a lot, PyCharm would be a clear winner. Support is awesome and you have to work in a frameworkish way in any case. There is nothing wrong with that, of course.

These days I'm mostly writing library/algorithmic code in Python. And I feel like ipython+vim is a great tool here. It's got an almost Mathematica/Matlab vibe that is nice for what I'm doing. I also try using that approach with Clojure more often than not. Tests as documentation and specification, REPL as an integrated development environment. It is possible that with ipython builtin in PyCharm I could just move that workflow to PyCharm itself.

There is however, the issue of code complete. Emacs fares pretty well to complete Clojure, but as far as I remember is not so good regarding completing Java (perhaps I should have installed JDEE). As a consequence, I used IntelliJ a lot even with Clojure.

Recently, I started exploring Clojure+vim too. And it is a wonderful world. I have most of what I need and it is extremely lightweight. I have to investigate the issues further. However, IntelliJ remains a solid environment for Clojure development.

Now the essential question is... I quite need IntelliJ for Java. And having it working with Clojure is a big plus (even if maybe not a strict necessity). But should I buy a separate PyCharm or just rely on IntelliJ plugin?

Monday, December 12, 2011

Erlang and OTP in Action (review)

First time I got into Erlang, it was with "Programming Erlang: Software for a Concurrent World" (Joe Armstrong). It is a very nice book, in my opinion, and I enjoyed immensely reading it. That was like 4 years ago or something. Back then, the functional revolution was just at the beginning: no widespread Scala, no Clojure at all, essentially no F#. Back then it looked like OO was going to rule the industry for years and years, with no contender of sort. Rails was fresh, Django was fresher (back then the APress book was just being released).

I bought the book because I wanted to see this "brand new" technology (20 year old, but just going to make it through in the circles I did frequent). And really, the language looked like 20 years old. Full of Prolog legacy, Unicode who's that guy? and so on. However, it no other piece of software I knew could as easily. Massive concurrency, hot swapping code. Wow.

The language, I did not especially like. The runtime… WOW! As a language, I love Python or Clojure because the way apparently distant functionalities work together and create something even more beautiful. Erlang does not have that at language level. It has at a framework level.

Think about hot swapping code in an object oriented software. First, I somewhat believe that object oriented modules are somewhat more tangled that functional equivalents. Partly because of the object reuse OO promotes (that can really be against you if you want to swap code). Then, there is the whole problem of references vs. addresses. Addresses have an additional level of indirection that makes it far easier to swap a process than it is to swap an object.

But the very idea that state is in the function parameters and that process linking and easy restarting thing are at the very root of how easy it is to swap code in Erlang. But then… back to the books.

The essential problem was that after seeing a bit of Erlang, I thought OTP was not a big deal. Yes, it is easier to use. But also plain Erlang is. And I convinced myself that should I need Erlang, I could just use plain Erlang. Than things changed: I read some OTP using code and I understood not so much of it. That was this year. What I understood, was that it could be helpful. I had to write a concurrent prototype and I welcomed the idea not to write as much code as possible.

In the meantime, I forgot many things about Erlang. In this situations, instead of reading the same book twice, I buy another book to gain perspective. So I decided to buy another book. One of the candidates was "ERLANG Programming" (Francesco Cesarini, Simon Thompson). It also had excellent reviews. Essentially I believe it contains more material than Armstrong's book and is also a bit more recent. However, as far as I understand, is still a bit terse on OTP.

As a consequence, I bought "Erlang and OTP in Action" (Martin Logan, Eric Merritt, Richard Carlsson) instead. And I'm very happy of this choice. It complements Armstrong's book well and extensively covers OTP. In fact, I also believe that the approach is very interesting. Introduce OTP first and learn to use it, then when you know what it can do, you are going just to use that. Then, learn plain Erlang in order to extend OTP when your use case is not covered. And a nice plus was a detailed description of JInterface, which I could need as well.

In fact, I do think that as it may make sense to introduce objects as early as possible in a book on an object oriented language, starting with OTP is a very big plus from a learning perspective. Then perhaps the point is that I did not need to get into a functional mindset (which I think Armstrong book does with more attention.

If the question is however just "learn to think functionally" I believe that "The Joy of Clojure: Thinking the Clojure Way" (Michael Fogus, Chris Houser), LYHFGG or Land of Lisp are probably better alternatives. Another interesting one is "Functional Programming for Java Developers: Tools for Better Concurrency, Abstraction, and Agility" (Dean Wampler), even if it has a whole different perspective.

Wednesday, December 7, 2011

Good and not so good reasons to learn Java (or any other language) [part 2.5]

So this is the third part after the not so good reasons to learn Java and the other good reasons to do it.


Java could be fun


Really. There is some stuff that is really nicely done. I like Akka, for example. And I find the idea of hacking with assertion really funny. It is a totally different approach to meta-programming that I really liked. I also like Antlr... after that every other library for imperative/oo-languages to build parsers seemed primitive.

There is some nice stuff in the Python world as well (and yeah, in Lisp you have Lisp and Haskell has wonderful stuff to). But I can tell: implement a language in Java and in C++. In Java you will be finished so much earlier... of course, other languages are faster too. But your team may not include other Haskell hackers.

Libraries, libraries, libraries

In Java there is a library for everything. Quite often, they are very well done, even if somewhat over-engineered. Probably my idea of over-engineering is a bit extreme (it comes from having seen lots of lean languages). But really, they are robust and well tested.

Moreover, Java is usually efficient enough to be a decent contendant for more demanding tasks.. Maybe C/C++ can be avoided for your application (and Java libraries are usually easier to use than C++ ones).

Java as an intermediate level platform

Say you are interested in language design. As far as I can see, you have few choices.

  • Write your own runtime, vm, etc. That was the "old" approach (Python, Ruby)
  • Implement your language on the top of a Lisp[0] or Prolog[1] interpreter
  • Use LLVM
  • Use JVM
  • Use CLR/Mono
I would rule out the latter, because I don't do any windows. Between LLVM and JVM as far as I understand it may depend on the language. JVM has a few quirks (who said lack of TCO?), but has plenty of available documentation, real world examples (Scala & Clojure), a large number of developers working for the well-being of the plarform, and a huge amount of libraries, libraries, libraries you may want to use.

LLVM is probably going to be faster, though. And has lots of optimizations and stuff for static languages. In any case, to make an informed choice, you need to know Java

Tools (IDEs, Maven, Ant)

IDEs are the bless and the curse of Java. After having used for sometime IntelliJ I really feel that the amount of functionality that IDEs for other languages offer is puny. The possible exception is Emacs for Lisps.

The funny thing is that I'm not an IDE gui. However, really, when you do refactoring (even simple stuff like moving functions and files around) it is an invaluable time-saver. I miss vim as a text manipulating programming language, but still... for Java IDEs are almost necessary. And not only bridge part of the gap with other languages, they have lots of useful stuff.

Regarding Maven... well, I just like it. I also like the fact that I'm able to build IntelliJ projects from Maven scripts is wonderful. And also Ant is a very good tool (although rather over-engineered). My humble opinion is that Ant is far easier to use than the whole autotools company. This may also have something to do with the fact that the whole Java deploy process is easier.

In any case, the point here is not how cool is Maven. Developers from other platform may want to know what is boiling up in Java-land. Sometimes we have sub-par tools and we do not even know it. Of course quite often Java tools fix "Java problems" that are different from the ones we have in other languages. Sometimes not.

E.g., the design of Maven could inspire similar tools for the other platform. Both for its strengths and for its weaknesses (to avoid them, of course).

Learn "classical" threads

Ah, this is weak. However, other languages have a "threading" module that is heavily inspired by the way Java does threading. I don't particularly like it, but being familiar with it may be a very good idea. As far as I can tell, is also what most people have in mind when thinking about threading. Who am I to say that they are wrong? I can just tell them there is better stuff.

And yes, pthreads are even more a PITA.

Find Java in other languages. Sometimes.

This is basically the extended version of the older argument. Java is hugely popular. Many "new" things are created in Javaland, and then are ported to other platforms. Or perhaps are not created but became first popular inside the Java community.

For example, xUnit libraries are available everywhere, but I think that most people first started working with it in Java. Some of the authors of the original Smalltalk library actively work on JUnit, etc etc etc.


Conclusion


I do not think that Java is particularly good. Not particularly bad, either. It is not the kind of enlightening language Scheme is. It is not easy to use and predictable like Python. However, its popularity may make it unwise not to learn it. Especially for communication reasons:

  • Books
  • Other developers (talking about OOP, libraries)
  • New ideas brewed in Javaland

wow... that's it. ;)

---
[0] Scheme, Clojure, Common Lisp...
[1] Erlang was created this way, even if now it has its own (wonderful) runtime

Saturday, December 3, 2011

Good and not so good reasons to learn Java (or any other language)[part 2]

So this is basically the second part of "Good and not so good reasons to learn Java".

And here I will discuss the good reasons to learn Java. Some of them...
I decided to split this in two posts. The next one will be published in a few days.

If you know me a bit, you probably know that I do not like Java particularly. In fact I found out that the essential problem is with Java being presented as something modern or "superior". It is not. However, it did something very good in the software environment (a part from electing over-engineering as a form of art): many technology that were outside the mainstream and was dubbed slow (garbage collector) are now widely accepted and that is partly because of Java. This is no reason to learn Java, though.

Moreover, my opinion is that Java is not a good first language. It is not a good second language either. However, after learning a high level language like Python or Ruby, a couple of functional languages (maybe Clojure and Haskell or Scheme and Scala) and something lower level, such as C, well... learning Java it could be a very good idea.

Lots of interesting stuff runs on the JVM

Yes... I'm talking about Clojure and Scala. And perhaps JRuby too. Sometimes understanding the underlying platform helps understanding some design choices. Moreover, there is stuff (e.g., in Clojure) that has not yet a fully clojurized variant and we have to use Java stuff.

Knowing Java is a plus, of course.

Lots of interesting books deal with Java

I'm not talking here about the wonderful Effective Java or Java Puzzles. They are great Java books, of course. And some suggestions may also apply to other languages. Still, if you are not interested into Java, you may miss them.

No, I'm talking about other great books...


Some of them are not entirely in Java and some have non-Java alternatives. Still reading Java is a huge plus in reading those books. I believe part of the reason is that:

Java is like a common-denominator object oriented language


Forget the Java platform. Consider just the language. Essentially Java is really a common-denominator of other object oriented language. Other languages typically offer more. Are more dynamic, offer more powerful type-systems, more expressive constructs.

Just thing about the OOP building blocks. Here we are talking about the mis-interpreted OOP as a matter of types vs. a matter of messages. That is what most people think about when referring to OOP. Unfortunately, as I said.

Java has those. But does not have much more. If you want to design OO stuff, Java gives you the building block, but does not stand above offering higher level construct. Think about Python or Ruby or Lisp... their OO facilities are so superior (plus they typically offer stuff outside the OOP model -- well, Lisp is so much more than an OO language...) that you are probably going to design stuff differently. Probably you are not going to over-engineer stuff.

Modifications are cheap. Abstractions are easy. And of course you can go outside the OO model when it makes sense.

On the other hand in Java you cannot. You can just get back at a procedural level, which is clearly inferior. The best thing you can do in Java is try to be as OO as possible: other choices usually do not pay.

As a consequence:

  1. Java is a great language to expose those "ill-conceived OOP" concepts that are used in every other language where ill-conceived OOP techniques are used.
  2. Java is a great language to truly learn to program like an object-oriented zealot

Please notice that I consider the latter a very good thing. It is a very good exercise that lets you understand the merits and the drawbacks of the paradigm. Probably when you think that something really sucks in Java, you hit a limit of OOP in a static language. Other languages may make that easier, but the wall is still there.

Java is good to learn "good OOP" too

Yes! As I already said there is a lot of smart people working with Java. They have already discovered and taught how to do "good" OOP in Java. You should learn it too.

Then, when using a different language things could only be easier. Some of the techniques you learned may be useless, because the language offers better abstractions. Still you will have a pretty clear understanding of what such abstractions are doing and perhaps you may develop a feeling for when not to use them.

As a matter of fact, too much magic is bad, even if it may seem cool. While sticking too much to simple things may actually complicate the design in the long run (meta-programming leads to code that you do not write and that is the only kind of code that needs no debugging or testing or maintenance, as it does not exist), too much magic has the same effect at the opposite position of the spectrum.

Avoid writing Java in other languages

If you know Java, at a certain point you may discover you would be writing much the same code if you were using Java. If the language you are using is C++, you probably have done something wise: chosen a restricted subset of C++ and used that one (than we may argue if you actually left out good stuff).

However, if you are using Python or Ruby, or, worse, Clojure, then you are writing awful code. $x code is not meant to be structured like Java. If it does, you are not using the language well. This is basically a side effect of "Java is like a common-denominator object oriented language".








Saturday, November 26, 2011

Good and not so good reasons to learn Java (or any other language)

The first thing to consider is there is really not such a thing like a reason not to learn a programming language. Or better, there is only one reason: lack of time. Learning a language is a tricky business[0]. Learning to use a whole platform is way trickier.

Please notice that some of the reasons presented here are general enough to be applied to any programming language (especially the bad reasons to learn a language). Others are specific of Java (especially the reasons to learn it).

Also keep in mind that the good reasons to learn Java will be presented in another post because this one is becoming far to long to be readable in the few minutes I feel entitled to ask you, dear reader. So believe me... you probably should learn Java. Still, not for the reasons I list here.

Not so good reasons to learn Java

It is widespread

You may be lead to think that since Java is a very widespread language, it is easier to find jobs if you know it. In the case of Java there is the question that I am convinced that it is not a bad thing to know Java and that can have pleasant effect on finding jobs (more on that later), still I would not consider it a good reason to learn Java.

Learning a widespread language means more jobs and more applicants. What really matters is the ratio between jobs demand and offer. And usually for very widespread languages it is not always very high. Skilled, experienced and "famous" developers do not care much. The others should.


It is object oriented

This and the variant "it is more object oriented" are clearly wrong reasons. There is plenty of good object oriented languages. And it is rather hard to decide which is more object oriented (lacking function looks more like a wrong design decision than a clue of being more object oriented).

Besides, I'm not even sure that being more object oriented is good (or bad for what matters). Well... not sure that being object oriented is inherently good either. Maybe in ten years the dominant paradigm will be another. Maybe in two. Three years ago "functional programming" was murmured in closed circles and outsiders trembled in fear at its very sound.

No, joking. Most people thought that "functional" meant having functions (and possibly no objects) and thought about C or Pascal. They did not know it was about not having crappy functions. Yeah, more than that, of course, but that's a start.

It is a good functional language

Just checking if you were reading carefully...
That is a bad reason because it is not true!

It is fast/slow

Oh, come on! Java implementations are decently fast, faster than most other language implementations out there and usually slower than implementations of languages such as C, C++, Fortran. Still, for some specific things it may be nearly as fast or faster. Sometimes it is not only CPU efficiency that matters.

It is good for concurrency

The "classic" Java threading model sucks. It's old. Nowadays other ways of doing concurrency are better (more efficient, easier to use).

Such more efficient easier to use methods are built-in in languages such as Clojure (or Scala, or Erlang). Still, Java is probably the most supported platform in the world. Such concurrency models are not inside the language, but you may have them using libraries.

Sometimes this is a real PITA (language support is a great thing, because the syntax is closer to the semantics).

Moreover having the "older" concurrency model visible may be a problem with the newer stuff. And some other libraries and frameworks may just assume you want to do things the old way.

It is good for the web

Ok... most people do not even know how widespread is Java on the web. In fact, it is. But is it really good? I do not really think so. There is lots of good stuff in Java world for the web, of course. The point is that there is also for other platforms. Here Rails and Django spring to mind.

Moreover, there is loads of extremely cool stuff coming from the functional world. Erlang and Haskell have some terrific platforms. Scala and Clojure also have some extremely interesting stuff and can leverage mature Java stuff but make it really easier to use.

Grails (web framwork) may seem on the same wavelength, still I think there is a very important difference. First, I don't like Groovy at all. In fact Groovy very author says that he would not have created it if he knew Scala. And of course since I do not like Groovy, I do not see why I should be using Grails, which, as far as I know, does not offer killer features over Rails or Django.

Scala and Clojure are different. They are not just "easier" than Java. They teach a different way of thinking and approaching development. And of couse, they are great for the web also.

I already know it a bit

This is quite an interesting point. Why, if you know a language a little, shouldn't you learn it well? This essentially makes it clear the difference between a not so good reason to learn something and a reason not to learn it.

The point is simple: if you are interested in learning Java (for the good reasons), do it. But from knowing a language "a bit" and knowing it "well" there is quite the same distance that from not knowing it and knowing it well. So don't do it because you think your prior experience may be extremely relevant.

There are languages which are just easier to master (where easier means that to learn them from scratch it takes less time that to become as proficient in Java as you would learning those languages -- yeah, Python, Ruby, [1]...)

Besides, I feel that the second step in Java is rather hard. I think that Java is very easy to learn for experienced developers (it is not a joke). The very first step in Java is relatively easy (a part from lots of useless crap like having to declare a class to put there a static main method and overly complicated streams). The step just after that, the one that takes you from writing elementary useless programs to writing elementary useful programs is quite harder, since lots of useful stuff in Java requires to understand OO principles quite well to be used without too many surprises.

So, you may have learnt Java at the college. Still, if you to hack something and "grow" there are languages that let you do it faster (Python or Ruby).

I have to

This is the worse possible reason because I don't see the spark in it. You have to. You probably are an experienced developers that got bored to death reading this post, you did not know Java and you have to learn it. Maybe your boss wants you to do some Java.

I'm sorry for you. Java is not easy to learn (especially to learn it at the point you can work with it). Mostly because everybody has been doing Java in the last fifteen years. Smart people and dumb people.

As a consequence there are truly spectacular pieces of software it is a pleasure to work with and worthless pieces of over-engineered crap. I truly hope that you are going to work with the great stuff.

Still I consider a "bad reason" to learn a language, because you are probably not going to enjoy the learning process. If you were, you would have used a different sentece... like "I want to".
And perhaps the thing would have been "My boss gives me the opportunity to increase my professional assets learning this new technology, Java".

It was easier to download/find a book/my cousin knows it.

One of the most popular reasons people pick up a language is because they find it installed on their system. That is not usually the case for Java, in the sense that usually the JDK is not installed, even if the VM is. Variants of this argument are a friend proficient in the language or, more frequently, availability of books at the local bookstores. This kind of arguments usually apply for complete programming beginners or people who had prior experience that has not been refreshed for years (e.g., old Basic programmers).

It is true that it is easy to find manuals for Java. The point is that not every manual is a good starting point. Specifically, this kind of user really needs a manual targeted at a true beginner. Unfortunately, that is the category of books where more craps piles. First beginners usually do not really understand if the manual they are using is crap.

If their learning process is too slow (supposed that they see it) they usually blame: (i) programming in general, (ii) the language or, worse (iii) themselves. Well, the language may have its faults (especially in the case of Java, still C++ is worse for a beginner), but it is important to understand that the culprit is usually the manual.

The funny thing is that the people who know how to tell apart a good manual from a bad one are usually the ones that do not really need an introductory book, are probably not going to buy it and more often than not are not enquired about a good manual. So unless their cousin is actually a skilled programmer, beginners are really risking buying a bad manual. And please, notice that even skilled programmers are susceptible to a similar argument: if you are interested in a new language and you read a terrible manual you may build a wrong opinion on the language (and perhaps you are not interested in spending another 40 gold coins to convince you that the language is really bad).

So please: read reviews on Amazon (or similar site -- notice that I'm not going to suggest to buy from Amazon, even if I often do and I am an associate -- here I'm just saying that the amount of reviews on books on Amazon -- com -- is usually large enough to make an idea). Find people that are experts and have teaching experience: their reviews are probably the one to base the judgment upon. Then buy the book from whatever source you want.

So do not buy a Java book because you can find books on Java and not on Clojure[2]/Python/Ruby/Whatever[3]. Choose the language (this is hard), choose the book (this is easier), buy, read it, study it, code code code.


---
  1. I kind of found out that for people really convinced that learning a language is easy one or more of the following apply:
    1. have an unmotivatedly high opinion of their knowledge of the languages they claim to have learned
    2. have a very inclusive definition of "learning" a language (e.g., writing hello world in it)
    3. only know and learn very similar languages (really, if you know C++ it's gonna be relatively easy to pick up Java, if you know Scheme probably Clojure is not going to be a big deal, etc.)
    4. and tend to write "blurb" in every language they know (so for example they write Python as if it was C++ -- and usually are very surprised when things go badly)
    Of course there is also a lucky minority for which learning languages is very easy.
  2. It is funny that I am suggesting to learn only "object oriented dynamic" languages such as Python or Ruby. I understand that many advocates of functional programming may disagree. But I somewhat think that while some kind of people greatly benefit from learning a functional language as the first language they truly master, many others are just not fit for functional programming because it is too abstract and working imperatively may be easier for them. As a consequence, languages such as Python or Ruby may naturally lead you towards functional programming if it is the kind of stuff you like, but are still usable if you are not that kind of person. I have seen things...
    And yes... if you are the kind of person that likes functional programming, you will get into functional programming sooner or later. This is just a bit later. 
  3. Notice Clojure here...
  4. Whatever is not a language. Still, it would be a great name for a language. Maybe not.

Friday, November 25, 2011

Brief morning delusion…

So basically it's a couple of days I try to use a given piece of software. And for some unfathomable reason the thing does not do a specific thing which I need and it shall do. I'm not going to be more specific, because I do not want to point fingers.

The point is that it does not give any clues on why it is failing or what is exactly trying to do (thus "how" it is failing). Since the project is open source I decided to take the sources and hack my way to the problem. I have a rather good understanding on the domain that I'm going to solve (it's related to unix processes -- though in OS X environment --). I'm familiar with both.

It's been a while since I last wrote some Objective-C, but this time I should just look at the sources, perhaps set a couple of breakpoints and find out what is happening. My plan is that after that I could fix the source so that:

  1. It logs what is doing
  2. It logs any errors that occur
  3. Perhaps I fix the specific problem in the code

The first bad piece of new is that I spot some obvious software engineering mistakes in the software. Unfortunately it is stuff that needs more than a casual hacking to be fixed (specifically, I'm talking about configuration stuff hardcoded in the sources). Still, it is not probably a big issue. It may even make sense in some contexts… well, not really. But anyway. Some wrong data-structures… but everything is basically fine: the code, a part from that, is well written and rather clear.

So I localize the piece of code that fails. It does a bloody lot of magic, in my opinion. My guts tell me that some of it is just unnecessary, some better design could lead to much simpler and less magic code. The only problem is that sometimes such kludges are just a consequence of unorthogonal design of the underlying systems (in this case OS X). But that is something I would do later on, after simply adding the code that logs possible errors (I think that is of paramount importance for the semi-technical users of the software, in the sense that they could better understand what is going awry when they customize it).

Then a thought strikes me: compile the whole thing just before making modifications. The svn repo should have been left in compilable state… but no. It is not. So I should have to find the last point where it can be compiled. Which I could do… well, another time.


Monday, November 7, 2011

Java 7 for Mac OS X

Just a preview, for now: link

Good. Performance improvements should also be seen in Clojure.

Still I am not extremely familiar with all the various improvements. After I found out that closures were not going to make it, I just lost interest in the whole process.

Saturday, November 5, 2011

Scheme Rant, but a bright future lies ahead!

Is this the best moment for scheme ever? First of all, I have to admit that although I quite studied the language in the last years, I'm a bit outside the spirit of the community. My impressions basically come from some discussions I have with more experienced schemer friends and reading stuff on the web.

I have often wrote about my difficulties with finding a satisfactory scheme environment. Essentially problems boil down to two things:

  1. I have very high expectations for programming environment (batteries included, etc.)
  2. I have very high expectations on Lisps, probably because every lisper I met spent a great deal of his non-programming time saying how lisp did this and that 30 years ago. So I expect to do it now, to do it fast and to do it well.

The issue with the first point is that I'm mostly used to programming environments which are mostly unique. Python is just Python. I use the very same interpreter everywhere, I know what it does, which features are supported and which ones are not. Same thing for Clojure or Erlang and nowadays mostly true for Haskell. I avoided the issue with C/C++ being rigorously adherent to the standards or to very minor gcc extensions (and by the way, I'm always using gcc).

On the other hand in Scheme there are many implementations and there is not a clear winner. There are implementations which are "worse" than others, but among the good ones it is hard to choose.

The problem is especially significant because I somewhat got into a period of transition between R5RS and R6RS (which are two standards). I somewhat lived something like that with C++, but at least back then there was clear consensus that pre-standard C++ was unarguably worse than post-standard C++.

Essentially, R6RS came out in the mid 2007 and I started learning Scheme not much afterwards. Moreover (here discussions with my friend kick in) a large part of the Scheme community did not like the standard, feeling that the new language was too large (plus strictly speaking it should lack a REPL).

After reading Dybvig's 4th edition Scheme book I found that mostly I like the new stuff. However, for some unfathomable reason I sticked with Gambit, which does not implement it. So, while I like the idea that Scheme was a little language which you use to build your language, I missed lots of R6RS features which just make to much sense for me (as an "application/library" programmer, instead than a "language programmer"). That, and the fact that if you want to toy with a new language, you do not want to re-implement merge-sort (unless you are are explicitly choosing that example to learn the language).

So there were lots of incomprehensions between me and the my scheme learning process. But I'm afraid that lots of non schemer may be feeling the very same stuff about it (and perhaps jump to clojure, which fixes that issues by default).

Some time later, the other scheme implementation I used became another language (and I'm looking forward to read the new No Starch Press book about it). So the more "batteries included" tool I knew in scheme, ceased to be scheme.

I think things would have been easier if I just picked up an R6RS implementation and stick with it. However, things went differently. Not that I do not like Racket, still I find ackward to code in Racket when I want to grok Scheme and code written in Racket is so often not very portable to other schemes (and that is why Racket is Racket). I'm not complaining about Racket: I think they did the right thing.

However, nowadays the Racket book is scheduled. And there is going to be a new standard R7RS. And they decided to define a "small" and a "big" language. And I really do believe this is wonderful because it would make clear exactly what to expect from either language also to newbies. Moreover, perhaps some of the more reasonable improvements with less reaching implication which R6RS will be added to the small language as well. E.g., how to define libraries and generic information about what to expect from the "platform". Or maybe "values" related stuff.

So, in essence, I think in the future it will be a great time to learn scheme. Maybe I will have the courage (and the time) to re-learn it from scratch and fix all my miscomprehensions which came from the way I originally learned the language.

Sunday, October 30, 2011

Does Perl suck? (and something about this Quorum Language)

The short story is that a group of researchers created yet another language. And I see nothing wrong with that. Most people sooner or later implement their own language (or wish to); researchers just do it more often. These languages usually have specific goals and some of them become widespread programming languages. Other remain academic toys. This particular language was created to be intuitive and easy to use. And the researcher meant to make it intuitive and easy with usability studies. I will comment on the idea later.

Fact is, that they put out a paper which compared their Quorum language with Perl and another language they designed to be bad. And they claim that the accuracy rates of their users were similar for Perl and Randomo (the bad language) and worse than Quorum.

Does Perl suck?

The question is that the story became very popular because of this: because they claim to have scientifically proved that Perl sucks. And such pieces of news have huge diffusion. The essence here is that I do not think that the results are so very relevant: they used only the easy part of Perl. As a consequence, I think that many other languages would have scored similarly. And think about it... if the easy part of Perl is hard for novices... what about the hard one?

As far as I can tell, the thing is mostly a syntactical issue. I haven't found papers on the full semantics of Quorum. From the examples in the paper, it looks like the illegitimate child of Ada and Ruby. I'm sure that I'm missing something, but it is mostly a matter of substituitions like:

for -> repeat [ which other major language had a repeat statement? ]
float -> number [*very* clear, especially before explaining the noobs how much different are floating point numbers from real numbers]

I don't consider it particularly readable. On the contrary to me something like:

  if d > e then
    return d
  end
  else then
  return e
  end
  

looks like a walking syntax error. I think I would take lot of time to convince me that it is proper english. I'm not a native speaker, but to me:

  if something is true, then do this, else do that
  

sounds quite more convincing than:

  if something is true, then do this, else then do that
  

All considered, it seems to me that they are optimizing the easy part of learning. The syntax is an issue in the first like 2-3 months of programming. After that the problems become semantic and theoretic. Computer Science is hard. It is not the syntax of the programming language you first learn to program with that makes it easier.

To me, it looks like they would use sponge rubber balls to make rugby less violent, completely ignoring the full contact issue.

Besides, I believe that most people who have serious issues with the syntax of todays' programming languages would also have problems with the semantics and with calculus and lambda calculus and whatever hard stuff is taught in the courses. Moreover, hiding the complexity is not always a good thing.

In a computer we can't represent a real number. We know that. At a given point in our curriculum, we also know exactly why. However, usually the odd behavior of floating point number strikes. Odd in the sense that people usually expect them to behave like real numbers. They do not.

A popular surprise is in Python when they use the console to make operations. In older Python versions, a number was printed just how it was in memory.

>>> 1.1
  1.1000000000000001

and guess what... this is extremely surprising for noobies (and in Python 2.7 the algorithm changed so that the printed number tries to be more 'intuitive' and in the case just prints 1.1). The fact is that 1.1 cannot be represented in memory (as a floating point number) and an approximation is used. This is quite hard stuff.

Pretending that floating point numbers are not floating point numbers is going to lead to disaster as soon as things get rough. So, I while I think there are far easier languages to learn to program with than Perl (Python, Ruby, Scheme) I don't think that Quorum is such a huge advantage past the "very very noob - why this does not compile if(foo); {...} else; {...}" part. And that is not the part to optimize.

And what about quorum?

Unfortunately enough, I do not know very much about usability. So I really cannot comment their methods. However, I think that I can say a couple of things from the point of view of language design. The first thing is that most languages that I think are worth learning and using have some very complex stuff and that stuff is what makes the language powerful. With powerful here I mean that it gets the job done quicker, with less line of code and, why not, shapes the programmer's mind in good ways, giving him insight.

Remove this advanced stuff and you have only dumb ugly languages. Such languages would not make you or your programs better. Remove first order functions and macros from Lisp or Clojure and you are just coding Pascal.

And I'm not exactly sure that "usability" can be applied to programming language design. Some argue that since most "experts" disagree on which features should be added to a programming language (and agree that having them all is a bad thing), such studies are needed to determine how to build the one and "true" language. And I have to say that I disagree.

Democracy plainly does not work in technical issues. Noob programmers are plainly not entitled to discuss about advanced language features that they have not yet the experience to understand and which solve problems they have not yet encountered. So called "professional" (i would say business) programmers do not care about the programming language, be it Java or Cobol. They probably would not have advanced features either, because they may thing such features would slow their juniors down. They mostly care about powerful libraries: the discussion is not at the language level, but at an architectural level. They don't do algorithms, they have to make complex under-documented systems interact and work correctly.

So we have just people who "love" programming languages. People who have a strong opinion on how a programming language should be. And guess what? We disagree. Some of us prefer Ruby to Python. Someone else thinks they are the same and that Haskell is the best.

The simple decision of creating an object oriented imperative language is a very strong assumption. I could not rationally say that object oriented is easier than functional programming. Not even if I thought it. And you can't ask noobs if they prefer functional languages (which they do not know what they are), nor you can ask business programmers (which probably dislike them). And if you ask to object zealots, they will answer that FP is just harder.... however, if you ask to functional programmers, they will say that FP languages are easier.

There are lots of design issues which are too bloody architectural to be decided with statistics.

Regarding the Quorum language itself... I strolled through the library files. And what? I changed my initial opinion... its more something between VisualBasic and Java. It has generics. Which are one of the things which are bloody harder to understand in Java (especially if you throw in extends, ? and company). And if you don't, you have a type-system that cannot express some perfectly useful stuff.

It has packages and includes (they are just called 'use'). The nice thing is that Java without an IDE is a PITA partly because of the redundancy of the interaction between classpath/packages and directory structure. And as far as I can tell that stuff is still there. Not sure that being strongly typed makes things easier for the noobs too.

Well, good luck then... my first impression is that it is basically Java without some nice stuff and a visual basic-ish syntax. On the other hand the compiler code looks very well written (still... writing compilers in Haskell/scheme is just so much easier than doing that in Java)... but really: in 2011, with everybody re-discovering the merits of functional programming (Clojure/Land of Lisp/Learn yourself Haskell/Racket/.../C# being more functional than ever/Erlang)... do we need yet another object oriented language?

Saturday, October 29, 2011

Working Java 7 on Ubuntu Oneiric Ocelot 11.10 (update-alternatives)

I still have not figured out why the default openjdk installation on Ubuntu 11.10 seems broken. With broken I mean that there are no entries in the alternatives database and no command in the path (which follows from not being the default alternative).

As a consequence, I did the same old mumbo jumbo with update-alternatives. As the openjdk-7 is in /usr/lib/jvm/java-7-openjdk-i386/, I just run the following commands:

% sudo update-alternatives --install /usr/bin/java java /usr/lib/jvm/java-7-openjdk-i386/bin/java 4 
% sudo update-alternatives --install /usr/bin/javac javac /usr/lib/jvm/java-7-openjdk-i386/bin/javac 4     t
% sudo update-alternatives --install /usr/bin/jar jar /usr/lib/jvm/java-7-openjdk-i386/bin/jar 4 

I guess I may want to do  the trick for more executable stuff. Moreover, I have no idea whether the browser plugin works or not. It has been ages since I actually used it and right at the moment I do not care. Hope this will be helpful.

After that, just call the appropriate update-alternatives --config ja(vac|va|r) stuff!

I almost forgot... I just installed the plain openjdk-7-jdk/jdr stuff from the official ubuntu repositories.

Besides... I have seen a package sun-java-{yadda-yadda}-bin which may have been the provider of the links in the first place (when I was using sun's JDK which was correctly packaged). Right at the moment I cannot see a similar package for OpenJDK. My Ubuntu went postal recently and consequently I cannot verify. :(

Friday, October 28, 2011

Ubuntu 11.10

Yesterday I decided to update my netbook to the last incarnation of the Ubuntu distribution. In the past days I read something about bad bugs in the kernel regarding power saving and similar stuff; however, nothing happened here (yet).

I was quite amazed to find a Linux 3.0 kernel. It is the single event that made me think how little I keep myself informed about Linux specific stuff. I still find it hard to study all the languages, technologies and theory which has to do with my work (which is also responsible for me posting very little, lately) and my generic interests (functional programming, for example). I simply dropped most stuff regarding system administration and even platforms. In fact, I haven't upgraded to Lion either.

So I don't know a thing about this Linux 3.0 kernel. On the other hand I remember I followed very closely the new features of the 2.2, 2.4 and 2.6 kernels. In fact, I have also some memories of stuff before that (I used mkLinux, though). I think this means I'm just growing older.

Anyway... I still don't like Unity. However, I'm one of those guys who basically open up a shell a fire some stuff from there (editor, interpreters and compilers). Or occasionally I open IntelliJ/PyCharm. So really, I'm not entitled to talk about that. I simply noticed that the colours moved from that tiring orange to shades of green which are just easier on my eyes. Nothing important. I just find it nice not to have the gnome menus, since I have a very very tiny screen. In this sense, my iPad screen feels just larger (even though it is not). I think it simply depends on the way applications are designed.

I just found out that for some reason I do not have Java properly installed, even if before the update I did have it. It felt akward to run clojure and get a "command not found: java" error; especially considering that up to 5 minutes before that I was debugging a Java project inside IntelliJ. Though I think it has something to do with my current installation of OpenJDK 7.

About the "good" things... Now leiningen is packaged (and decently recen, as far as I can tell); moreover, also clojure 1.2 (and contrib) are installable packages. Perhapsit could be enough... Waiting for Clojure 1.3, by the way.

Thursday, October 6, 2011

Dahl (Node.Js), Sussman (Scheme) and the Scientific Method

Everything started with Dahl's post "I hate almost all software" (probably not Node.js). I basically agree with the second sentence and the last one (that is to say "The only thing that matters in software is the experience of the user" and "It's unnecessary and complicated at almost every layer"). Other stuff is just a list of obvious thing: when systems evolve and nobody cuts broken parts out, then systems become complicated.

In fact, this is almost a built-in feature: for years our answer to complexity was "add another layer". Which led to the "abstraction leaks" Martelli talked about in some conferences and the obvious fact that the layer we just hid did not became any simpler. We just put the dirt under the carpet. Needles to say, the new layer becomes as complex as the old one in a few years/months/days.

So what? I would also point out another apparently unrelated event. In 2009, Sussman explained why at MIT Python substituted Scheme in basic programming courses. And we all know how wonderful is SICP; still the authors themselves started thinkng about changing the course since 1995. Essentially the reason is that "engineering in 1980 was not what it was in the mid-90s or in 2000. In 1980, good programmers spent a lot of time thinking, and then produced spare code that they thought should work. Code ran close to the metal, even Scheme — it was understandable all the way down. [...] 6.001 had been conceived to teach engineers how to take small parts that they understood entirely and use simple techniques to compose them into larger things that do what you want.
But programming now isn’t so much like that. Nowadays you muck around with incomprehensible or nonexistent man pages for software you don’t know who wrote. You have to do basic science on your libraries to see how they work, trying out different inputs and seeing how the code reacts. This is a fundamentally different job, and it needed a different course."
Isn't that they are basically saying the very same thing about today software world? I think so. Software is so complicated that we ought to use experimental techniques to dig into it.
The part that Dahl lives out is that Node.js is not any simpler (I will elaborate on that in another moment). In fact Javascript itself suffers all the problems described of being unnecessarily complex (to me lack of orthogonality and regularity is a form of complexity of the worse kind -- and it is also the very critique I move to languages I love, like Clojure). Adding a nice asynchronous layer on the top of a messy language is only going to create an asynchronous messy language. Nothing more, nothing less. But hey, V8 is fast.

Tuesday, September 20, 2011

Eclipse...

Is it possible that to upgrade a major version of Eclipse there is no other way than just reinstall every bloody plugin I'm using?

Tuesday, September 6, 2011

Clojure: Quicksort in Continuation Passing Style and Trampolines

Introduction

In this post we are going to write a completely recursive stack-wary version of the classical quicksort algorithm in continuation passing style, making use of trampolines.
I already discussed quicksort in continuation passing style in this post. The idea is to refer to the post as much as possible and introduce here more information.
From Wikipedia:
Instead of "returning" values as in the more familiar direct style, a function written in continuation-passing style takes an explicit "continuation" argument, i.e. a function which is meant to receive the result of the computation performed within the original function. Similarly, when a subroutine is invoked within a CPS function, the calling function is required to supply a procedure to be invoked with the subroutine's "return" value. Expressing code in this form makes a number of things explicit which are implicit in direct style. These include: procedure returns, which become apparent as calls to a continuation; intermediate values, which are all given names; order of argument evaluation, which is made explicit; and tail calls, which simply is calling a procedure with the same continuation that was passed to the caller, unmodified.
Essentially there are two issues with CPS in Clojure. The first one is recursion on the stack: CPS is tail recursive by default; however, most of the time the tail calls are mutually recursive, which essentially means trampolines.
The second issue relates to the continuations themselves. First, they may occupy heap space (as the continuations stay in memory until evaluated); second, the continuations themselves need to be treated with trampolines, otherwise they consume stack space when evaluated.
However, both issues can be solved with massive use of trampolines.
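As a quick reminder of the trampoline idiom, here is a minimal sketch (with hypothetical names, not part of the quicksort code below): instead of calling each other directly, mutually recursive functions return thunks, and trampoline keeps invoking whatever function comes back until it gets a non-function value.
(declare my-odd?)

(defn my-even? [n]
  (if (zero? n) true #(my-odd? (dec n))))

(defn my-odd? [n]
  (if (zero? n) false #(my-even? (dec n))))

(trampoline my-even? 1000000)
;; => true, with no stack overflow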

Append

The first function we need to write is append (i.e., concat in Clojure jargon). We need to write a new version because it has to deal with the continuation. And, as we will see, append is a false friend.
(defn append-notramp [lst1 lst2 k]
  (cond
   (empty? lst1) (k lst2)
   :else (recur (rest lst1) lst2
                 (fn [rst]
                   (k (cons (first lst1) rst))))))
At first sight, this implementation looks correct. We can try it with simple inputs and convince ourselves that the semantics is right. Essentially, if there are elements in the first list, the new continuation is a function taking a list rst and calling the continuation we received with the list obtained by consing the first element of lst1 onto rst.
However, running with increasingly large values for lst1 results in a stack overflow.
user> (cps/append-notramp (range 100000) (range 4) identity)
; Evaluation aborted.
In this case, recur is a false friend. We are led to think that, since append-notramp is tail-recursive, we do not have to worry about stack overflow. The issue, however, is the chain of continuations. And the continuations cannot use recur because, even though they *are* tail recursive, they are mutually recursive. Luckily enough, Clojure provides trampolines, and in this case they are also relatively easy to use.
(defn append [lst1 lst2 k]
  (cond
   (empty? lst1) #(k lst2)
   :else (recur (rest lst1) lst2
                 (fn [rst]
                   #(k (cons (first lst1) rst))))))
Here append simply returns a chain of functions returning functions, which can be resolved by calling trampoline on them. I pass empty as the continuation because I do not want a hundred-thousand-element list printed on standard output. The last call just checks that the 0s are at the right places (indices 0 and 100000).
user> (cps/append (range 100000) (range 4) empty)
#<cps$append$fn__5392 cps$append$fn__5392@30ea3e3c>
user> (trampoline (cps/append (range 100000) (range 4) empty))
()
user> (trampoline (cps/append (range 100000) (range 4)
                              #(and (= (nth % 0) 0)
                                    (= (nth % 100000) 0))))
true

Partition

Partition suffers from a problem similar to append's, in the sense that a naive version blows the stack.
(defn- partition-notramp [coll p? k]
  (loop [coll coll k k]
    (cond
     (empty? coll) (k () ())
     (p? (first coll)) (recur (rest coll)
                              (fn [p-true p-false]
                                (k (cons (first coll) p-true)
                                   p-false)))
     :else (recur (rest coll)
                  (fn [p-true p-false]
                    (k p-true
                       (cons (first coll)
                             p-false)))))))
The solution is also similar: whenever a continuation k would be called, return instead a function which calls k, like #(k ...).
(defn- partition [coll p? k]
  (loop [coll coll k k]
    (cond
     (empty? coll) #(k () ())
     (p? (first coll)) (recur (rest coll)
                              (fn [p-true p-false]
                                #(k (cons (first coll) p-true)
                                   p-false)))
     :else (recur (rest coll)
                  (fn [p-true p-false]
                    #(k p-true
                       (cons (first coll)
                             p-false)))))))
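A quick sanity check along these lines should now run without blowing the stack (a hypothetical snippet, to be evaluated inside the namespace, since partition is private; note that it also shadows clojure.core/partition):
(trampoline partition (range 100000) even?
            (fn [evens odds] [(count evens) (count odds)]))
;; => [50000 50000]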

Quicksort

Now things get complicated. Quicksort is a doubly recursive function, and the recursive calls happen inside the continuations. We basically have each continuation return a function which runs what would have been the continuation had we not been using trampolines. This works like a charm: we just have to call trampoline before returning.
(defn quicksort
  ([coll less? k]
     (letfn [(qs [coll k]
                 (if (empty? coll) #(k ())
                     (let [[pivot & coll1] coll]
                       (partition coll1
                                  (fn [x] (less? x pivot))
                                  (fn [less-than greater-than]
                                    (fn []
                                      (qs greater-than
                                          (fn [sorted-gt]
                                            (fn []
                                              (qs less-than
                                                  (fn [sorted-lt]
                                                    (append sorted-lt (cons pivot sorted-gt) k))))))))))))]
       (trampoline (qs coll k))))
  ([coll less?]
     (quicksort coll less? identity)))

A word on performance

Of course, performance is rather terrible. First, quicksort rather sucks in a functional setting. I find it so imperative-oriented that it is even hard to think about it in this context (notice that here we are using rather complex machinery just to make it look functional).
Then there are a number of issues:
  1. Typical optimizations, like switching to another algorithm with lower multiplicative constants, are not easy to do. Every self-respecting implementation of quicksort switches to something like insertion sort when the list is sufficiently small (say... sublists of k elements), because it is just faster.
  2. Other optimizations, like recursing first into the smaller half of the array, are also hard to do (because we are using lists... we would essentially need a custom variant of partition which takes care of such nuisances).
  3. Quicksort performs poorly if the pivot is chosen as the first element of the list. However, in a functional setting, where the core data structure is likely to be (or just may be) a list, taking an element from the middle has an unacceptable algorithmic cost.
  4. Moreover, partition is likely to use far more memory than it would in an imperative setting.
Consequently, Clojure's sort is plainly faster, considering that it should call Java's sort (which nowadays is the blazingly fast Timsort, a difficult if not impossible to beat algorithm). Comparisons in this sense are embarrassing. That is why I did not even try to code smarter in order to make this quicksort faster. And no, I'm definitely not going to turn Timsort into CPS+trampolines by hand.
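A comparison can be run along these lines (a hypothetical snippet; no numbers reported, since they depend heavily on the machine and on the input):
(let [xs (shuffle (range 10000))]
  (time (dorun (quicksort xs <)))  ; the CPS + trampolines version
  (time (dorun (sort xs))))        ; the built-in sort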

Saturday, September 3, 2011

Too many cool languages (Python, Clojure, ...)

There are plenty of languages I like. I'm naming just a bunch of them here... so a non-exhaustive list might be:

  1. Python
  2. Common Lisp
  3. Scheme
  4. Clojure
  5. Erlang
  6. Haskell
  7. Prolog
  8. Ruby

The list is quite incomplete... for example, I left out the languages of the ML family. I do not have anything against them, I just think Haskell is way cooler nowadays (cooler != better), and the Haskell Platform is a definitive advantage as well. I did not mention languages such as Io because, although I'd really like to do something with it, I just do not feel enough motivation to try and do it in Io rather than in one of the languages above.

Same thing for Javascript. It's not that I do not like it... it is just that, if I can, I would rather use one of the languages I mentioned (which, at least in the case of client-side JS, is most of the time a no-go). I also left out things like CoffeeScript and a whole lot of languages that did not even jump to my mind right now.

The list is quite representative of my tastes in programming languages. All of them are very high level languages (I enjoy plain old C, but I tend to use it to extend the above languages and/or for problems which really need C's strong points). Most of them are declarative languages, and almost every one has a strong functional flavor (basically only Prolog hasn't; the others are either functional languages or received a strong influence from functional programming).

Some of the languages are better for certain tasks, because they were built for them or have widely tested libraries/frameworks. And this makes choosing among them somewhat difficult. For example, every time a network service has to be written, I consider Python (Twisted/Stackless), Erlang and, recently, Clojure. If the system has to be highly concurrent, Erlang probably has an edge, but Haskell (with STM and lightweight threads) and Scheme (Gambit/Termite) are strong contenders as well.

This is essentially about personal projects. In fact, for "serious" stuff I am still using Python or Java (and C/C++ if more muscle is needed), basically because I have far more experience with those technologies. Clojure is probably a viable choice as well, as I can easily interact with Java, and I have been using Java quite intensively in the past two years (I also used Clojure to extend one of the "serious" projects, though I had to remove it because I had other experimental components and it was tiresome to find out whether bugs were due to the other components or to my relative inexperience with Clojure). Besides, I have yet to find winning debugging strategies for Clojure: probably I'm missing something obvious here. I'm also trying to increase my expertise with Erlang.

These are some random thoughts about developing in these environments.

Editing Tools

The first issue is finding the right tools. It is not exactly easy to discover the right tools for developing in non-mainstream languages. For example, with Java any major IDE is a good enough tool. For Python, both PyCharm and WingIDE are excellent and easy to use tools; moreover, vim is an excellent tool with only minor tweaking. Oddly enough, I'm still struggling to find a good setup for Emacs.

On the contrary, Emacs is quite easily one of the best environments for Scheme/Common Lisp/Clojure (thanks to slime), Haskell, Erlang and Prolog (thanks to the improved modes). Still, after years of switching back and forth between Emacs and vim, I think I'm more comfortable with vim. Unfortunately, customizing vim for these languages is not always as easy as it is with Emacs. For example, I found an easy to use Erlang plugin, but I have yet to try the advanced plugins for Haskell and Clojure (I know the plugins, I just did not finish the setup). For Clojure it's just easier to fire up La Clojure or Emacs+slime. For Haskell, I'm not serious enough to know my own needs. I wrote software for my bachelor's in Prolog with vim, and I have to say that Emacs is quite superior (especially considering the custom SICStus mode).

Back in the day I used TextMate for Ruby. Right now I guess I'd probably go with RubyMine or vim.

Platform

Now this is a serious point. Many of my "difficulties" with Scheme lie here: choosing one interpreter/compiler with libraries to do most of the things I want to do, plus some tool to ease the installation of other libraries (I should probably just stick with Racket). With Python, for example, it is easy: pick the default CPython interpreter and use pip/easy_install for additional libraries.

Common Lisp is also going well (SBCL + Quicklisp here). I was pleasantly surprised by Haskell: the Haskell Platform is wonderful. Some years ago it was not completely clear which interpreter to use, and the external tooling was not completely straightforward. Now with the Haskell Platform you just have everything under control.

About Clojure, I feel rather satisfied. For project-oriented stuff there is leiningen, which works fine. My main objection is that most of the time I'm not *starting* with project-oriented stuff... which reminds me so much of the Java world, where you first start a project (as opposed to unix, where you just start with a file). Luckily enough, in the Clojure community there are also guys like me, and so there is cljr, which fills exactly that area "before" something becomes a project.

Cljr is very nice with emacs/swank/slime. Unfortunately I've not yet found a way to make it interact with vim.

Erlang already comes bundled with an interesting platform, and rebar does lots of nice stuff too. Erlang/OTP tends to be a bit too project-oriented, but that quite makes sense: basically, you are almost always setting up systems with many processes, and it just makes sense to do it that way.

Prolog is essentially a complete mess. I advise just using SWI-Prolog and forgetting the rest even exists, unless you need some of the advanced stuff SICStus offers. In that case you pay and enjoy.

Tuesday, August 30, 2011

Clojure: Unfold and Anamorphisms

I found the apparent lack of an unfold function in Clojure rather unusual. Perhaps I simply was not able to find it in the documentation (which I hope is the case); then I'd be happy to have done a useful, although slightly dull, exercise. If that is not the case, however, here it is: unfold in Clojure.

Unfold is a very powerful function which abstracts the idea of generators. I tried to make it lazy (please, feel free to break it), as we expect from Clojure functions. Some examples are provided below.

Here, we simply generate the list of elements from 10 down to 1:

(unfold #(= 0 %) identity dec 10)

The following example is more interesting: here I show that unfold is at least as powerful as iterate. It is possible to regard iterate as a simpler (and easier to use) variant of unfold.

(unfold
      (fn [x] false)
      identity
      inc
      0)

The code is equivalent to (iterate inc 0). Now the idea is that:

  1. The first argument is a predicate that receives the seed (the last argument). If it is true, the function returns the empty list; otherwise it returns a cons cell whose first cell is the second argument applied to the seed and whose second cell is a recursive call to unfold. All the arguments but the last one stay the same; the last argument becomes g applied to the seed.
  2. The second argument is a unary function which is applied to each individual "value" that is generated. In this case we use identity, because we want the plain values. We could have used #(* %1 %1) to get the squares of the values, or whatever makes sense.
  3. The third argument is the function that generates the next seed given the current one
  4. The last argument is just the seed
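
Thanks to laziness (see the sketch of the implementation below), the iterate-like call above can safely be consumed with take:

(take 5 (unfold (fn [x] false) identity inc 0))

which evaluates to (0 1 2 3 4).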

Unfold is extremely general. For example, the classic function "tails", which returns all the non-empty tails (suffixes) of a list ((1 2 3) -> ((1 2 3) (2 3) (3))), could be implemented as:

(defn unfold-tails [s]
  (unfold empty? identity rest s))
  

The standard map becomes:

(defn unfold-map [f s]
  (unfold empty? #(f (first %)) rest s))

Now, here the implementation:
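
A minimal lazy sketch consistent with the behavior described above (and with the when-not form mentioned in the updates below):

(defn unfold [p? f g seed]
  (lazy-seq
   (when-not (p? seed)
     (cons (f seed)
           (unfold p? f g (g seed))))))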

Upgrade

With this update, I present an upgraded version with the tail-g parameter. The classic srfi-1 Scheme version also has this argument. Originally, I thought that in Clojure I could use it to obtain different sequence types. On second thought, that can't work because of the way Clojure manages sequences (at least, not in the sense that I intended).

Thinking about it more, however, the parameter is highly useful. Consider for example the tails function I defined above: it returns all the non-empty tails of a given sequence. This makes sense in lots of contexts; however, it is different from the definition of the Haskell function.

tails                   :: [a] -> [[a]]
tails xs                =  xs : case xs of
                                  []      -> []
                                  _ : xs' -> tails xs'
That could be roughly translated (losing laziness) as:
(defn tails [xs]
  (cons xs
        (when (seq xs)
          (tails (rest xs)))))

This function also returns the empty list. Surely, it would be possible to (cons () (unfold-tails xs)), but that would break the property that each (finite) element in the resulting sequence is longer than the following one. Without that kludge, unfold-tails breaks the property that the resulting list contains xs even when xs is empty. Accordingly, tails should really be defined to include the empty list. Appending the empty list at the end of the list returned by unfold-tails would maintain both properties, but would be excessively inefficient.

On the other hand, specifying the tail-g function allows a cleaner definition

(defn tails [xs]
  (unfold empty? identity rest xs
          #(list %)))

Essentially, without tail-g it is not possible to include in the resulting sequence elements which depend on the seed that makes the generation stop.
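
A sketch of how the upgraded version might look, in the spirit of srfi-1: tail-g receives the seed that stopped the generation and produces the tail of the resulting sequence (the single-argument default reproduces the earlier behavior).

(defn unfold
  ([p? f g seed]
     (unfold p? f g seed (fn [_] ())))
  ([p? f g seed tail-g]
     (lazy-seq
      (if (p? seed)
        (tail-g seed)
        (cons (f seed)
              (unfold p? f g (g seed) tail-g))))))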

Updates

  • Substituted (if ($1) (list) ($2)) with (when-not ($1) ($2)) after some tweets from @jneira and @hiredman_.
  • Added version with tail-g
  • Here more functions and experiments with unfold.
  • Removed HTML crap from cut and paste


Sunday, August 28, 2011

On monads and a poor old (dead) Greek geek

After immensely enjoying the LYHFGG book, I immediately started playing with Haskell a bit. Since I have lately been implementing sieves in different languages, I found that the Sieve of Eratosthenes was an excellent candidate. In fact, there are quite a few Haskell versions out there (and almost all of them are faster than the version I'm going to write, but this is about an enlightenment process, not an algorithm implementation).

Bits of unrequested and useless background thoroughly mixed with rantings and out of place sarcasm

I also had to reconsider some of my assumptions: I was familiar with Turner's "version" of the sieve and with O'Neill's article, which was basically about Turner's version not being a real implementation of the sieve, and about nobody noticing that for something like 30 years (while teaching it to students as a good algorithm). I found this story extremely amusing, in the sense that sometimes functional programmers are quite a bit too full of themselves and of how cool their language is, and overlook simple things.

Turner's algorithm essentially was this:

primesT = 2 : sieve [3,5..]  where
  sieve (p:xs) = p : sieve [x | x<-xs, rem x p /= 0]
And essentially the point here is that, to me, it does not really look like a sieve, but like a trial division algorithm. Quite amusingly, I found a lengthy discussion about that here. The good part is that they rather easily (if you read Haskell like you drink water) derive optimized versions of this algorithm with decent algorithmic performance, and then spend some lines trying to convince us that the code above should not be called naive (which I do not intend as an insult, but as a mere observation) and that it should be regarded as a "specification".

Now I quite understand that declarative languages are excellent tools to write definitions of mathematical stuff (and Haskell is especially good at this), but as far as I know the context in which the one-liner was presented is that of an algorithm, not of a specification.

Essentially, this unrequested bit of background is just my way of retaliating against the world for not being able to come up with a really fast implementation. Basically, the optimized version of Turner's algorithm is quite a bit faster than my monadic versions. Which is fine, as my goal was to get some insight into monads, which I think I did. More on this here.

On the author's unmistakably twisted mind


So... back to our business. I struggled a lot to use Data.Array.ST to implement the sieve. Essentially the point is that I find the monadic do notation quite less clear than the pure functional notation with >>= and >>. This is probably a symptom of my brain having been corrupted by maths and of myself turning into a soulless robot in human form. Nevertheless, I finished the task, but I was so ashamed that I buried the code under piles of new git commits.

Essentially, I found mixing and matching different monads (like lists) excruciatingly confusing in do notation. Notice, that was just *my* problem; still, I had to struggle with the types, write outer helper functions to check their type, and so on. Basically I was resorting to trial and error, and much shame will rain on me for that. So I threw away the code and started writing it simply using >>= and >>. Now everything kind of made sense. To me it looked like a dual of CPS, which is something I'm familiar with. The essential point is that the resulting code was:
  1. Hopefully correct.
  2. Written in one go, essentially with ease
  3. Rather unreadable
So the resulting code is:

Essentially, because of (3) I decided it was time to use some do notation. Perhaps things could be improved further. What I came up with is a rather slow implementation, but I am rather satisfied. I think it is extremely readable: it retains a very imperative flavor (which is good, in the sense that the original algorithm was quite imperative) and I believe it could be written and understood even by people not overly familiar with Haskell. It almost has a pseudo-code like quality, were it not for some syntactical gimmicks like "$ \idx ->".


Somewhere in the rewrite process I left out the small optimization of running only up to sqrt(lastPrime) instead of all the way to lastPrime, but this is not really the point. Still... I'm quite happy because I feel I now have a better grasp of some powerful Haskell concepts.

However, I feel like Clojure macros are greatly missed. Moreover, I found the correspondingly "difficult" concepts of Clojure or Scheme really easier to understand (apart from call/cc, which kind of eludes me; I think I have to do some explicitly tailored exercises sooner or later).

I also feel like continuously jumping from language to language could lead to problems in the long run, in the sense that if I want to use some language in the real world I probably need more experience with that particular language; otherwise I will just stick with Python, because even when it is sub-optimal I'm so much more skilled in the Python platform than in the others.

Saturday, August 20, 2011

Excellent Learn You a Haskell for Great Good

This is the third book explicitly about Haskell that I have bought (apart from things such as Functional Data Structures or Pearls of Functional Algorithm Design), the others being Haskell School of Expression and Real World Haskell, plus some online tutorials and freely available books. I believe they are all excellent books, although with slightly different focuses. They come from different ages as well (HSOE being quite a bit older).

Back then I enjoyed HSOE a lot, but I think I missed something. A large part of the book used libraries not easily available on the Mac, for example. Moreover, the book did not use large parts of the Haskell standard library which are very useful. For these and other reasons, I did not go on working with Haskell. Real World Haskell has a very practical focus and I quite enjoyed that. Unfortunately, I still remembered too much Haskell not to skip the initial parts of the book (and that is usually a bad thing, because you don't get comfortable with the author's style before jumping to more elaborate subjects). Moreover, the book is quite massive and I had other stuff to do back then (like graduating).

I did not even want to buy LYHFGG. After all, I am a bit skeptical about using Haskell for real world stuff (I prefer more pragmatic languages like Python or Clojure), and so I tried to resist the urge to buy another Haskell book (I could finish RWH, after all). For a combination of events I do not even remember, I put the book in an Amazon order. Understanding a couple of Haskell idioms could improve my Clojure, I thought, and I started reading the book in no time.

The first part of the book is very well done but somewhat uninteresting. By the time I started reading it, I had forgotten most of the Haskell I knew, and consequently I read it carefully; however, I made the mistake of not starting a small project just to toy with the language. That is the reason I say "somewhat uninteresting": it is very basic, very easy and very clear. It only reminded me of things I already knew, without really improving my way of thinking much. Still, the writing was fun and light and I read through it quickly. I consider it a very good introduction to functional programming in Haskell and to functional programming in general, and as such the tag line "a beginner's guide" is well deserved.

Later, in chapter 11, comes the real meat: Functors, Applicative Functors, Monoids and then Monads are presented. The order is excellent: after having learned Applicative Functors, Monads require only a small further step in understanding. Moreover, repeating all the reasoning on Maybe and lists really clarifies things. The examples were also in HSOE, but connecting the dots was somewhat left to the reader. This time I did not make the mistake of seeing monads as just a tool to implement the state monad, and I reached a deeper insight into the subject.

About the book itself, I just love No Starch Press...