Saturday, January 29, 2011

Y Combinator in Clojure

Even if it is hard to explain why, I particularly like the Y combinator. It isn't very practical in most "standard" languages; however, there is something poetic about it.

A couple of years ago I came from a long exposure to SML and Haskell. As a consequence, when I started working with Erlang, I really missed the possibility of defining everything as an anonymous inner function: I tended to organize my functions with inner sub-functions that were often recursive. As far as I knew (and still know), in Erlang it is rather hard to do this kind of thing. First, the syntax is less clear than the Haskell equivalent and second, the "lambda" functions cannot be recursive. As a consequence, I tried using the Y combinator. And no, it is not a particularly good idea, but I enjoyed it.

Some time later, I tried the Y combinator in Python. As Python completely lacks any form of tail call optimization, this is a very bad practice. However, I really liked the idea of using the Y combinator as a decorator... and that is almost as nice as the "apply decorator trick". Well, of course the apply decorator trick has no drawbacks and can be employed in real settings; the Y combinator in Python cannot (besides, the implementation I gave works only with functions of one argument, though the extension to arbitrary functions is trivial).
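
For the record, here is a sketch of the decorator idea (a variadic variant I just wrote down for this post, not the one-argument implementation mentioned above):

def Y(F):
    # F receives the recursive function and returns the actual one-step function
    def G(K):
        return F(lambda *args: K(K)(*args))
    return G(G)

@Y
def fact(f):
    return lambda n: 1 if n <= 0 else n * f(n - 1)

print(fact(5))  # 120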

Back to Clojure... I decided to implement Y in Clojure as well. Of course, since Clojure does not have TCO either, it is just a toy device. Moreover, there is no need for such a tool (the syntax for nested recursive functions in Clojure is nice). However, as I said... I love the Y combinator.

Well... this is the clojure version:

(defn Y [F]
  (let [G (fn [K]
            (F (fn [& A]
                 (apply (K K) A))))]
    (G G)))


(def fact
  (Y (fn [f]
       (fn [n]
         (if (<= n 0)
           1
           (* n (f (- n 1))))))))


End of the story (yeah, basically it is like the Scheme version); calling (fact 5) duly returns 120. By the way... I'm starting to toy with trampolines to see what can be done.
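
The general idea, sketched here in Python for brevity (Clojure ships its own trampoline in clojure.core; this toy driver is just mine): the recursive step returns a thunk instead of calling itself, and a small loop keeps invoking thunks, so the stack stays flat no matter how deep the "recursion" goes.

def trampoline(f, *args):
    # keep calling as long as a thunk (a callable) comes back
    result = f(*args)
    while callable(result):
        result = result()
    return result

def fact_steps(acc, n):
    # return a thunk instead of recursing directly
    return acc if n <= 0 else (lambda: fact_steps(acc * n, n - 1))

print(trampoline(fact_steps, 1, 5))  # 120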


Thursday, January 27, 2011

Cleanup Strategies semantics for different languages

A couple of days ago, I was reading a nice puzzle in Java Puzzlers.

The book is full of interesting information regarding gotchas of the Java language and VM and also contains some useful insight for language developers (in the sense of "get right what Java designers got wrong").

The puzzle was about the return value of a function such as:

boolean f() {
  try {
    return true;
  } finally {
    return false;
  }
}


Of course, the function returns false according to the Java specification. The authors, however, argued that this kind of program should be forbidden by language designers, as its semantics is rather counterintuitive. Personally, I'm not sure a language has to become more complicated to avoid these kinds of problems; I rather believe this kind of stuff is the responsibility of the programmer.

Nonetheless, I decided to see what happens in different languages. Unsurprisingly, in Python it has the same semantics as in Java:

In [4]: def f():
...:     try:
...:         return True
...:     finally:
...:         return False
...:

In [5]: f()
Out[5]: False


Essentially, either you ban such programs (e.g., return is not allowed in a finally clause) or you cannot avoid the problem: returning True would be far worse. Ruby does the very same thing:

def f()
  begin
    return true
  ensure
    return false
  end
end

irb(main):020:0% f
false


Essentially, the problem is with finally (well, ensure, in Ruby). On the other hand, if a language does not provide any cleanup mechanism, error management becomes seriously more complicated. In fact, in Ruby it is not uncommon to use blocks for this purpose. In Python, we have the with statement. Considering that the with statement associates a code block with a context and cleanup happens in the __exit__ method of the context object, there is essentially no way such surprising behaviour can happen (the return value of __exit__ is not the return value of the function; it only tells Python whether to suppress an exception).

In [1]: class ContextExample(object):
...:     def __enter__(self):
...:         pass
...:     def __exit__(self, *stuff):
...:         return False
...:
...:
In [4]: def f():
...:     with ContextExample() as c:
...:         return True
...:
...:

In [5]: f()
Out[5]: True


However, before moving on to other kinds of error management strategies (RAII, with), we should notice that things in Clojure are different:

(defn f []
  (try
    true
    (finally
      (println "FOO")
      false)))

(println (f))


prints "FOO" and then true. Essentially the finally block is executed, but the value returned is not the one of the finally block. Surprising? Not really. In Clojure documentation is crystal-clear on this point:


(try expr* catch-clause* finally-clause?)
catch-clause -> (catch classname name expr*)
finally-clause -> (finally expr*)

The exprs are evaluated and, if no exceptions occur, the value of the last is returned. If an exception occurs and catch clauses are provided, each is examined in turn and the first for which the thrown exception is an instance of the named class is considered a matching catch clause. If there is a matching catch clause, its exprs are evaluated in a context in which name is bound to the thrown exception, and the value of the last is the return value of the function. If there is no matching catch clause, the exception propagates out of the function. Before returning, normally or abnormally, any finally exprs will be evaluated for their side effects.


Considering the Java heritage of Clojure, this may be surprising for Java programmers (at least those crazy enough to use code resembling the one at the beginning of this page). However, such behaviour is expected and welcome for Lisp programmers. Indeed, the Lisp equivalent of "finally" is the lovely unwind-protect, and the documentation of unwind-protect states that:


unwind-protect evaluates protected-form and guarantees that cleanup-forms are executed before unwind-protect exits, whether it terminates normally or is aborted by a control transfer of some kind. unwind-protect is intended to be used to make sure that certain side effects take place after the evaluation of protected-form.


Once again: the cleanup forms can be used only for side effects; the return value is that of the protected form.
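
In Python terms, the same discipline means keeping return out of finally and using the clause for side effects only; a minimal sketch:

def f():
    try:
        return True
    finally:
        print("cleanup")  # side effects only: the return value stays True

print(f())  # prints "cleanup", then True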

Tuesday, January 25, 2011

Install Clojure on Windows

I'm not a big fan of the Microsoft platform. I'm not a hater either, though in a binary world I would probably be the latter. Luckily enough we have some degree of fuzziness, which makes life more fun (besides, I think that programmers in a binary world would feel perfectly comfortable writing opcodes directly, possibly with a magnet on a hard disk, so no Lisp at all).

For a very long list of reasons (so long it would deserve a post of its own, provided anyone but me cared) I'm using Windows a lot at work. This has something to do with my MacBook having a broken hard disk and me being lazier than a Haskell interpreter interpreted by a Haskell interpreter. Anyway...

HasOffice - {Mac} = {Windows}

So... I'm a lot into experimenting with new technologies, and my newfound subject of experimentation is Clojure: after at least a year of planning to start using it for some real-world apps, the time has eventually come. Unfortunately, I have to do it in a hostile environment. I usually find it easier to try new stuff on the Mac or on some Linux system: I'm more at ease with the OS, and usually the developers tested and developed the thing on a rather similar system. The installation stages especially are troublesome, as a working compiler is often assumed present (and usually it should be gcc).

On the other hand, Clojure lives in a Java environment and essentially follows different rules from this point of view. The runtime is a jar, already compiled. My IDE just downloads it and things work. I also have Clojure Box and things work out of the box (apart from a very strange bug which freezes the whole Emacs). Amazingly enough, some years ago I had some trouble using its CLI on the Mac (I think it was version 1.0, and something in the internal class structure changed so my old way of invoking the REPL just did not work -- this is solved, thanks brew!).

The only "problem" with installing clojure on windows has to do with putting things in the right places.
First I downloaded the clojure zip from the official site and tested that the jar actually worked (which it does) [ I actually substitute the greater than char with % to have less troubles with escaping].

% java -cp clojure.jar clojure.main
Clojure 1.2.0
user=% (println "foo")
foo
nil

So now it is just a matter of placing it on the path, and perhaps using some nocturnal-mammal script to improve usability. There is such a script in the help page; however, it needs some modifications.

The modifications basically concern the very first lines of the script. I think it is better to just omit the version numbers, so that when upgrading we can simply put the new stuff in place of the old. The modified version is included below.

We create a clojure directory in the root of the file-system (well... the Windows "root", C:, so C:\clojure). I assume that all the libraries have been unpacked in the same directory (e.g., Downloads). Now we copy the clojure, clojure-contrib and jline jars into that directory (C:\clojure).

% mkdir \clojure
% cd clojure-1.2.0
% copy clojure.jar \clojure
1 file(s) copied.
% cd ..\clojure-contrib-1.2.0
% copy target\clojure-contrib-1.2.0.jar \clojure\clojure-contrib.jar
1 file(s) copied.
% cd ..
% copy jline-0_9_5.jar \clojure\jline.jar
1 file(s) copied.
% dir \clojure
Volume in drive C has no label.
Volume Serial Number is 74BF-5A64

Directory of C:\clojure
24/01/2011  11:12     DIR           .
24/01/2011  11:12     DIR           ..
19/08/2010  10:01           477.050 clojure-contrib.jar
19/08/2010  10:06         3.237.168 clojure.jar
24/01/2011  11:05            46.424 jline.jar
3 File(s)      3.760.642 bytes
2 Dir(s)  47.933.681.664 bytes free

I also saved a slightly modified clj.bat script in \clojure:
:: Setup:
:: 
:: 1. Change the constants to match your paths
:: 2. Put this clj.bat file on your PATH
::
:: Usage:
::
:: clj                           # Starts REPL
:: clj my_script.clj             # Runs the script
:: clj my_script.clj arg1 arg2   # Runs the script with arguments

@echo off

:: Change the following to match your paths
set CLOJURE_DIR=C:\clojure
set CLOJURE_JAR=%CLOJURE_DIR%\clojure.jar
set CONTRIB_JAR=%CLOJURE_DIR%\clojure-contrib.jar
set JLINE_JAR=%CLOJURE_DIR%\jline.jar

if (%1) == () (
:: Start REPL
java -cp .;%JLINE_JAR%;%CLOJURE_JAR%;%CONTRIB_JAR% jline.ConsoleRunner clojure.main
) else (
:: Start some_script.clj
java -cp .;%CLOJURE_JAR%;%CONTRIB_JAR% clojure.main %1 -- %*
)

Then it is just a matter of putting C:\clojure in the path. From the CLI it is just (here % stands for %, not for greater than):

set Path=C:\clojure;%Path%

Or better it can be set from the GUI once and for all:
  1. right-click on "Computer"
  2. choose Properties
  3. choose "Advanced System Settings"
  4. choose the Advanced tab
  5. click "environment variables" button
  6. select the Path line from the System Variables box
  7. add at the beginning C:\clojure; [note the ; at the end]
  8. click OK
The Path variable is a sequence of ;-separated directories; that's it. I'm also considering adding C:\clojure\clojure.jar to the CLASSPATH variable.

Sunday, January 23, 2011

Real life clojure (part 2.)

In this post I started investigating interoperability between Clojure and Java in the setting of my IDE of choice (IntelliJ). We all agree that the compile.clj strategy is somewhat sub-optimal.

It is easy to set up and works with every IDE, which is positive. However, we have to keep track of the Clojure scripts manually. And even that is only partially true: it would not be overly hard to come up with something automated. For example, considering this library and the walk function, it would be very easy.
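
Something along these lines, for instance (my sketch, with Python's os.walk standing in for the walk function mentioned above, and the usual file-to-namespace mangling):

import os

def emit_compile_forms(src_root, out_dir):
    # generate the body of a compile.clj from the .clj files on disk
    print('(set! *compile-path* "%s")' % out_dir)
    for dirpath, dirnames, filenames in os.walk(src_root):
        for name in filenames:
            if not name.endswith(".clj"):
                continue
            rel = os.path.relpath(os.path.join(dirpath, name), src_root)
            ns = rel[:-len(".clj")].replace(os.sep, ".").replace("_", "-")
            print("(compile '%s)" % ns)

emit_compile_forms("src", "../out/production/ModuleName")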

However, this goes in the direction of rolling your own build system for Clojure, which is highly sub-optimal in my opinion. As far as I know there are at least two build tools with some community acceptance (cake and leiningen). It seems the Clojure community likes GitHub more than Bitbucket.

So, leaving the leiningen solution aside, I have to work things out with my IDE. IntelliJ has a nice feature which essentially generates the ant files necessary to build the project.

I wrote (well, copied and adapted from a Bob Martin post) this additional buildfile.



Martin put the compile target inside another project; I created a project just to build the Clojure targets.
Then I just have to include this in the main module and place it among the dependencies (with special regard to having Clojure compiled before/after Java, depending on your necessities).

Up to now we essentially have an ant-based (IDE-independent) build script for our project, which neither depends on cake or leiningen nor is the beginning of the nth Clojure build system.

Moreover, it can be integrated with the IDE: the idea is just to add the ant action that builds Clojure to the rest of the IDE build system.


The funniest thing is that today (a few days after the original experiments) I discovered that the IntelliJ Clojure plugin has a feature that should solve my problems.

I guess that with a little fiddling with the options I should automagically get the desired behaviour.


Friday, January 21, 2011

Real Life Clojure?

This morning I had my first "real world" Clojure script working. I have to say that I'm doing this in the worst possible scenario:
  • I'm working with Java libraries under heavy development
  • My own project is partly in clojure and partly in Java: specifically I both call Java from Clojure and Clojure from Java
  • The framework I'm working with (which is the one under heavy development) heavily relies on dynamic loading and such
  • I never grokked Java build/deploy models; that is to say, I was doing what I strongly advise people against: using build tools I do not know.
Right now the build system is not fully automated; that is to say, I don't have a nice "build" button in the IDE which does everything for me. Following Uncle Bob's advice, I have a compile.clj like:

(set! *compile-path* "../out/production/ModuleName")
(compile 'it.unipr.aotlab.netsym.actors.NodeActivator)


This heavily depends on where IntelliJ puts the compiled files. However, this way, using the right preamble in the clojure programs, everything works:

(ns it.unipr.aotlab.netsym.actors.NodeActivator
  (:gen-class
   :extends it.unipr.aotlab.hds.application.process.AbstractActor
   :state state
   :init init)
  (:import [it.unipr.aotlab.hds.application.process AbstractActor]
           [it.unipr.aotlab.hds.model.application Message]
           [it.unipr.aotlab.netsym.messages Tick]))

I rather hate:
  1. such verbose boilerplate code (apparently La Clojure does not "auto import" stuff)
  2. explicit compilation stages (that is to say: I found the pure-Java "run" button, which compiled and then executed the module, barely acceptable; having to compile Clojure, then compile and run Java, is heavily sub-optimal)
I'm afraid I can't do a thing about 1. However, I expect I can improve things for 2. For example, I'm exploring the possibility of using Leiningen (tutorial here). I'm not sure it is the right tool, though... I don't have a clear idea of how much Java and how much Clojure the project will be. The basic library is very well crafted, with a strong functional lispish bias in it, and I like it.
The other road I could take is using ant... I started writing the ant scripts, but that is rather embarrassing. Still, I'm kind of figuring out how things work (my, how verbose XML is... I believe XML designers would definitely not pass the Turing test). Uff... the "build using ant" feature of IntelliJ is not where I expected it, though.

I have made some updates here.



Thursday, January 20, 2011

The Game of Life

When Conway introduced "the Game of Life" in the mathematical column of Scientific American, it immediately had enormous success. The game itself is simple: there is an infinite grid of cells, and cells live or die according to a bunch of simple rules.

The game's roots are far from trivial, as at the time Conway was interested in a problem which is still relevant today (introduced by von Neumann, according to Wikipedia): building a machine able to copy itself (in which von Neumann succeeded, but Life is just simpler). Later, the game was proved to have the same computational power as a Turing machine. Anyway, the rules are:
  • if a living cell has fewer than two living neighbours, it dies;
  • if a living cell has two or three living neighbours, it remains alive;
  • if a living cell has four or more living neighbours, it dies;
  • if a dead cell has exactly three living neighbours, it becomes alive.
Every self-respecting computer geek has tried to implement the game at least once. Indeed, the variant usually implemented is placed on a torus (that is to say: finite width, finite height, wraparound). This greatly simplifies the implementation (which becomes trivial) and keeps much of the nice properties "game-wise".
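
For concreteness, here is a minimal sketch of the torus variant (mine, in Python rather than Lisp): live cells are kept in a set, and the rules reduce to counting each cell's live neighbours.

from collections import Counter

def step(alive, width, height):
    # one generation on a torus; alive is a set of (x, y) cells
    counts = Counter(((x + dx) % width, (y + dy) % height)
                     for (x, y) in alive
                     for dx in (-1, 0, 1)
                     for dy in (-1, 0, 1)
                     if (dx, dy) != (0, 0))
    return set(cell for cell, n in counts.items()
               if n == 3 or (n == 2 and cell in alive))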

I have no idea how many times I have implemented the thing, in different languages. Indeed, it is a very nice problem to implement idiomatically. I have to admit I still toy with the idea of proposing it as an exercise for the programming lab of a course here at the University. Still, I'm not sure students would find it as amusing as I do, and perhaps it would just be a hard exercise for them (depending on how much exercise they did over the holidays). So I decided not to assign it. Moreover, they would have to do it in C++, which is far less amusing than doing it in Lisp (the last language in which I did it).

I somehow feel a lot of freedom in Lisp. I don't feel forced into a careful and formal object-oriented design: things usually feel better if they are simpler, and the code usually remains decently organized nonetheless. For example, in an OO language the "board topology" would probably be a class. From another point of view, there is nothing suggesting that a very simple GOL implementation needs such generality. Probably, if we wanted different boards (especially considering questions about very large boards and storing and manipulating data efficiently), different visiting strategies (cell by cell) should be used. In fact, just iterating over the alive cells does not account for the 4th rule, and detecting a cluster of three alive cells around a dead one (starting from an alive cell) may not be efficient. I have not thought about the algorithms that should be used in those cases, nor did I want to: I did not want to develop an industrial-strength GOL implementation, I just wanted to toy with Lisp. Over-engineering is just around the corner. Beware!

By the way, here is the toy implementation.

Tuesday, January 18, 2011

Why learning not-so-mainstream-languages pays

I was reading the Ruby ng and I stumbled upon this. Basically the article says that in Chicago there are far fewer Ruby programmers than the industry requires, which makes it rather easy to get a job (if you know Ruby). Companies are even trying to relocate people and to teach Ruby to Java/.Net developers (well, scout the good ones and teach them Ruby: bad developers would remain bad, and perhaps in freedom languages they would look even worse by comparison).

This is happening right now in Chicago. But I continuously hear about such stories: knowing less widespread and niche technologies is a huge competitive advantage, even for newcomers. On the other hand, competing with thousands of developers of more mainstream languages is much harder and often leads to less interesting jobs.


Sunday, January 16, 2011

Different complexities: Lisp and C++

I could not think of two languages more different than C++ and Lisp. They differ in almost all relevant aspects; even the underlying philosophies are completely different. The culture clash between Lispers and C++ users is well epitomized by the famous (perhaps notorious) post of the great Naggum.

Some quotes from the post are:
  • C++ is a language strongly optimized for liars and people who go by guesswork and ignorance. 
  • ... it's just that in C++ and the like, you don't trust anybody, and in CLOS you basically trust everybody. The practical result is that thieves and bums use C++ and nice people use CLOS.
And a quote I would have missed, were it not for Wikiquote, is:
  • I believe C++ instills fear in programmers, fear that the interaction of some details causes unpredictable results. Its unmanageable complexity has spawned more fear-preventing tools than any other language, but the solution should have been to create and use a language that does not overload the whole goddamn human brain with irrelevant details. 
Now, Naggum's posts are usually extreme and it is easy to get offended. However, were it not for the post I linked (and for a certain Python guru who introduced me to Python) I could probably have had my mind crippled by C++. Not that C++ cripples the mind; exclusive exposure to just one language, plus the strong belief that it is the best in the world[0], does.

Dropping Naggum's extremes, it is true that C++ is about coercion and mistrust as much as Lisp (and Python) are about freedom and trust. Notice that supporters of different languages disagree on whether freedom is a good thing, after all. Ask an Ada programmer and he'll have you jailed just for thinking about certain things.

Then I asked myself: is there something C++ and Lisp share? And yes, there is: they have huge manuals. I'm quite a Lisp noob and I certainly miss many opportunities. No single course could cover all the stuff (the reference alone is some eight hundred pages!). Every time I look at some article, different book or manual, I usually find some feature I did not know about (and which, usually, I like a lot).

I consider myself rather skilled in C++, although it is the single language I forget most easily[1] when I do not use it a lot[2]. However, I often speak with inexperienced C++ programmers and find they often miss lots of language and library features.

There is a difference, though. My Lisp code is correct. Perhaps some experienced Lisp hacker could argue that not using some feature makes the code longer or less readable, and I would agree, being a fan of idiomatic code. However, the code is not wrong; neither is it "Python in Lisp" or something like that. I know the functional model quite well and I can always apply it.

In other words, my inexperience shows in choosing a longer route, not a wrong one. On the other hand, in my experience inexperienced C++ programmers just write wrong and inefficient[3] programs; but let us ignore the inefficiency. Their programs are wrong, or at best just error prone. Since it is very hard to design interfaces correctly, I often see implementations leak into the interface, or internal handlers somewhat exposed (which basically blows the whole encapsulation thing and disables the painful private/public discipline with an orange toxic fart). And even things exposed correctly (e.g., iterators) are always risky. I believe every C++ programmer (myself included) has at least once used an iterator that had been invalidated, and has had nightmares about it ever since.

Ignoring smart pointers (auto_ptr, shared_ptr) is a good way to write buggy software. In fact, auto_ptr is also a good way to write buggy software: its semantics are just quite counterintuitive. On the other hand, shared_ptr has hidden costs and should be used with care. Then there is scoped_ptr... which fully embraces the C++ design philosophy: if you are using it wrong (copying it), your code won't even compile. In a certain sense, the concept of a "reference" to an object is expressed in 5 slightly different ways (7, if we count the array-holding variants as well). Not using them correctly leads to incorrect or bug-prone code.

Basically, Meyers/Sutter's excellent books are not there to "improve" C++ code in the sense that programmers can live without them and simply write less enlightened code: if they don't read[4] them (or similar books), their C++ is going to be a vivarium for bugs. Perhaps someone very clever and very experienced (in other languages) may just see the bugs coming and "rediscover" some of the recipes all alone. Ok, that's fine, just harder. In Lisp, on the other hand, we are just "missing" the tenth consecutive cool feature.


----
[0] this applies to Lisp and Python as well
[1] this means something, doesn't it?
[2] and I'm happy about it
[3] I believe that if you use a language that traded everything, common sense included, for speed and you write slow code, you have seriously missed the point.
[4] I take for granted the whole read/understand thing

Thursday, January 13, 2011

Native Compiling Common Lisp: Buildapp and ECL

This is a very nice tool which packs a whole Lisp project (for SBCL) with SBCL in a single executable file.

http://www.xach.com/lisp/buildapp/

Ok... some considerations. First, I rather hate the idea of needing to package things that way; the world is not Windows, luckily enough. However, it has its uses, like giving the program to someone who does not have Common Lisp installed. Second, the whole thing becomes well above 40 MB. It appears that this is a well-known problem with SBCL.

Now, I should absolutely try ECL. Here is a sort of tutorial.
And an example from my sources; it looks like we're getting somewhere:

% ecl
ECL (Embeddable Common-Lisp) 10.4.1
;; snip
> (compile-file "life.lisp" :system-p t)
;; output
> (c:build-program "life" :lisp-files '("life.o"))
;; output
#P"life"
% ls -lh life
-rwxr-xr-x  1 enrico  staff    16K Jan 13 00:07 life

This looks reasonable. Perhaps some commercial Lisp compiler can do better. And I have just forced myself to stop translating the thing into Scheme to see how Gambit fares. Maybe one of these days, some benchmarks. Maybe not.

However, I keep asking myself: when will we stop asking for standalone executables? I am not even sure it makes deployment easier. For sure it is not the route Java took: as far as I know, jars are distributed, but the bulk of the JVM is not bundled with the application. When we are dealing with desktop applications, a .app or .exe or 755 is nice to have, but it does not have to be a standalone executable, IMO.

I don't even want to know how many years it has been since someone invented shared libraries; the single-executable route seems so wrong. Right now, the only times I had to do that were to give a program to my girlfriend (that was Lisp; Python she has installed) or to some non-geek friends. I think that happened like 3 times in the last 7 years, so I am even more in the "who cares" position. However, it is nice to have. Surely I'm missing something, though.



Tuesday, January 11, 2011

Python Language of the Year 2010

Good news[0] for Python developers!
Tiobe elected Python language of the year 2010:

Programming language Python has become programming language of 2010. This award is given to the programming language that gained most market share in 2010. Python grew 1.81% since January 2010. This is a bit more than runner up Objective-C (+1.63%). Objective-C was favorite for the title for a long time thanks to the popularity of Apple's iPhone and iPad platforms. However, it lost too much popularity the last couples of months.

Python has become the "de facto" standard in system scripting (being a successor of Perl in this), but it is used for much more different types of application areas nowadays. Python is for instance very popular among web developers, especially in combination with the Django framework. Since Python is easy to learn, more and more universities are using Python to teach programming languages.

So... nice to see. Python was language of the year 2007 as well (that was the year Python surpassed Perl). This year Python surpassed Visual Basic and C# and is the fifth most popular language in the world (according to TIOBE).

Although there is no ground for the following supposition, if Python, C++ and PHP carried on their 2010 trends, Python would become the third most popular language, surpassing both of them (PHP decreased more than Python increased, and C++ is decreasing as well). Of course there is no reason why the trends should stay the same (though perhaps developers are eventually starting to understand how much PHP sucks).

By the way, this is the list of TIOBE's languages of the year in the last few years. Looks like Python is the only one to have won in two different years, up to now. Next year? Any ideas?

2010  Python
2009  Go
2008  C
2007  Python
2006  Ruby
2005  Java
2004  PHP
2003  C++

By the way: even if the Tiobe index is worth something, I don't really care. Some of the most interesting languages out there (Lisp, Erlang, Haskell, Scheme) got very low positions. This, together with Java's pole position, rules out the Tiobe index as a measure of language quality. By the way... Lisp had a huge increase. Anything to do with Land of Lisp?


The TIOBE Jan 2011 Index




----
[0] well, this depends if you believe Tiobe index is worth something

Saturday, January 8, 2011

PyCharm autocompletion

I have not yet figured out how it works and to what extent. In fact, I have used it just for one-file projects, so I don't really know whether it works across file boundaries. But look at this:



Fairly impressive, huh? There are no user-provided type hints. Unfortunately, it does not look like the x = f(B()) statement is taken into account: the IDE simply guesses that, since I have a class named B, a parameter named b may be an instance of B, and suggests B.bar as a possible completion. Ok, this is possibly overly optimistic; however, in my experience this is quite often useful (at least, better than nothing).

In the following screenshot, I demonstrate that the good old assert isinstance trick we used in WingIDE still works:



Still, I'm not extremely happy with this strategy, as it mimics static typing too much. In fact, if I wanted to specify types for my parameters, I'd use Java. No doubt that writing

boolean f(B x) { }

is faster than:

def f(x):
    assert isinstance(x, B)

I admit... I don't really like the trick. However, sometimes such typechecking is not overly restrictive: e.g., numbers are expected to inherit from the proper abstract base classes, and when using libraries such as numpy it is rather safe to assume that some parameters are numpy arrays.
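
A small example of the kind of check I mean (a hypothetical function, not from any real project):

import numpy as np

def normalize(v):
    # the assert documents the contract and gives the IDE something to infer from
    assert isinstance(v, np.ndarray)
    return v / np.linalg.norm(v)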

The point where this strategy fails blatantly is return types: having to add type assertions on function return values is horrible, basically infeasible. Here PyCharm is smart and relies on a convention from the standard library; consider for example:

vars(...)
    vars([object]) -> dictionary

    Without arguments, equivalent to locals().
    With an argument, equivalent to object.__dict__.

We can see that the documentation somewhat suggests the return type of the function. Well, since they implemented that for standard library code, they let developers use it as well. PyCharm developers have also promised to add similar conventions based on other standards (e.g., Sphinx).
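
So a docstring in the stdlib style should do; a sketch (my own toy function, mimicking the vars convention above):

def make_index(items):
    """make_index(items) -> dictionary

    Map each item to its position in the sequence."""
    return dict((item, i) for i, item in enumerate(items))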



Unfortunately the types of the input parameters are not considered.



Thursday, January 6, 2011

Evolutionary Game in Python

A couple of nights ago, after reading something like the 10th consecutive network analysis paper, I absolutely needed to do something different. Consequently I grabbed my copy of Land of Lisp and decided to port the chapter 10 game to Python.

The game is easy to understand. It starts with a single animal moving in the world. The animal is able to reproduce (alone, no sex: it's family friendly! -- well, I don't know), introducing mutations. At each stage each animal chooses a direction, moves there, eats any plants it finds (gaining energy) and, if it has enough energy, spawns a child. New plants grow at each stage: one in a random spot of the central "jungle" area and one in a random spot of the whole board (possibly in the jungle). When an animal runs out of energy, it dies.

The original Lisp code is very loop oriented, with massive use of mutated global variables. As a consequence, I thought that a Python translation would be very easy, and in fact it was. The "sensitive" spots are where I expected them to be. For example, here:

(defun turn (animal)
  (let ((x (random (apply #'+ (animal-genes animal)))))
    (labels ((angle (genes x)
               (let ((xnu (- x (car genes))))
                 (if (< xnu 0)
                     0
                     (1+ (angle (cdr genes) xnu))))))
      (setf (animal-dir animal)
            (mod (+ (animal-dir animal) (angle (animal-genes animal) x)) 8)))))
For those who are not fluent in Lisp: the labels form defines the recursive function angle, which is used right afterwards. In the first version of the code I did a rather faithful translation (and the resulting code was very unpythonic):

def turn(self):
    # inefficient and unpythonic, rewrite!
    x = random.randint(0, sum(self.genes)-1)
    def angle(index, x):
        xnu = x - self.genes[index]
        if xnu < 0:
            return 0
        else:
            return 1 + angle(index+1, xnu)
    self.dir += angle(0, x)
    self.dir %= 8

Now, the code is horrible, in my opinion. And, it being late night, I convinced myself that the gratuitous use of recursion should not impact the execution times too much: after all, it can't blow the stack, as it works on exactly 8 elements. I was quite wrong. I ran the profiler on the file (-mprofile) and let it run for just 5000 iterative evolutions. The total time is 75.698 seconds. Please consider that the profiler slows things down a lot; I am going to give performance impressions later. Here I report some snippets of the full profile output:

   ncalls          tottime  percall  cumtime  percall filename:lineno(function)
   781884            2.911    0.000    2.911    0.000 :0(random)
   755960            2.791    0.000    2.791    0.000 :0(sum)
   65076/2958        1.482    0.000    4.095    0.001 copy.py:145(deepcopy)
   755960            3.482    0.000    3.482    0.000 isle.py:31(move)
   755960           10.859    0.000   51.522    0.000 isle.py:44(turn)
   3528217/755960   21.310    0.000   21.310    0.000 isle.py:47(angle)
   755960            2.488    0.000    2.517    0.000 isle.py:56(eat)
   755960            2.555    0.000    6.802    0.000 isle.py:61(reproduce)
   5000             10.540    0.002   75.516    0.015 isle.py:73(update_world)
   1                 0.000    0.000   75.698   75.698 profile:0(execfile('isle.py'))
   781884            7.859    0.000   10.770    0.000 random.py:160(randrange)
   781884            6.353    0.000   17.123    0.000 random.py:224(randint)
Here we notice that calls to random & co. take a huge amount of time, but that was to be expected. The real point is that turn and, in turn (pun intended), angle are very expensive. The code could be rewritten in a much simpler way, without the inner recursive function (a sketch of what I mean closes this post). Here turn is responsible for 2/3 of the time: if we could just reduce its cost by an order of magnitude (that is, bring the function in line with the other methods of Animal) we could achieve acceptable performance.

However, even without the profiler Python proves quite slow. For example, let's time 10000 evolutionary iterations:
% time python isle.py << EOF
10000
quit
EOF
13.04s user 0.02s system 99% cpu 13.070 total
Notice the trick with << EOF to automate user input (I love Unix). Let us compare it with the original Lisp implementation. With clisp, the performance is less than spectacular, to say the least:
% time clisp << EOF
(load "evolution.lisp")
(evolution)
10000
quit
EOF
57.54s user 4.96s system 98% cpu 1:03.31 total
Probably I could pre-compile the stuff and improve performance; no time right now, and besides I was more interested in SBCL anyway. And SBCL is impressive:
% time sbcl --noinform --load evolution.lisp --eval '(evolution)' << EOF
10000
quit
EOF
0.90s user 0.10s system 99% cpu 1.011 total
That is like 13 times faster than the Python version. I'm sure I can optimize the Python program: for sure, I expect to cut the execution time by a factor of at least 2 by fixing the bloody recursive function. About what to optimize afterwards, bets are open.

On the other hand, the careful reader has surely noticed that turn is a method: in fact, I decided to turn Animal into a class. I believe that was a mistake (that is to say: either make the full program object oriented or leave the whole thing without objects). For example, the "reproduce" method *sucks* to a very large extent.
It is clear that the method should return the copy instead of adding it to the global ANIMALS list. Anyway... I will deal with the object orientation of that code in a later post.

My opinion is that the Lisp code I translated is not very lispish, but I like a lot the idea of writing the program quickly. This is the classical case where one risks over-engineering a simple program which does its job and is really funny. I loved looking at the black and white terminal with all my animals running around trying to survive! Still, I believe that writing the thing in Python in the first place would have yielded very different code, as would writing it in Haskell or even Scheme. I believe the example was more about the procedural/imperative aspects of Lisp than about writing functional (or OO) code... Full code here.
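
As promised above, here is the loop-based turn I have in mind (a sketch with a made-up Animal skeleton around it, not yet dropped into the real program):

import random

class Animal(object):
    def __init__(self, genes, direction=0):
        self.genes = genes
        self.dir = direction

    def turn(self):
        # walk the genes, subtracting, until the random draw is exhausted
        x = random.randint(0, sum(self.genes) - 1)
        for angle, gene in enumerate(self.genes):
            x -= gene
            if x < 0:
                break
        self.dir = (self.dir + angle) % 8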


Tuesday, January 4, 2011

Seven Languages in Seven Weeks

Today I eventually received my own copy of "Seven Languages in Seven Weeks". To be honest, I started reading the electronic version some time ago (Pragmatic Programmers had a huge discount then, just after offering "Programming Ruby" at $10).

Seven Languages in Seven Weeks


I'm not going to criticize the language choice. In fact, I think the languages are a wise selection (Ruby, Io, Scala, Clojure, Prolog, Haskell, Erlang): they are all interesting and each has something other languages do not have. Although I'm quite into Python, I'm not going to complain about its exclusion: the author is far more confident with Ruby (by his own account), which means his explanations are surely more accurate.

I love the idea of the seven languages in seven weeks. Beware: this is not a "Teach Yourself C++ in 21 Bloody Hours". The author is well aware that you cannot possibly learn seven languages in seven weeks, but you can understand what those languages are about in seven weeks. I'm thinking about the potential this book has when read by people who grew up in closed Microsoft or Java contexts (or perhaps even C and C++). If after reading this book the reader is still convinced that all languages are the same, he is either a compiler or a very distracted reader (and probably hasn't understood a thing about the book).

Just seeing the languages is a wonderful enlightening experience. Realizing how much useless boilerplate compiler-friendly code some languages force developers to read is the first step towards better ways of programming.

And what about me? Well... I already knew 5 of the 7 languages quite well. I have known Ruby and Haskell for 5 years or so; a couple of years later I learned Erlang as well. Besides, I wrote some thousands of lines of Prolog for my BSc, and my MSc dissertation was on logic programming (specifically, on building an efficient "interpreter" for a given logic programming language not too dissimilar from Prolog itself). I started studying Clojure basically when the word of mouth started spreading, which makes it about 2 good years. So what's in it for me? For example, I am sorry to say that I found the part on Prolog rather unsatisfactory, but hey... I don't think the goal of the book is to teach me Prolog. However, it gave me a pleasant introduction to Io (which I would probably never have learned otherwise) and convinced me that my first impressions of Scala were wrong and that perhaps I have to give it a second chance.

So, definitely worth reading. And here is a free chapter!



Sunday, January 2, 2011

Lexical Scope vs. Dynamic Scope

Most languages have lexical scope nowadays. This is good news, as it is easier to use. In fact, many programmers do not even know the difference between lexical and dynamic scope, leaving the distinction to programming language junkies.

In languages with lexical scope, a variable name is resolved to the binding with that name in the nearest lexical environment. In other words, the compiler chooses the innermost binding, considering local variables, function parameters, enclosing function variables, global variables and so on. If that description is not clear (it is not, I admit), just think of C or Java or Python: that is lexical scope. It is also called static scope, as it can be determined statically, with just the source and no runtime information.

It is good because the one who writes the function knows the environment where it is defined (and that is what matters), while the one who calls it does not need to know anything about it: just to pass the parameters. On the other hand, with dynamic scope, functions access the binding with a given name that is *nearest* on the call stack. That is to say... the caller's environment influences the function's behaviour.

Early Lisp dialects had this strategy, as does Emacs Lisp. Luckily, Common Lisp's default is lexical scope (but dynamic scope can be used as well). Perl has dynamic scope, with optional ("my") static scope.

A good example of the two different strategies is this snippet:

(let ((f (let ((x 1))
           (lambda () x)))
      (x 0))
  (funcall f))


If that snippet is evaluated in the Emacs *scratch* buffer (C-j), the result is 0. This is because, when f is executed, the x nearest on the call stack is the one bound to 0. A Common Lisp compiler, on the other hand, evaluates the thing to 1, as the x referenced in the lambda is the one in the lexical closure.

* (let 
    ((f (let ((x 1))
          (lambda () x)))
     (x 0))
  (funcall f))
; in: LAMBDA NIL
;     (LET ((F
;            (LET (#)
;              (LAMBDA # X)))
;           (X 0))
;       (FUNCALL F))
; 
; caught STYLE-WARNING:
;   The variable X is defined but never used.
; 
; compilation unit finished
;   caught 1 STYLE-WARNING condition

1
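
For comparison, the same shape in Python (lexically scoped, so it prints 1 no matter what x is bound to at the call site):

def outer():
    x = 1
    def f():
        return x  # resolved in the defining (lexical) environment
    return f

f = outer()
x = 0
print(f())  # 1: the caller's x is irrelevant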

