RiK0 Tech Temple: November 2010

Friday, November 26, 2010

Slowly abandoning Apple?

There was a time when I used to say most software I use was Cocoa. Consequently, I did not use cross-platform software (a part from CLI stuff). I preferred the OS X only, non portable solution to cross-platform solutions every time the OS X provided a niftier integration with the system.

I was rather shocked considering that it is no longer true. First: I mostly work with non Apple systems (that is to say Windows and Linux). This is not really an issue: I was aware that my job could be done without a Mac. Being mostly a developer, I just need the development tools: compilers, open source interpreters, Eclipse, WingIDE, IntelliJ. All this stuff is cross platform. I also use latex (which is also cross platform). Unfortunately I use Word and the wonderful MathType quite often. Well... once I was very skeptical about the pair, long time ago. I changed my mind, of course. Still don't like word, but MathType is wonderful. JEdit and gvim more than TextMate or BBEdit.

I still use OmniGraffle, and I find it wonderful. However, I think that it sucks not being able to use it on my netbook (my apple laptop has a broken hard drive and I haven't fixed it yet). I mostly dropped Papers for Mendeley (cross-platform and I think more useful). My synchronization needs have increased: I use gmail more often than Mail; Dropbox beats the crap out of MobileMe (which I'm not going to renew).

Etc etc etc. In fact the last bundles of Mac software are not interesting from my point of view. I bought some sw some time ago, and I did not use it. Not that the MacOS is not great, beware. Just I feel like I have to be able to leave the platform with no hassle should the need arise. My MacPro is going to last, so is my laptop. But not forever; and then? I even find some things are better outside the mac (e.g., key shortcuts of IDEs... they are not standardized like on the mac, of course, but the ides just don't work at 100% on the mac... perhaps the new versions are improving).

Just happy I bought 1Password for Windows. Now I just have to leave Yojimbo. Still, no time to export the notes... and the serial numbers. I don't even want to think how much I money I threw away. :(

Yeah... of course, if I had more time to play things would be different. I just love Logic (Express) and doing audio on the Mac. Though, I have not enough time.

Mac, Apple, Leaving Apple, OmniGraffle, MathType

Friday, November 19, 2010

WingIDE 4.x out!

WingIDE 4 beta is available. I'm trying it right now.
Unfortunately I have not an active Django project to test the IDE.

Django, WingIDE, Python, IDE

Thursday, November 18, 2010

Geometry/graphics algorithms: segment intersection

With modern gaming libraries dealing with most geometric/graphics issues, it is probably possible to create simple (and not so simple) games without a clue on the basics of computational geometry. However, the elementary algorithms are really so simple that it is worth to keep them in mind nonetheless.
First, I will briefly review some mathematical concepts we will be using, then I will present the algorithms. In this post I will present algorithms to find if two consecutive segments represent a left or a right turn and to discover whether two segments intersect.
The segment intersection is probably the single most important operation: for example it can be used as a building block to detect collisions in a 2D-game and can be generalized to work with 3D games as well.

Magic of the cross product

For those who thought that they belonged to somewhere else during the geometry classes (or who profoundly hated trigonometry) this will be short. Moreover, we won't use sin and cos since they are extremely slow operations. Nor will we use divisions (as they are both slow and yield to potentially badly conditioned algorithms).

Let A and B two vectors, that is to say two segments with the origin in the origin of the system (0, 0). Their cross product A x B is defined as:

Physics (and mathematicians) would say that A x B is a vector perpendicular to both A and B (thus not contained in the considered 2D plane) according to the right hand rule and with magnitude |x_Ay_B-x_By_A|. However, we are only interested in the sign:

If the cross-product is negative, then A is counterclockwise from B
If the cross-product is 0, then A and B are collinear
If the cross-product is positive, then A is clockwise from B

Our second step is to consider pairs of segments with one end in common. Let s=CA and t=CB (CA and CB should have a bar above, but I can’t do that for typographical reasons) be the segments, where A=( x’s, y’s), B=( x’t, y’t) and C=(x, y).
We know how to compute whether the segment CA is clockwise from the segment CB, that is to say:
As a consequence:

If the cross-product is negative, then CA is counterclockwise from CB
If the cross-product is 0, then CA and CB are collinear
If the cross-product is positive, then CA is clockwise from CB

Turn left or turn right?

Now, given the consecutive segments AB and BC, we want to find out whether going into BC is a left or a right turn. A simple schema clarifies the thing.

Essentially we can compute the cross product ACxAB in order to discover if BC is a left or right turn with respect to AB. This is a very important building block to compute the segment intersection.
We notice how we did not use sin, cos or divisions. We have simply a bunch of multiplications, sums and subtractions. It is even likely that some architecture can perform the determinant of a matrix with a single operation.

Segment intersection

At this stage, we can introduce the code to compute the segment intersection. First I present the function computing the direction between two given segments. Considering that we do not want to make this functions dependent from the specific representation of segments, we just let it take the points A, B and C.
The function uses only multiplications, additions and other elementary inexpensive operations (no divisions, no cos or sin). The basic idea is using the simple function we defined in the last section and eventually deal with the special case when one segment end lies on the other segment.

So, the function is:

def direction(origin, first, second):
    return cross_product(second-origin, first-origin)

provided that we represented points like that:

import collections

def cross_product(lhs, rhs):
    return lhs.x * rhs.y - lhs.y * rhs.x

class Point(collections.namedtuple('Point', 'x y')):
    def __add__(self, other):
        return Point(self.x+other.x, self.y+other.y)

    def __sub__(self, other):
        return Point(self.x-other.x, self.y-other.y)

def direction(origin, first, second):
    return cross_product(second-origin, first-origin)

Notice the use of named tuples here. We have efficiency of tuples and ease of use of regular classes. But this is another story. Then, we have the real thing:

def does_intersect(first_start, first_end, second_start, second_end):
    direction_1 = direction(second_start, second_end, first_start)
    direction_2 = direction(second_start, second_end, first_end)
    direction_3 = direction(first_start, first_end, second_start)
    direction_4 = direction(first_start, first_end, second_end)

    if (direction_1 * direction_2 < 0
        and direction_3 * direction_4 < 0):
        return True
    elif direction_1 == 0 and on_segment(second_start, second_end, first_start):
        return True
    elif direction_2 == 0 and on_segment(second_start, second_end, first_end):
        return True
    elif direction_3 == 0 and on_segment(first_start, first_end, second_start):
        return True
    elif direction_4 == 0 and on_segment(first_start, first_end, second_end):
        return True
    else:
        return False

What do we mean? direction_1 tells us "on which side" the start of the first segment is with respect to the second segment. On the other hand, direction_2 tells us on which side the second segment end lies. If both points are on the same side (they are both left turns or right turns), then the two segments do not intersect. But if the extremes of the first segment are on different side of the second segment and the extremes of the second segment are on different sides of the first segment, then it means that the two segments must have an intersection. The rest of the procedure deals with collinear segments: in this case some extreme of one segment must lie on the other segment. The last procedure is:

def on_segment(origin, first, second):
    if (min(origin.x, first.x) <= second.x <= max(origin.x, first.x)
        and min(origin.y, first.y) <= second.y <= may(origin.y, first.y)):
        return True
    return False

Segment Intersection, Computational Geometry, Python, Algorithms, Geometric Algorithms, Games, Gaming Algorithms

Wednesday, November 17, 2010

My EEE tells me which characters I use most

After a handful of months my EEE shows clearly which characters I use most.
That is to say: E, R, T, I, O, A, S, L, C, N, M and "SPACE".

How I figured out? The keyboard is worn. But not in an unpleasant sense!
It's just that the keys were slightly rugged and now in the center are completely sleek.

Tuesday, November 16, 2010

Interpret escape sequences in Python strings

In Python source files it is possible to define "raw" string literals which mostly do not interpret the escape sequences in the string literal. These string literals are introduced with an "r" prefix, like:

In [2]: s = r'\n'
In [3]: len(s)
Out[3]: 2

As it can be seen that is not the new line character, but a sequence of two characters, '\' and 'n'. String obtained with raw_input have somewhat the same property. For example, if we input \x61, it is not interpreted as the character with code 0x61 (which is 'a'), but as the sequence of 4 bytes '\x61'; that probably is the part saying "raw" input. ;)

If we want to evaluate the escape sequences "as the python compiler does when it encounters a regular python literal (e.g., 'hi\n'), then we have to use the decode method of string objects.

In [17]: s = raw_input(
\x61
In [18]: s
Out[18]: '\\x61'

In [19]: s.decode('string-escape')
Out[19]: 'a'

Python, Escape Sequences, String Escaping

Saturday, November 13, 2010

Have you got enought Fiber?

I recently bought last edition of the Pickaxe and started reading through. The "Fibers, Threads and Processes" chapter caught my attention immediately. I'm not on par with latest ruby features (especially, I have never used 1.9), so I decided to understand what fibers are.

The example in the book closely resembles the one presented in the Pickaxe when explaining Fibers. However, I performed some slight modifications are I prefer "filter" programs, reading from stdin and printing to stdout, than the ones which require a command line argument (or worse, have a fixed constant in the program code). Apparently this strategy is very poorly supported by IDEs but well... shit happens. Afterall I'm already tolerating that the environment is not usable before 2 whole minutes... ah, Netbooks!

Anyway, this is the code:

require 'fiber'
require 'fiber'

words = Fiber.new do
  STDIN.each_line do |line|
    line.scan(/\w+/) do |word|
      Fiber.yield word
    end
  end
  STDIN.close
end

counts = Hash.new(0)
while word = words.resume
  counts[word] += 1
end

counts.keys.sort.each {|k| print "#{k}: #{counts[k]}\n"}

So, the idea is that basically the block passed to the fiber is not executed.
When the resume message is sent to the fiber, the control is transferred inside the fiber. The fiber is executed up to a yield message; at this point the control is transferred back to the code which called the Fiber and the value returned by the resume message are the arguments of the yield message.

Ok, rather easy. This is basically equivalent to Python generators. As it happens most of the times, in Ruby you pass a block so to someone, in Python you define a function.

So I decided to provide a rather faithful Python version:

import re
import sys
import collections

def yield_words(iterable):
    for line in iterable:
        for word in re.finditer('\w+', line):
            yield word.group(0)
    
count = collections.defaultdict(lambda: 0)
for word in yield_words(sys.stdin):
    count[word] += 1
 
for word, occurrencies in count.iteritems():
    print word, ':', occurrencies

The structure is essentially the same. The last component in Python and Ruby is extremely similar. Of course in Ruby you pass a block, in Python you have a for loop on an iterable.

The second component is also quite similar. However, in Python we simply iterate over a for loop. We don't have to call explicitly a "resume" command/method (although we could). I think that the essential difference is that Python generators are iterators. On the other hand Ruby Fibers are something slightly different

I decided to keep the structure of the first block as similar as possible. Chaining Python iterators with functions in functools and intertools, the yield_words function could have been written like:

import itertools as it
import functools as ft

def yield_words(iterable):
    return (word.group(0) for word in
            ft.reduce(it.chain, 
                      (re.finditer('\w+', line) for line in iterable)))

Not really sure that chaining all that stuff increases readability, though. As a side note, all the presented versions are really interactive. If you use stdin instead of piping a file (you were not typing lots of text in, were you?), the program is fully interactive.

That is to say, for example, that is, each time you read a line it is immediately processed, as if the two program parts (the "producer", which asks input and the "consumer" which puts it in the dictionary) were mingled. However, the logical separation is quite evident and the code clarity is greatly improved.

Ruby, Python, generators, lazy computation, Fiber

IntelliJ Ruby Plugin

As I promised... more stuff on Ruby! This is about the installation.
OS X and Linux mostly have Ruby installed. However, you may want to install JRuby as well. On windows, I installed them both.

As a development environment, I choose IntelliJ, which I already use profitably for Java. Probably there are better choices (TextMate on OS X springs to mind, gvim/emacs). However, I want to see what can a "traditional IDE" (and a good one) offer for developing with a very dynamic language like Ruby. Unfortunately Python support does not seem as mature (besides, Ruby plugin is supported by IntelliJ developers and they also offer a Ruby IDE -- whose features should be offered by the Ruby plugin as well --).

As a side note, for Python I already bought WingIDE and I'm happy with that. However, since I don't plan to work with Ruby anytime soon, I only consider free (as free f*ckin' beer) solutions.

Here some screenshots from the setup of a Ruby project

So here some stuff on what the plugin can do. It's lots of interesting stuff.

However, I would like to point out that Ruby is a dynamic language; and the first things you are going to do are simple scripts to automate some repetitive task unless you are going to do Rails development. Since I'm not interested in Rails, I just point out that setting up a Rails environment seems very easy and well supported.

On the other hand I'm mostly interested in the "scripting" part, as it's the first you are going to meet if you don't do rails, like I said.

IRB is well supported. But I don't think its integration with the IDE is sufficient. I can open an IRB console and put Ruby commands inside. Good. A REPL is a very important part of dynamic languages development. I can also load a Ruby script directly in IRB (alt-shift-l -- btw, I suppose it will be a PITA on OS X).

However, since that command only 'requires' the file in IRB, successive invocations have no effect, unless you close and reopen IRB. It would be nice if modified versions of the script could be run without closing and opening IRB. It is something I would expect from a Ruby IDE. E.g., in TextMate it is very easy to do; and at the beginning it is just what you need.

The other options is creating a "runner". Which is easy, there is a menu entry that automagically creates one. This is good to run a script (but remember... you already created a project, and then create runners for every -- logically unrelated -- script in the project). This is a lot of useless work.

Things work slightly better if you just select the text and load (the selected text) in IRB. This way you can run the code more than once. Moreover, you can also have functions defined in IRB and than play with them. I think this is far more useful than 'requiring' stuff, especially because you can't reload.

Ok... the nice things another day. Maybe. However, the "environment" support seems really javish. Not happy. :(

Ruby, IntelliJ, RubyMine, IntelliJ Ruby, Plugin, IDE, dynamic languages, freedom languages

Friday, November 12, 2010

Java and OS X (good news, afterall)

It looks like my concerns were unjustified. OS X and Oracle (well, mostly the open jdk community) are going to collaborate to provide future versions of Java for OS X.

Apple is basically gives its current Java machine to the OpenJDK project. Oracle (and OpenJDK guys) will provide future versions, as it is desirable. Perhaps we will not have JVM lagging behind anymore. Or perhaps we will, but the lag will be shorter.

Java, MacOS X, Apple Support, Java Support, Java on OS X

Pickaxe (almost) for free [Ruby]

Although here in Italy there is lots of talking about Ruby, they are not quite referring to the programming language. However, there are some very interesting news about the programming language as well.

These days the new version of Programming Ruby (also known as the Pickaxe) has a huge discount: both the printed version and the electronic version cost $10. Buying both costs 20$.

I believe that publishers should really reconsider charging extra money for those who want also an electronic version of a book they already bought. They should reconsider this (as I'm afraid piracy will just become a serious problem as ipad/ebook riders and similar device are going to spread). However, in this case I don't care.

Why? 20$ to have a full book (electronic & paper) is a very convenient price and I'm happy with it that way. I just bought them so expect some more ruby related posts in the future. ;)

Ruby, Pickaxe, Programming Ruby

Sunday, November 7, 2010

Java Collections sorting: copy, copy & copy in every sense!

This morning I was hacking into OpenJDK some implementation details. Curiosity, perhaps.

After the announcement that Java is going to use timsort for generic sorting (which is a very good piece of news).
By the way, this is an actual implementation. I'm also wondering why these things are mostly ignored in Algorithm courses, but this is another story.

So... I checked the actual implementation. I had to download the OpenJDK source, as apparently on OS X IntelliJ does not auto-show implementation of base classes when I "jump to implementation". Anyway... the implementation of sorting is basically in Arrays.java.

I want to point out the funny comment relative to sort1 (which is the private function which actually does the job for non floating point stuff):
    /*
     * The code for each of the seven primitive types is largely identical.
     * C'est la vie.
     */

which really saved the day... Yes... it is almost cut and paste for the primitive types (char, int, byte, etc). I was amazed... it is 2010, we have generics but, as primitive types are not objects we have to actually write it 7 times. As a side note, code is recursive quicksort variant.

Ok, nice. You have implementation constraints and you have to cope with that. Yes... C++ templates perfectly solve this crap. But then, there is another very funny discovery (well, in fact I have known this for quite a few time). But that is not the only (technically) unnecessary copy which happen(ed|s) when dealing with sorting in Java; this is the sort function in Collections.java

public static <T extends Comparable<? super T>> void sort(List<T> list) {
Object[] a = list.toArray();
Arrays.sort(a);
ListIterator<T> i = list.listIterator();
for (int j=0; j<a.length; j++) {
i.next();
i.set((T)a[j]);
}
}

Wow! It is clear what it does? For every collection you sort, a copy is made. To be honest, the documentation states this clearly:

This implementation dumps the specified list into an array, sorts
the array, and iterates over the list resetting each element
from the corresponding position in the array. This avoids the
n^2 log(n) performance that would result from attempting
to sort a linked list in place.

Nice! Very nice... this means that the default sort function may be unnecessarily slow. Unless you use arrays (but using arrays in place of collections is considered a bad practice). Moreover, Collection is an interface: its implementation may be such that a full copy into main memory is not feasible. BTW, I should double check because the comments in Arrays.sort state that the algorithm is a modified quicksort, not a modified mergesort.

Oh... nice! Tomorrow I am discussing other methods in Arrays and Collections.

Friday, November 5, 2010

FOAF, crawlers, etc

As part of my research, I have to gather a large number of FOAF profiles. So I looked for some crawlers able to get the profiles (without having to write one myself, which is boring, considering the flexibility of RDF -- and consequently the number of outbound links which could be meaningful).

So... looking for FOAF crawlers (which are semantic crawlers, also called scutters). The first one I found is Futil (funny name choice, BTW). The project page gives some information on the project (quite generic) and has a link to the web svn browsing page. I did not find references to the actual svn repo (but I could easily infer it). Ok. So far so good.

So I examined the list of dependencies the project has:

Python >= 2.4.0
RDFLib >= 2.3.1
PyLucene >= 2.0.0
PySQLite >= 2.3.2
Python-MySQLdb >= 1.2.1
PyXML >= 0.8.4
Python-URL >= 1.0.1

The first dependency is obvious. It's a python project so we need a python environment. And 2.4 is a very low requirement. Good. RDFLib can be easily installed with easy_install, even on windows. Really, easy_install RDFLib

just works! Ok. Then it comes to PyLucene: I found it quite impossible to install on windows. I'm sure I can work it out, but instructions are terrible. There is a binary build project, however, it does not work. At least, not the simple instructions they give. I installed the egg and then:

C:\Users\enrico>python
Python 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit (Intel)] on
win32
Type "help", "copyright", "credits" or "license" for more information.
[1]: import lucene
Traceback (most recent call last):
File "stdin", line 1, in module
File "C:\Python26\lib\site-packages\lucene-2.9.3-py2.6-win32.egg\lucene\__init__.py", line 2, in module
import os, _lucene
ImportError: DLL load failed: The specified module could not be found.

Ok. Things are getting nasty and for no reason. Compiling Python modules on Windows is usually a PITA. Especially because I have Visual Studio 2010 and not Visual Studio 2009 or 2008 or whatever is the default compiler for Python on Windows. Thanks to the little "limited compile yourself dll hell" I'm not going that route (moreover, I'm a gcc guy).

So I tried the futil package without pylucene. Maybe, I thought, something works even without PyLucene. Afterall, Python is a very dynamic language. It is trivial to create a program/library which does not load things it does not use. So I tries just:

python futil-run.py --help<br />

What I got was an error related to not having MySQLdb. What the fuck! I asked for an help message. Nothing more! I need no stinking mysql! Just a bloody help message. Consequently, I suppose that the whole project needs everything installed, even if I don't use it. Which is bad.

I suppose I'll try with slug , written in Java, and see what happens.

FOAF, futil, python, semantic web, crawler, scutter, MySQL, Java

Tuesday, November 2, 2010

Papers: word, ipad, and other stuff

I have been using Papers for something like an year now. I quite like the application, indeed. It is easy to use and rather powerful.

I use it to keep all the articles I read and study for work organised. This way the papers are in the very same directory and I can access them easily from the application in a similar way to what iTunes does for music. I can also add notes and stuff like that.

Papers is also able to perform searches on the web (e.g., using Google Scholar). Then, I can import the papers into the application and possibly categorise them with "groups", assign an "evaluation" (which is rather useful... for example it lets me save time with articles with promising title and abstract but deluding content; this way I don't read them again when looking for information to cite).

Papers (not the application ;) ) have metadata, which is also used for citations. Most of the times (well, not really... sometimes -- but this is not Papers fault as this comes from Scholar or badly formatted papers or whatever) the metadata is accurate and everything is fine. Unfortunately sometimes (which is, once again unfortunately, most of the times) the metadata is not correct. The quickest way I found to fix this is "unmatching" the paper and then searching again selecting some relevant data from the Paper. The interface is very well done and it is easy to understand how it works. In this case, I usually end up with correct metadata ready to be included in a bibliography.

And here it comes the first weak spot of Papers. Unfortunately (or fortunately, depending on the point of view), my group works with Word. As I am the youngest, I simply accept the decision. Consequently, we have to build the bibliography with Word.

Word has a nice bibliography management feature; but it is quite broken. Or I have not found how it works. Word has a built-in bibliographic database. Papers is able to "write" a new database (possibly making a backup of the old one). At this point, word is able to insert citations from the db to the current document. Word has a very limited array of bibliography styles and although more can be downloaded, I found that they do not work very well. Sometimes the generated bibliography is not formatted correctly and it is very hard to make it fit into the requested paper style. At some point I usually end up making it static and then do everything manually.
Perhaps, there are better ways to do it. I have not found them, unfortunately.

This is not necessarily Papers fault. But it is a serious problem. Perhaps I should try different approaches, like going through bibtex or something like that. I remember that when I used to work with latex the bibliography part was quite easy (as long as bibtex was used... when creating bibliographies manually it was a real PITA).

Another very nice feature Paper has is iphone/ipad synchronization. I often read papers on the ipad (and save forests) and that is nice. Essentially if I just remember to synchronize, I have my full library with me all the times. This is dramatically nice. I'm just looking forward for the ipad to allow directly printing files.

The problem is that Papers does not easily synchronise data between two macs. This is awful. Right now I work most of the times with my EEE netbook. However, I would like to have the library on the macpro always in sync with the MacBook Pro's. I read some post requesting the feature, but it seems it has not been added yet. This is a major drawback, especially considering that Mendeley automatically synchronises not only among macs, but among every computer and device (win PCs, Linux ones, android, iphone/ipad and macs).

One of these days I'll share my opinions on Mendeley as well.