Saturday, October 9, 2010

Design of Sorting in Java and Python (pt. 1)

The sorting interface built-in in Java collection model essentially depends on object which are either Comparable or to external comparator objects. The idea is quite straightforward: if the object does not provide a natural ordering (or the developers did not provide one) it is very easy to create a Comparator object.

Such an object takes to objects of type T and returns -1, 0 or 1 depending on whether the first object is "less than", "equal to" or "greater than" the second object. This strategy is extremely flexible (and in fact it is also an example of the very Strategy design pattern) and allows for different orders. For example it is possible to create ordering based on one or more properties and whatever the developers need to.

Of course, this is one of the cases when trying to be a purely object oriented language sucks: we are creating object to do the job of functions. Yes... comparators should not really have any state and just provide a compare "functional" method. However, this i relatively natural in Java. And as Comparable is an interface, we can just provide anonymous comparators like we would have provided anonymous functions.

Collections.sort(lst, new Comparator<Integer>() {
    public int compare(Integer left, Integer right) {
        return new Integer(left * left).compareTo(right * right); 
    }
});

In Python < 2.3 we had quite the same interface. Of course, having higher order functions is great and it was possible to just pass a cmp function. However, now the cmp parameter of sort is deprecated and in Python 3 disappeared... we just can do better than that!

So now, we may want to reflect. For example, let us write the Java code we just presented in Python.

def si_cmp(a, b):
 return cmp(abs(a), abs(b))
 
nums = [-5, -1, -3, 2, 4, 1]
nums.sort(cmp=si_cmp)
print nums

We are switching to python just to get rid of syntax noise. The very same logic does apply to Java. Basically we "lifted" the comparison operator with abs (ok, we should be a bit more rigorous, but the formalization is left to the willing reader).

In fact, this is an extremely common pattern. When we want to compare a pair of Person objects using their name property, we are once again "lifting" the comparison on strings to the comparison on Persons.

class Person(object):
 def __init__(self, name, phone):
  self.name = name
  self.phone = phone
  
 def __repr__(self):
  return '{name=%s, phone=%s}' % (self.name, self.phone)
  
persons = [Person(name, phone) for name, phone in (
   ('John Smith', 'xxxxxxxx'),
   ('Robert White', 'yyyyyyyy'),
   ('Hannibal Lecter', 'zzzzzzzzz'))]
   
def name_cmp(fst, snd):
 return cmp(fst.name, snd.name)
 
persons.sort(cmp=name_cmp)

If we print the persons array the output is (with some formatting licensed regarding whitespace I took in order to better fit in the web page):

[{name=Hannibal Lecter, phone=zzzzzzzzz}, 
 {name=John Smith, phone=xxxxxxxx}, 
 {name=Robert White, phone=yyyyyyyy}]

It may not be apparent that this is an instance of the very same pattern. Perhaps, because we do not really consider accessing an attribute as a function call. Well, think about a getName method (which would be bad practice in Python, but it is exactly what you would have in Java...

class Person(object):
 def __init__(self, name, phone):
  self.name = name
  self.phone = phone
  
 def getName(self):
  return self.name

and then:

def name_cmp(fst, snd):
 return cmp(fst.getName(), snd.getName())

which in Python is exactly equivalent to:
def name_cmp(fst, snd):
 return cmp(Person.getName(fst), Person.getName(snd))

In fact, in Python we prefer using the useful operator module attrgetter.
That function takes the name of an attribute and returns a function that gets that attribute. So operator.attrgetter('name') is a function returning the name attribute of any given object.

import operator as op

def name_cmp(fst, snd):
 get_name = op.attrgetter('name')
 return cmp(get_name(fst), get_name(snd))

More reflections on this to come...

No comments: