Wednesday, September 22, 2010

Liskov Substitution Principle

The Liskov Substitution Principle was first formulated by Barbara Liskov (and back then it was not called “Liskov SP”) in her paper [0] and then in [1]; a more accessible explanation can be found in [2]. Its influence on object oriented programming is huge, and it is quite hard to design correct object oriented code in a statically typed language without a clear understanding of the principle. At a more mundane level, this is just about the principle of least surprise.

The key idea is that if a property p is provable for every object x of type T and S is a subtype of T, then p should be provable for every object y of type S as well. Since the principle must hold for every conceivable property, we can also say that “functions that use pointers or references to base classes must be able to use objects of derived classes without knowing it” (this is Martin’s formulation of the principle, relative to C++). In Java we would say that methods that use objects of base classes must be able to use objects of derived classes as well.

The circle-ellipse problem (presented here in its rectangle-square incarnation) is an archetypal violation of the principle. Without any claim of originality I briefly present the problem, starting with the Rectangle class. I use Python, though I chose to use explicit Java-style setters and getters (which is bad practice in Python) and to violate PEP 8, to make it even more evident that this is bad Python. The point of starting with Python is just that it is easier to use the REPL to try the different cases.

class Rectangle(object):
    def __init__(self, width, height):
        self._width = width
        self._height = height

    def setWidth(self, width):
        self._width = width

    def getWidth(self):
        return self._width

    def setHeight(self, height):
        self._height = height

    def getHeight(self):
        return self._height

    def getArea(self):
        return self.getHeight() * self.getWidth()
        

Essentially, when we try to derive a Square from a Rectangle, we find out there is no way to do it without violating the principle of least surprise (and, in turn, the Liskov Substitution Principle). Martin explains very clearly the many ways we can try to do it and how we would inevitably screw something up.
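
To make the discussion concrete, here is a minimal sketch (mine, not Martin's code) of the "override the setters" variant discussed below:

class Square(Rectangle):
    def __init__(self, side):
        Rectangle.__init__(self, side, side)

    # Preserve the square invariant: setting one side sets both.
    def setWidth(self, side):
        self._width = side
        self._height = side

    def setHeight(self, side):
        self.setWidth(side)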

If we don’t override setHeight and setWidth, we have a Square that is not a Square anymore (nothing prevents the two sides from diverging). If we do override them (and have them set both width and height, as in the sketch above), we find out that code like:

oldArea = rect.getArea()
rect.setWidth(rect.getWidth()/2)
assert oldArea == rect.getArea() * 2

is doomed to blow up. Thus the LSP is violated. Many conclusions have been drawn from this. Luckily the problem today is somewhat less serious, as long inheritance chains are disappearing from software projects and inheritance freaks are rotting in hell (or at least they should be). In fact, delegation is a much cheaper (from a conceptual point of view) alternative. Modern IDEs and nice languages (with meta-programming) make the tedious boilerplate code associated with delegation automatically generated (or, better, unnecessary).
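
Just to sketch what I mean by delegation (a toy example of mine, not a prescription), a square can simply hold a Rectangle and forward to it, without pretending to be one:

class DelegatingSquare(object):
    def __init__(self, side):
        # Composition instead of inheritance: reuse Rectangle without being one.
        self._rect = Rectangle(side, side)

    def setSide(self, side):
        self._rect.setWidth(side)
        self._rect.setHeight(side)

    def getSide(self):
        return self._rect.getWidth()

    def getArea(self):
        return self._rect.getArea()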

It is also worth noting that the problem exists only with mutable objects. Immutable objects don’t have it. And I like immutable objects, although I would hate to store both width and height for each square. I would also stress that drawing programs (and similar stuff) do not need the square abstraction: as people modify shapes, it is quite likely that they will want to stretch a rectangle into a square, which is an indication that squares are not the right abstraction for the task.
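
A minimal sketch of this (again, just my own illustration): with immutable shapes the troublesome setters simply disappear, and deriving a square from a rectangle no longer breaks anything:

class ImmutableRectangle(object):
    def __init__(self, width, height):
        self._width = width
        self._height = height

    def getWidth(self):
        return self._width

    def getHeight(self):
        return self._height

    def getArea(self):
        return self._width * self._height


class ImmutableSquare(ImmutableRectangle):
    # No setters, hence no way to make the two sides diverge: every property
    # provable for rectangles still holds for squares.
    def __init__(self, side):
        ImmutableRectangle.__init__(self, side, side)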

The point now is that we need to use interfaces. After all, it is quite reasonable to have generic code to print the area of every shape. Ah… generic: perhaps generics could help? Yes, C++ templates could make the problem somewhat less relevant. But in Java we still need interfaces. And while using interfaces is the right thing to do (whenever possible), there is always the risk of interface clutter. Once again, it seems that dynamic typing solves the problem.
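
In Python, a trivial sketch like this is all the "interface" we need:

def print_area(shape):
    # Works with anything exposing a getArea() method; no declared interface needed.
    print shape.getArea()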

If you’ve got an area and that is something I can print, then I will print it. And what about
oldArea = rect.getArea()
rect.setWidth(rect.getWidth()/2)
assert oldArea == rect.getArea() * 2
?

Things seem to get nasty. After all, a Square has an area. And perhaps it has a getWidth method too.

This is reasonable… for example, we could have any object know its “box”, that is to say, the width and height of the smallest enclosing rectangle. It simply would not have a setWidth method. We do not have any reason to make Square inherit from Rectangle: code that should be able to work with both Rectangles and Squares will simply be able to do so (remember, dynamic typing).
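
For instance (just my own sketch of the idea, named BoxedSquare to keep it distinct from the earlier sketches), a square could expose its enclosing box without offering any setter:

class BoxedSquare(object):
    def __init__(self, side):
        self._side = side

    # The "box": width and height of the smallest enclosing rectangle.
    def getWidth(self):
        return self._side

    def getHeight(self):
        return self._side

    def getArea(self):
        return self._side * self._side

Code that only reads the box (like print_area above) works with Rectangles and BoxedSquares alike, with no inheritance relationship between the two.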

And what if Square also has setWidth? Perhaps we want to change the “box” and have the contained shape resize. That is reasonable. Then you’ve got me… my dynamic typing won’t save me, huh? Now the whole thing boils down to the assertion.

The point is that “if it walks like a duck and it quacks like a duck, then treat it like a duck”. But you know, Squares do not behave like Rectangles. While we can imagine lots of pieces of code working with both rectangles and squares, since a Square is not a fucking Rectangle it is simply unreasonable to expect that particular piece of code to work. That’s it. It’s calling that code with a Square that is wrong, not the fact that lots of other pieces of code could just use Squares and Rectangles (and Triangles and everything else that makes sense to put in a box; consider an “Enclosable” interface if you just can’t get rid of Java). And come on, it’s 2010: everyone and his mother has unit tests.

References

[0] http://portal.acm.org/citation.cfm?id=62141
[1] http://portal.acm.org/citation.cfm?id=197320.197383
[2] http://www.objectmentor.com/resources/articles/lsp.pdf

Saturday, September 18, 2010

Productivity metrics

Many metrics of productivity, code quality and similar stuff have been proposed over the years. Usually each metric has some underlying implication about how software engineering and development should or should not be done. Some are so biased that they seem created only to promote a given procedure. Well, everyone has their job. There are people who code, people who tell other people how to code, and people who tell other people how to tell other people how to code. The chain stops here because even managers understand that a fourth level of indirection is a pure waste of money.

What we know is that good developers are just more productive than bad ones. This was well known even before the whole agile movement came into existence (and even before free software was a buzzword). In fact I found references to this in The Mythical Man-Month, between one praise to God and the next[0]. Basically Brooks cites some studies which showed that a good developer can be 10 times more productive than a regular one. Of course I would be led to think that a team of 5 good developers could achieve much more than a team of 50 regular developers because of the smaller impact of communication overhead (especially considering agile techniques, but now I'm going astray).

Of course, the number of wtfs code reviewers utter gives a rough idea of the quality of the examined code... and it works independently of the number of people on the team; it does not matter how many people wrote the cr... code. If it sucks, it sucks. If it sucks, it will yield a sensible and continuous stream of wtfs. The actual problems are that: i) you actually need a reviewer [which you should have, but too many companies don't]; ii) you actually have to set up recording systems and tie audio files to commits, which would require huge amounts of space. I don't believe simply counting the number of wtfs works, as tone counts. There is the mild wtf and there is the astonished wtf. And there is the angry wtf. So basically you would have to develop a very complex system to discern the tone, texture and mood of each wtf.

Thus, here we propose the FLOC metric (Fun per Line Of Code). That is to say: how much fun the developer (in a sober state) had writing the code. Unless the developer has some facial paresis which makes him look as if he were smiling all the time (which luckily affects only a relatively minor percentage of programmers), only a crappy jpeg needs to be associated with each commit. Software able to do emotion recognition from face pictures is being actively developed and it's not rocket science (actually rockets have faces, sometimes, but I find it in extremely bad taste when soldiers paint women on bombs).

Please notice I'm half serious: if developers have fun (such as when they program in some nice freedom language) they just write better code. And work longer. Produce more. Crap! That was a secret!

----
[0] I really understand why it is considered the "Bible" of software engineering: the ratio of references to God per sentence is similar to the one we find in the Bible... which I don't think is a problem in the Bible (since it's about God), but is awkward in software engineering. To understand my feelings, consider this scenario:
[user] Can the bridge support my truck?
[engineer] God bless you.

Friday, September 17, 2010

Security researchers 'destroy' microsoft asp.net security - The Inquirer

May I say I'm just a little bit worried, as I probably have sensitive data on some ASP-based server out there...

Security researchers 'destroy' microsoft asp.net security - The Inquirer

ok... Java Server Faces has that as well...

It was better when it was worse (and the RAM...)

I usually deride this kind of argument. I believe that, although specific issues can worsen with time (beyond the obvious ones, such as oil shortage), most things somewhat improve. This is especially true in the computer science/math/science fields. We have better algorithms and better hardware.

My (not so) cheap netbook (which I am using to write this post) would have been a supercomputer by the standards of when I started using computers. I am not only talking about the 14 hours of battery life (back then I don't think the concept of a laptop was even widespread)... even my crappy Atom CPU would have been a wonder.

I'm usually not very sympathetic with the ones who complain about the resources needed to run modern applications ("Mind, child, when I was your age I had only 640 KB of RAM and I did this and that!" -- a complaint which in turn must have come from some condescending Unix hacker or some punched-card freak comparing Populous with Pong when they were young).

In fact their argument is flawed in the sense that modern programs tend to do lots of stuff that their ancestors did not (which can be useless stuff; compare Word 2007 with Word 5.1 for Mac...). Moreover, modern computers have to do lots of stuff older computers did not have to do: operating systems are more complex, we expect stuff like plug and pray to work, and so on. It is like complaining that stuff cost less money many years ago: e.g., in "How Blue Can You Get" B.B. King sang about buying his woman a 10 dollar dinner, meant to be a very expensive one (while she thought it was just a snack). By today's standards it's the price of a fast food meal.

However, there are some constants. Vim is a fast editor. It does not matter when you ask: 10 years ago it was a fast editor, today it is a fast editor (and its functionality has improved as well). On the other hand, while Java as a platform has improved a lot in the last fifteen years, Java IDEs are still heavyweight. I was there when Eclipse was just a beta, little more than a proof of concept. And it was slow. And huge. And now... well, it has improved, of course... but I have a quad-core Mac Pro and it's barely passable. I let you wonder how it is on the EEE.

By the way, I don't want to criticize Eclipse too much (IDEA is in much the same condition...). Yes, I hate the fact that you have to put everything in a workspace. I hate that it is so project-based that you basically have to create a project for everything. But I have to admit that those tools are necessary to make developing in Java bearable (and productive). Moreover, I quite like the idea... small core, everything is a plugin. Reminds me of Emacs, just with less configuration hassle (which is good). Perhaps in another 15 years it will run smoothly on the machines we will have then; after all, Emacs took a long time to run fluidly on home computers.

Oh yeah, and I absolutely have to buy some RAM, because otherwise those IDEs are going to make me die of old age while I try to finish my tasks.

Wednesday, September 15, 2010

Generics, Templates and other static typed gimmicks

From my (dynamically typed) point of view, both templates and generics are a workaround for a problem treacherously introduced by static typing (the problem being static typing itself), that is to say: loss of generality. Moreover, while some type systems are quite well done (ML), others are really a PITA. I especially hated Java 1.x (with x < 5; see here) for its claims of type-safety and proudly sported static typing, while you had to manually cast back and forth with collections. Back then I compared the language unfavorably with C++ which, with all its defects, at least allowed us to have typed collections. Besides, I am quite fond of template metaprogramming (especially when used for efficiency) and I have an insane passion for the Boost library.

Quite unfortunately, using even simple lambda and bind in C++ increases compilation times by an unspecified (but clearly noticeable) amount. Basically I resorted to using lambda only in implementation files. Moreover, I found out that most C++ programmers simply find some usages of templates too hard to understand, and that seriously hinders collaboration.

On the other hand, I rather hated the way dynamic polymorphism (inheritance) interacts with static polymorphism (templates). Containers “copy” the objects they store (which is good, defensive and whatever). However, this means that you “cannot” put an object of type B inside a container of A’s, even if B is a subtype of A: the object would be sliced. The problem is that you cannot create containers of A& either (because stuff you put in most containers can be moved around). Essentially, you have to use pointers, or better smart pointers, but that is a pita nonetheless.

Even more tedious is the way inheritance relationships work with templates. Consider the following example:

#include <vector>

class A {
};

class B : public A {
};

void f(std::vector<A *> const& v) {
    // do...
}

int main() {
    std::vector<B *> vb;
    f(vb); // error: no conversion from std::vector<B *> to std::vector<A *>
}

Given the C++ object model, the example is not supposed to work: the fact that B is a subtype of A does not imply that T<B> is a subtype of T<A>. In fact, if that were the case, the Liskov Substitution Principle would be violated, and that would be very bad. Consider this: suppose that B subtype of A implies T<B> subtype of T<A>. Then, by the Liskov Substitution Principle, we could use an object of type T<B> wherever we can use an object of type T<A>. However, this cannot work: it is perfectly reasonable to add an object of type A to a collection of A’s, but it would be plainly wrong to add it to a collection of B’s. By the way, in Java we have somewhat the same problem, though it is mitigated by the use of bounded wildcards.

Saturday, September 11, 2010

What is this Java you are talking about?

The first time I heard about Java must have been around 1995 or so. There were these applets and I was looking for something to program my Mac with. Back then I was not a Unix user, nor had I any exposure to C. I remember I looked at some examples and I basically did not even understand what it was all about (and I'm not even sure I had the tools to compile the thing). I hated it. That single episode may have had very far-reaching consequences on the developer I am today.

I seriously studied Java later. I think it was Java 1.3 or so (and I hated it again). Then I hated Java 1.4. I felt more comfortable with Java 1.5 and I almost overcame my childhood memories around the time of Java 1.6. Besides, I did write some serious code with both Java 1.4 and Java 1.5, apart from hating them.

Those who know me may be surprised to learn that I do advise learning Java. And it is not something recent (in fact I foresee that my future will be very Java-centric); it is something I have been doing for the last 5 years. I love Python. I love functional programming and dynamic typing. I love lots of stuff Java does not have and probably never will. I hate the kind of static typing Java uses, I hate its threading model, and I hate checked exceptions above everything else. So why learn Java?

Because if you don't read Java, you miss lots of interesting literature. Clean Code is written with Java in mind (and some of the issues presented are typical of Java, while others are more universal). Patterns of Enterprise Application Architecture is "written" in Java. And so many other books. Moreover, many developers you meet are proficient in Java, and in order to introduce them to any other language's concepts or features you had better express them in terms of Java.

The idea I slowly matured is that Java is slowly replacing C as a kind of lingua franca in computer science. It does not matter that Java has defects (and big ones, too). Many new technologies are created on top of the Java platform, and while we can discuss whether that is a good thing or not, things are going this way. Many things that once would have been done in C are now done in Java (not that I'm suggesting Java will replace C in the areas where C is strong).

A couple of days ago I realized another important truth. I had kind of figured out why the industry uses Java; I still had to understand why academics do (apart from the joke that they love a language you can't use without classes). And the answer is the same: science is not about how good the language you use is. It is about creating, communicating and sharing knowledge. I could write my next system in Clojure (that was exactly what I was planning to do). But then fewer people could understand and appreciate my work, and even fewer could take it and improve it. So... let's hope they improve Java soon. :)

Friday, September 10, 2010

Itertools and string manipulation

A couple of examples here...

import itertools as it

def verticalize(*strings):
    return (''.join(col)
            for col in it.izip_longest(
                *sorted(strings, key=len, reverse=True),
                fillvalue=''))

def verticalize2(*strings):
    return (''.join(col)
        for col in it.izip_longest(
            *strings, fillvalue=' '))


The first one sorts the strings by their length.
As is often the case when using itertools stuff, the code has a very functional flavor.
Here is an example:

print '-' * 80
verticalized = verticalize('home', 'python', 'house')
for row in verticalized:
    print row

print '-' * 4
verticalized = verticalize2('home', 'python', 'house')
for row in verticalized:
    print row

--------------------------------------------------------------------------------
phh
yoo
tum
hse
oe
n
----
hph
oyo
mtu
ehs
 oe
 n 

Thursday, September 9, 2010

Does Django really suck?

I remember the days when every programmer and his dog had their own web framework and templating engine (apparently the dogs had issues with cheetahs, as they seemed unable to catch 'em). Of course, the widely used ones were few, but there was no obvious choice or really dominant solution. Moreover, setting up the development environment was far from instantaneous and often required fiddling with the Apache configuration file.

In this sense Rails was a real breakthrough (though outside the Python world). I quite welcomed Django, which brought the winning ideas from Rails into the Python community, and then some. For example, the admin interface was simply a killer feature (from my point of view).

However, the Python community was (and is) made up of very skilled programmers often doing very complex and challenging stuff. Django's ORM has been criticized quite a lot (as far as I remember). Lack of multi-column (composite) primary keys was among the first missing features to be criticized (especially because SQLAlchemy already offered them). In general, many early criticisms were more like "I like that other framework better".

I decided to support Django wholeheartedly nonetheless, and I still do. I believe we need a "default" web framework: something which is widely adopted and maintained and can be pushed into the enterprise world in order to spread Python itself. That is the reason why, although I mostly agreed with the criticisms (e.g., I do prefer SQLAlchemy), I kept suggesting Django to newcomers and experienced programmers alike.

I recently read this very interesting presentation. It is a "second generation criticism", very different from the early ones: it comes from someone who used Django extensively and is therefore well motivated, and it addresses specific points rather than trying to promote "another web framework".
I basically agree with every point. I hated apps from day one. I hate the poor object-relational mapping.

I'm a big web.py fan (and luckily, since I am not a web developer, lack of "advanced/enterprisey" features is not an issue). I like simplicity. I hate feature bloat.




[Image: mock graph on software development]

I completely agree with the presentation's author: Django needs to be fixed, not replaced, even if the temptation to just switch to some younger framework is strong, from a geeky point of view. Unfortunately, managers are not geeks. A lot of effort has been put into getting enterprises to adopt Django. We should at least try to pursue this road.

Monday, September 6, 2010

Nice functional LCM in Python

I was tidying up my hard disk when I stumbled on this:

from fractions import gcd

print reduce(lambda x, y: (x*y)/gcd(x,y), range(1, 20), 1)

I suppose we should say:

from fractions import gcd

def lcm(numbers):
    return reduce(lambda x, y: (x*y)/gcd(x,y), numbers, 1)

print lcm((3, 4))
print lcm((3, 6))
print lcm((4, 6))
print lcm(range(1, 20))
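
Just as a sanity check (mine, not part of the original snippet), the fold gives the same answer as the direct definition of the least common multiple:

def lcm_naive(numbers):
    # Smallest positive integer divisible by all the numbers.
    candidate = max(numbers)
    while any(candidate % n for n in numbers):
        candidate += 1
    return candidate

assert lcm(range(1, 10)) == lcm_naive(range(1, 10)) == 2520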

Sunday, September 5, 2010

Multi-scheme implementation of roman: performance

In order to compare the different implementations of simplify-all across multiple Scheme implementations, I removed all the Racket-specific code. I implemented the missing functions by hand (foldl) or got their implementations elsewhere (sort, from Feeley's code in the Gambit lib directory). I also have some measurements of the different implementations, all in Racket.

The resulting program still runs with Racket, of course. In fact, I found it rather annoying to perform the tests: for example, some implementations provide an add1 function (so I have to comment out my definition) while others lack that builtin (and need my implementation).

The benchmarking code has become:
(define benchmark
  (let ([numbers (build-list 2000 add1)])
    (lambda ()
      (for-each
       (lambda (simplify-all)
         (let* ([integer->roman (integer->roman-builder simplify-all)])
           (time (map integer->roman numbers))))
       simplify-all-versions))))

We tried the code with Larceny (speaking of good names...), Gambit, Racket and Chicken.

[Figure: execution times of the programs with the different Scheme implementations]
Essentially, on this benchmark Chicken Scheme is the fastest. I compiled the sources with the following options:
csc -inline -block -local -unsafe -fixnum-arithmetic -lambda-lift -inline -disable-interrupts -disable-stack-overflow-checks

The differences in performance among the different programs are negligible. Moreover, with different runs, the relative differences tended to change.

Compiled Racket is just a bit slower. Here we can spot that the "imperative" variant is significantly slower; its slowness is amplified in the non-compiled variant. On the other hand, with Gambit it seems that the imperative variant is the fastest. Unfortunately the time delta is so small that it could be a statistical variation; besides, the differences are so small they are negligible.

Larceny is the slowest. I suppose I was unable to use the proper runtime options.

I also put in a graph the times spent in garbage collection.

[Figure: time spent garbage collecting by the different Scheme implementations]
Here we notice that most implementations spend comparatively the same time garbage collecting across the different programs. Larceny is the one spending the least time (which is surprising, given the slow execution times). Compiled Racket comes second and Chicken comes third.

I believe I missed some compilation optimizations with Gambit, since to my knowledge it is one of the fastest Scheme implementations.

I find the graph for interpreted Racket the most interesting. In fact, we can see that execution time peaks occur where garbage collection time peaks occur. I also plotted the difference between the running time and the time spent garbage collecting, that is, the time spent in actual computation.

[Figure: running time without considering GC]
Now that we have removed the time spent garbage collecting, Program A is on par with the other programs on interpreted Racket. Still, the "imperative version" (program D) remains slower. Nothing really changes for compiled Racket.



Saturday, September 4, 2010

ErgoEmacs

I recently stumbled[0] on the ErgoEmacs project. Well, it looks very promising.
I like the huge set of libraries they packed in and how easy the installation process is.
If I had discovered it before, it would have saved me a lot of time learning the minimum amount of Elisp needed to configure my Emacs.

Unfortunately, in the meantime (which means... a lot of years, indeed), my brain has learned that when it sees an Emacs window, it switches to Emacs mode. That is to say, the easier keybindings ErgoEmacs provides are, for me, very hard to learn. This is rather surprising, since when I use TextMate or BBEdit I have no trouble using those editors (though many Emacs keybindings are recognized).

I don't even have trouble with Vim. Why is it so hard to use an Emacs which does not behave like Emacs, considering I'm not particularly fond of how Emacs behaves? What did Emacs do to my brain?

Notes

[0] Thanks, Nicola!

Friday, September 3, 2010

Internal Iterators in Python

In a previous post I started a brief discussion of internal and external iterators, and showed how a couple of external iterators can be implemented in Python using generators.

Now I will present a couple of internal iterators in Python performing the same tasks. The user only has to provide the callback.

import collections
import sys

def dfs(node, action):
    # Depth-first traversal: children are pushed in reverse order
    # so that they are visited left to right.
    stack = [node]
    while stack:
        current_node = stack.pop()
        stack.extend(reversed(current_node.children))
        action(current_node.value)

def bfs(node, action):
    # Breadth-first traversal using a FIFO queue.
    queue = collections.deque((node,))
    while queue:
        current_node = queue.popleft()
        queue.extend(current_node.children)
        action(current_node.value)


dfs(tree, lambda x: sys.stdout.write("%s, " % x))

The code comes from the presentation I gave at Pycon Italia 4.
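
The snippets assume a tree of nodes exposing value and children; the actual tree came with the presentation's code, but a minimal stand-in (my own sketch, not the original) could be:

class Node(object):
    def __init__(self, value, *children):
        self.value = value
        self.children = list(children)

# A small example tree:
#        1
#      /   \
#     2     3
#    / \
#   4   5
tree = Node(1, Node(2, Node(4), Node(5)), Node(3))

dfs(tree, lambda x: sys.stdout.write("%s, " % x))  # prints 1, 2, 4, 5, 3,
print
bfs(tree, lambda x: sys.stdout.write("%s, " % x))  # prints 1, 2, 3, 4, 5,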

Thursday, September 2, 2010

Netbooks and other tiny things (Java?)

Why are Java IDEs so slow when used on a netbook? This is kind of unbearable.
And I think the IDE has crashed now, too. Another 5 minutes and it will finish loading!!