Wednesday, September 22, 2010

Liskov Substitution Principle

The Liskov Substitution Principle was first formulated by Barbara Liskov (and back then was not called “Liskov SP”) in her paper[0] and then in [1]; a more accessible explanation can be found in and [2]. Its influence on object oriented programming is huge and is quite hard to design correct object oriented code in a statically typed language without clear understanding of the principle. To an even more mundane level, this just about the principle of least surprise.

The key idea expressed is that if a property p is provable for every object x of type T and S is a subtype of T, then p should be provable for every object y of type S as well. Since the principle must hold for every conceivable property, we can also say that “functions that use pointer or references to base classes must be able to use pointers or references to derived classes without knowing it” (this is Martin formulation of the principle relative to C++). In Java we would say that methods that use objects of base classes must be able to use objects of derived classes as well.

The circle-ellipse problem is an archetypal violation of the principle. Without any sort of originality I shortly present here the problem, starting with the Rectangle class. I use Python, though I chose to use explicit Java style setter and getters (which is bad practice in Python) and to violate the PEP 8 to make it even more evident this is bad Python. Here the point is starting with Python just because it is easier to use the REPL to try the different problems.

class Rectangle(object):
    def __init__(self, width, height):
        self._width = width
        self._height = height
        
    def setWidth(self, width):
        self.width = width
    
    def getWidth(self):
        return self.width
    
    def setHeight(self, height):
        self.height = height
        
    def getHeight(self):
        return self.height
    
    def getArea(self):
        return self.getHeight() * self.getWidth()    
        

Essentially when we try to derive a Square from a Rectangle, we find out there is no way to do it without violating the principle of least surprise (and in turn the liskov substitution principle). Martin very clearly explains the many ways we can do that and how inevitably we would screw something up.

If we don’t override setHeight and setWidth, we have a Square that is not a Square anymore. If we do override them (and have them set both width and height), we find out that code like:

oldArea = rect.getArea()
rect.setWidth(rect.getWidth()/2)
assert oldArea == rect.getArea() * 2

is doomed to blow up. Thus the LSP is violated. Many conclusions have been drawn. Luckily enough the problem today is somewhat less serious, as long inheritance chains are disappearing from software projects and inheritance freaks are rotting in hell (or at least they should). In fact delegation is a much cheaper (from a conceptual point of view) alternative. Modern IDE’s and nice languages (with meta-programming) make the tedious boiler plate code associated with delegation automatically generated (or better, unnecessary).

It is also worth noting that the problem is there only with mutable objects. Immutable objects don’t have the problem. And I like immutable objects. Although I would hate to memorize both width and height for each square. I would also stress the fact that drawing programs (and other similar stuff) do not need the square abstraction. As people modify shapes, it is very easy that they want to stretch a rectangle to a square. This is an indication that squares are not the right abstraction for the task.

The point now is that we need to use Interfaces. After all it is quite reasonable to have generic code to print the area of every shape. Ah… generic, perhaps generics could help? Yes, C++ templates somewhat could make the problem less relevant. But in Java we still need interfaces. And while using interfaces is the right thing to do (every time that is possible), there is always the risk of interface clutter. Once again, it seems that dynamic typing solves the problem.

If you got an area and that is something I can print, then I will print it. And what about
oldArea = rect.getArea()
rect.setWidth(rect.getWidth()/2)
assert oldArea == rect.getArea() * 2
?

Things seem to get nasty. After all a Square has an area. And perhaps has a getWidth method.

This is reasonable… for example we could have any object know his “box” that is to say width and height of the smallest enclosing rectangle. Simply it would not have a setWidth method. Because we do not have any reason to make a Square inherit from a Rectangle, as simply code that should be able to work with both Rectanges and Squares will be able to do it (remember, dynamic typing).

And if Square also have setWidth? Perhaps we want to change the “box” and have the contained resize. That is reasonable. Then you got me… my dynamic typing won’t save me, uh? Now the things boils down to the assertion.

The point is that “if it walks like a duck and it quacks like a duck, then treat it like a duck”. But you know, Squares do not behave like Rectangles. The point is that while we can imagine lots of pieces of code working with rectangles and squares, since a Square is not a fucking Rectangle it is simply unreasonable to expect that piece of code to work. That’s is. It’s calling that code with a Square that is wrong, not the fact that lot of other pieces of code could just use Squares and Rectangles (and Triangles and everything that makes sense to include in a box – consider the “Enclosable” interface if you just can’t get rid of Java). And come on, it’s 2010, everyone and his mother have unit test.

References

[0] http://portal.acm.org/citation.cfm?id=62141
[1] http://portal.acm.org/citation.cfm?id=197320.197383
[2] http://www.objectmentor.com/resources/articles/lsp.pdf

2 comments:

Valerio said...

Nice post Riko and very very nice conclusion!
I totally got your point!

I found so interesting how you introduce the key concept of Duck Typing starting from the LSP.

Great! ;)

Unknown said...

Thanks! C++ static typing is not the only way of typing. As a consequence we have to refer to the original article (which was very general) in order to understand what it does mean with other typing paradigms.