SURNAME_URL = 'http://names.mongabay.com/most_common_surnames.htm' MALENAME_URL = 'http://names.mongabay.com/male_names.htm' FEMNAME_URL = 'http://names.mongabay.com/female_names.htm'I could just get the list of names with these specific lines:
parser = html.parse(url) lines = [el.text for el in parser.xpath('//td/table/tr/td[1]') if el.text]I quite do not like the fact that I'm so dependent on the website structure and perhaps I could look for a site offering the names just as a plain file. But I don't really care enough to look for another website. Moreover, the "way" I get the lines is quite orthogonal from the matter of this post. Now the point is that I want to access those files with a nice API in order to avoid code repetition. And here my brain completely short-circuited. The first idea I got was something like this:
from lxml import html class NamesMeta(type): SURNAME_URL = 'http://names.mongabay.com/most_common_surnames.htm' MALENAME_URL = 'http://names.mongabay.com/male_names.htm' FEMNAME_URL = 'http://names.mongabay.com/female_names.htm' @property def male_names(cls): try: return cls._male_names except AttributeError: cls._male_names = cls._list_of(cls.MALENAME_URL) return cls._male_names @staticmethod def _list_of(url): parser = html.parse(url) return [el.text for el in parser.xpath('//td/table/tr/td[1]') if el.text] class Names(object): __metaclass__ = NamesMetaFor some mysterious reason I liked the idea of accessing the names with:
Names.male_names. While I don't believe it is inherently a bad idea to be able to do it, I do believe it is a bad idea not having an "ask" method. However, I am going to deal with one design problem at a time. Going back to the example, this approach works, but it requires me to duplicate logic for the three properties (male_names, female_names and surnames). And I hate code duplication (and looping "execing" methods is not a good strategy, either). Then there is this thing... I used metaclasses, for no good reason. Well, my idea was to use the "class object as singleton pattern" strategy. This is a very easy way to implement a singleton. Here the first error is that I should really be using Borg/Monostate (or perhaps a bunch of functions...). The code is redundant (if finished) and overly complex. In order to understand how it works you have to be familiar with metaclasses, et all. And for no good reason. Moreover, the @staticmethod points out another clear design problem. As far as the class is only conceived to hide the logic of getting the name lists, it is ok to have the function to get the names in the class interface. However, I believe that we can generalize the class a bit. What if I want to "ask" more things? I could add an interface to add more sources of knowledge. I'm not sure that this counts as "over-generalization" (that is over-engineering). I think that generalizing, in this case should help me to clean the design. And I got to this:
from lxml import html class Failure(Exception): pass def name_getter(question): def _list_of(url): parser = html.parse(url) return [el.text for el in parser.xpath('//td/table/tr/td[1]') if el.text] url_db = dict( male_names='http://names.mongabay.com/male_names.htm', female_names='http://names.mongabay.com/female_names.htm', surnames='http://names.mongabay.com/most_common_surnames.htm' ) try: url = url_db[question] except KeyError: raise Failure return _list_of(url) class Knowledge(object): # (question, fetcher) answers = dict( male_names=name_getter, female_names=name_getter, surnames=name_getter ) def __getattr__(self, key): try: getter = self.answers[key] except KeyError: raise Failure else: value = getter(key) setattr(self, key, value) return valueThis is decent. I can access the "questions" rather easily. I can easily add an interface to query which question the class can answer; the code is rather easy to understand, use and whatever. Moreover, using __getattr__ still has the advantage that the "I have not yet cached the answer" logic is dealt by Object. We will get rid of this, in time. Right now, it is easy to understand and we do not have to write code to deal with it. But then... idiocy struck me again. I felt like being able to instantiate multiple Knowledge classes would be a BAD THING. Which in some sense it is... but hell, it's just a helper script (though the object oriented designer in me is quite a bitch). And since the metaclass strategy worked for the past example, something like:
class KnowledgeMeta(type): # (question, fetcher) answers = dict( male_names=name_getter, female_names=name_getter, surnames=name_getter ) # ... class Knowledge(object): __metaclass__ = KnowledgeMetaworks. But at this point it should be clear that the (object oriented) solution should use the Borg pattern. Identity is a false myth and this is an archetypal example of "only shared state matters". Moreover, it is so appropriate to have a Borg here... we have object that send to the hive-mind whatever they learn. They are borgs! The solution I propose is now:
from lxml import html class Failure(Exception): pass class Borg(object): # Martelli A., Martelli Ravenscroft A., Asher D., # "Python Cookbook", 2nd Edition, O'Reilly _shared_state = {} def __new__(cls, *a, **k): obj = object.__new__(cls, *a, **k) obj.__dict__ = cls._shared_state return obj def __hash__(self): return 42 # randomly chosen number from a dice with one face def __eq__(self, other): try: return self.__dict__ is other.__dict__ except AttributeError: return False def name_getter(question): def _list_of(url): parser = html.parse(url) return [el.text for el in parser.xpath('//td/table/tr/td[1]') if el.text] url_db = dict( male_names='http://names.mongabay.com/male_names.htm', female_names='http://names.mongabay.com/female_names.htm', surnames='http://names.mongabay.com/most_common_surnames.htm' ) try: url = url_db[question] except KeyError: raise Failure return _list_of(url) class Knowledge(Borg): # (question, fetcher) find_answer = dict( male_names=name_getter, female_names=name_getter, surnames=name_getter ) def __getattr__(self, key): try: finder = self.find_answer[key] except KeyError: raise Failure value = finder(key) setattr(self, key, value) return value @property def known_questions(self): sreturn self.find_answers.keys()The idea of using properties instead of method calls is perhaps not very good: it adds no advantage, no clarity. And somewhat hides the fact we may be accessing something on the net. In fact a simple API like:
def answer(question): #... def questions(): # ... def add_question(question, answer_getter): # ...would be perfectly acceptable, usable and clear. However, in the latter solution we would have some global variables; I still believe that an object oriented interface here is better. __getattr__ just simplified some logic, but also masked that we are dealing with caching/memoization: __getattr__ is called iff the key is not find in __dict__ (that is to say in Borg._shared_state), which is exactly what memoization is about. If we want to have an "ask" method we would have to manually deal with the logic of not finding stuff in our cached answers. And I believe an "ask" method would be very useful. For example because it allows for "questions" which are not valid python identifiers. I find much more readable something like:
knowledge.ask('number of dwarves')than:
getattr(knowlege, 'number of dwarves')Luckily enough, memoization patterns are widespread and we can also find ready to use implementations. However, this time, we provide a memoize decorator. This decorator takes the dictionary used as a cache as an argument. Of course, Borg._shared_state is the dictionary we want to use. The decorator is:
def memoize_method(db): def memoizer(method): def aux(self, key): try: value = db[key] except KeyError: value = db[key] = method(self, key) finally: return value return aux return memoizerDefining an "ask" method now is trivial. It should just get the answer finder from the find_answer dictionary, call it, and return the value. Caching is dealt by the decorator. So, eventually:
class Knowledge(Borg): # (question, fetcher) find_answer = dict( male_names=name_getter, female_names=name_getter, surnames=name_getter ) @memoize_method(Borg._shared_state) def ask(self, key): try: finder = self.find_answer[key] except KeyError: raise Failure else: return finder(key) def __getattr__(self, key): return self.ask(key) @property def known_questions(self): return self.find_answer.keys()
Technorati Tags: Python, Design Patterns, Object Oriented, Freedom Languages
No comments:
Post a Comment