d.setdefault(key, []).append(w)

The key for a word of length n is computed in O(n log n); insertion into the dictionary is O(1) (looking up the key is O(1) on average, and appending to the list is O(1)). So the algorithm uses rather efficient elementary operations. At this point, "general" questions on the processed wordlist can be answered in O(M), where M is the number of words (e.g., the longest word with an anagram, or the word with the most anagrams). Simply getting the anagrams of a given word is an O(n log n) operation. We are not using an excessive amount of memory either: essentially just a hash table and some "pointers" to the original words. And now the code:
class Anagrams(object):
    def __init__(self, dictionary):
        self.dictionary = dictionary
        self.indexed_words = {}
        self.preprocess()

    def make_key(self, word):
        sorted_word = list(word)
        sorted_word.sort()
        return tuple(sorted_word)

    def preprocess(self):
        for word in self.dictionary:
            key = self.make_key(word)
            self.indexed_words.setdefault(key, []).append(word)

    def anagram(self, word):
        key = self.make_key(word)
        return tuple(self.indexed_words[key])

    def anagrams(self):
        return [anagrams for key, anagrams in self.indexed_words.iteritems()
                if len(anagrams) > 1]

As a matter of fact, if we extend the corpus of words, the required time increases: using the dictionary from here, it took 4 whole seconds, which is nonetheless a lot less than some millions of years. I did not use that dictionary in the first place because it contains lots of misspellings, which can make sense when computing Levenshtein distances but are quite a PITA when computing anagrams (we don't want nonexistent words to pop up as solutions).
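To make the idea above easy to try in isolation, here is a minimal, self-contained sketch of the same indexing scheme as standalone functions (the tiny wordlist is made up for illustration; it is not the dictionary used in the post):

```python
def make_key(word):
    # The key is the word's letters in sorted order: O(n log n) per word.
    return tuple(sorted(word))

def build_index(wordlist):
    # One dictionary insertion per word: O(1) amortized each,
    # so preprocessing the whole list is O(M * n log n) overall.
    index = {}
    for word in wordlist:
        index.setdefault(make_key(word), []).append(word)
    return index

words = ["listen", "silent", "enlist", "google", "banana"]
index = build_index(words)

# Anagrams of a given word: one key computation plus one O(1) lookup.
print(index[make_key("silent")])   # ['listen', 'silent', 'enlist']

# "General" questions are a single O(M) scan over the index,
# e.g. the largest group of mutual anagrams:
print(max(index.values(), key=len))   # ['listen', 'silent', 'enlist']
```

Note that the lists preserve insertion order, so each group lists its words in the order they appeared in the original wordlist.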
Technorati Tags: Efficiency, Optimization, Python, Algorithms, Anagram
2 comments:
For my recursive anagram solver posted 11.11.11 11:11:11
see http://www.daniweb.com/software-development/python/code/393153/multiword-anagrams-by-recursive-generator
Nice!