Monday, December 13, 2010

Other applications of the ideas behind Page Rank

One of the problem a search engine has to solve is to sort the potentially very long list of matches for a given query with a sensible ordering. Sensible here means that ideally the more interesting results come first.
Many strategies could be applied here (and there is a lot of ongoing research on the subject). Many complex strategies are already applied. However, the basic idea [4] Google used was very simple (as an idea) and very robust.
It is easy to see that the idea was simple. Let r(Pi) be the ranking of page Pi, Li be the set of pages linking to Pi, and |Pi| be the number of outbound links of |Pi| then:

r(Pi) =

Pj ∈ Li 
r(Pj)

|Pj|
(1)
Simple, indeed. With the minor problem that we have no starting point to trigger the computation. Luckily enough, iterative methods come to help. More on this in a few paragraph. I have not yet clarified the "robust" part.
In fact the kinds of ideas underlying the PageRank are rather old. Similar ideas have been spotted in bibliometrics, econometrics and sociometrics. This is not to be considered a defect (lack of novelty), rather I consider it a feature as in "already field-tested".
Using words, PageRank states that a page is important if many important pages link to it. Which is not different from the usual definition of importance of journals, where a journal is important if many important journals contain articles referring to articles in that journal. Well, in the journal world things are easier as there are far less journals than web pages and there is a clear temporal order that can be used. Moreover, articles usually cite only papers written before them (at least until some research has a paper describing his implementation of time-machines, then he could easily refer to articles not yet published).
In 1949 Seeley proposed to "measure" the popularity of a child (he was researching relationships in children social group) considering the popularity of the children which enlisted him among his friends. That is, once again, something like Pi = f(F11,...,F1k), where F1j is a child who listed 1 among his friends. The very same criterion can be applied to recommendations.
Leontief won Nobel Prize for Economy for his ideas on econometrics on the input-output model, which is yet an application of the same underlying ideas. Here I won't give further details. These example come from [1] where further details are given.


I explore further how to compute the PageRank here.

No comments: