The PageRank


In Lecture1 we introduced the MCMC (Markov Chain Monte Carlo) and its most famous representative : the Metropolis algorithm.
In Tutorial 1 we discussed the properties of the transition (or Markov) matrix and the convergence of the MCMC. Here we will see a second and very important application of the transition matrix: the PageRank.

History

The World Wide Web is born in the nineties and quickly became very large: 623 web sites in 1993, one milion in 1997, more than one bilion today.
At the end of the nineties traditional search engines (AltaVista, Yahoo! ...) were focused on the relevance of searched phrase within a document (number of occurrence in the document, their location (title, main text...)...). However spam sites were designed to find their way to the top of search results through techniques like repeating keywords. In 1998 Google proposed a new method to establish the importance of a site, the PageRank. The idea is to use the topological structure of the www -which is an oriented graph- and the properties of the transition matrices.