Faster PageRank

I might be giving away some bright ideas by writing this post, but I don’t care because I don’t know how to make money with this idea. I think there is a way to speed up PageRank enormously. The current PageRank algorithm uses the eigenvector centrality of an arbitrary undirected graph. Those guys at Google make an adjacency matrix of The Internet and calculate its eigenvector. The magnitude of each dimension of the eigenvector is PageRank. The algorithm for converging on the eigenvector is called The Power Method. But this is slower compared to my speed-up for the PageRank.

I propose to use the Floyd’s algorithm (not Warshall’s algorithm) on a matrix where initially all elements (i, j) are set to 1 if there is a direct 1 click link form the ith page to the jth page. All other elements are set to \infty. This is sortaf like the the adjacency matrix.

After applying the Floyd’s algorithm the non \infty elements of the rows are summed up and the row with the least row sum has the highest centrality measure. And therefore the most relevant page.

I am not sure whether this will work when there are cycles in a graph.

The benefits of this method are that PageRank can be done on machines that are not as capable as Google’s MapReduce wielding parallel machines. In fact I am going to ask my employer to try using this for our next generation scientific publishing platform.

Another possible variant is to use the Warshall’s algorithm on the adjacency matrix and select rows with the highest row sum as the most relevant pages.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s