ApproxRank: Estimating Rank for a Subgraph

TitleApproxRank: Estimating Rank for a Subgraph
Publication TypeConference Papers
Year of Publication2009
AuthorsWu Y, Raschid L
Conference NameIEEE 25th International Conference on Data Engineering, 2009. ICDE '09
Date Published2009/04/29/March
ISBN Number978-1-4244-3422-0
KeywordsApplication software, ApproxRank, Computer applications, Crawlers, customized semantic query answering, Data engineering, Educational institutions, Explosions, focused crawlers, global Web graph, graph theory, IdealRank algorithm, Internet, localized search engines, PageRank-style, personalized search, Query processing, Runtime, Search engines, Stochastic processes, Web pages

Customized semantic query answering, personalized search, focused crawlers and localized search engines frequently focus on ranking the pages contained within a subgraph of the global Web graph. The challenge for these applications is to compute PageRank-style scores efficiently on the subgraph, i.e., the ranking must reflect the global link structure of the Web graph but it must do so without paying the high overhead associated with a global computation. We propose a framework of an exact solution and an approximate solution for computing ranking on a subgraph. The IdealRank algorithm is an exact solution with the assumption that the scores of external pages are known. We prove that the IdealRank scores for pages in the subgraph converge. Since the PageRank-style scores of external pages may not typically be available, we propose the ApproxRank algorithm to estimate scores for the subgraph. Both IdealRank and ApproxRank represent the set of external pages with an external node L1 and extend the subgraph with links to L1. They also modify the PageRank-style transition matrix with respect to L1. We analyze the L1 distance between IdealRank scores and ApproxRank scores of the subgraph and show that it is within a constant factor of the L1 distance of the external pages (e.g., the true PageRank scores and uniform scores assumed by ApproxRank). We compare ApproxRank and a stochastic complementation approach (SC), a current best solution for this problem, on different types of subgraphs. ApproxRank has similar or superior performance to SC and typically improves on the runtime performance of SC by an order of magnitude or better. We demonstrate that ApproxRank provides a good approximation to PageRank for a variety of subgraphs.