%0 Conference Paper
%B IEEE 25th International Conference on Data Engineering, 2009. ICDE '09
%D 2009
%T ApproxRank: Estimating Rank for a Subgraph
%A Yao Wu
%A Raschid, Louiqa
%K Application software
%K ApproxRank
%K Computer applications
%K Crawlers
%K customized semantic query answering
%K Data engineering
%K Educational institutions
%K Explosions
%K focused crawlers
%K global Web graph
%K graph theory
%K IdealRank algorithm
%K Internet
%K localized search engines
%K PageRank-style
%K personalized search
%K Query processing
%K Runtime
%K Search engines
%K Stochastic processes
%K Web pages
%X Customized semantic query answering, personalized search, focused crawlers and localized search engines frequently focus on ranking the pages contained within a subgraph of the global Web graph. The challenge for these applications is to compute PageRank-style scores efficiently on the subgraph, i.e., the ranking must reflect the global link structure of the Web graph but it must do so without paying the high overhead associated with a global computation. We propose a framework of an exact solution and an approximate solution for computing ranking on a subgraph. The IdealRank algorithm is an exact solution with the assumption that the scores of external pages are known. We prove that the IdealRank scores for pages in the subgraph converge. Since the PageRank-style scores of external pages may not typically be available, we propose the ApproxRank algorithm to estimate scores for the subgraph. Both IdealRank and ApproxRank represent the set of external pages with an external node L1 and extend the subgraph with links to L1. They also modify the PageRank-style transition matrix with respect to L1. We analyze the L1 distance between IdealRank scores and ApproxRank scores of the subgraph and show that it is within a constant factor of the L1 distance of the external pages (e.g., the true PageRank scores and uniform scores assumed by ApproxRank). We compare ApproxRank and a stochastic complementation approach (SC), a current best solution for this problem, on different types of subgraphs. ApproxRank has similar or superior performance to SC and typically improves on the runtime performance of SC by an order of magnitude or better. We demonstrate that ApproxRank provides a good approximation to PageRank for a variety of subgraphs.
%B IEEE 25th International Conference on Data Engineering, 2009. ICDE '09
%I IEEE
%P 54 - 65
%8 2009/04/29/March
%@ 978-1-4244-3422-0
%G eng
%R 10.1109/ICDE.2009.108
%0 Conference Paper
%B IEEE 25th International Conference on Data Engineering, 2009. ICDE '09
%D 2009
%T Ef?cient Query Evaluation over Temporally Correlated Probabilistic Streams
%A Kanagal,B.
%A Deshpande, Amol
%K Birds
%K Computerized monitoring
%K correlated probabilistic streams
%K correlation structure
%K Data engineering
%K data mining
%K Databases
%K Event detection
%K Graphical models
%K inference mechanisms
%K Markov processes
%K polynomial time
%K probabilistic database
%K probabilistic graphical model
%K probabilistic query evaluation
%K query evaluation
%K query planning algorithm
%K query plans
%K Query processing
%K Random variables
%K stream processing operator
%K Streaming media
%X In this paper, we address the problem of efficient query evaluation over highly correlated probabilistic streams. We observe that although probabilistic streams tend to be strongly correlated in space and time, the correlations are usually quite structured (i.e., the same set of dependencies and independences repeat across time) and Markovian (i.e., the state at time "t+1" is independent of the states at previous times given the state at time "t"). We exploit this observation to compactly encode probabilistic streams by decoupling the correlation structure (the set of dependencies) from the actual probability values. We develop novel stream processing operators that can efficiently and incrementally process new data items; our operators are based on the previously proposed framework of viewing probabilistic query evaluation as inference over probabilistic graphical models (PGMs) [P. Sen and A. Deshpande, 2007]. We develop a query planning algorithm that constructs efficient query plans that are executable in polynomial-time whenever possible, and we characterize queries for which such plans are not possible. Finally we conduct an extensive experimental evaluation that illustrates the advantages of exploiting the structured nature of correlations in probabilistic streams.
%B IEEE 25th International Conference on Data Engineering, 2009. ICDE '09
%I IEEE
%P 1315 - 1318
%8 2009/04/29/March
%@ 978-1-4244-3422-0
%G eng
%R 10.1109/ICDE.2009.229
%0 Conference Paper
%B IEEE 25th International Conference on Data Engineering, 2009. ICDE '09
%D 2009
%T Minimizing Communication Cost in Distributed Multi-query Processing
%A Li,Jian
%A Deshpande, Amol
%A Khuller, Samir
%K Approximation algorithms
%K Communication networks
%K Computer science
%K Cost function
%K Data engineering
%K distributed communication network
%K distributed databases
%K distributed multi-query processing
%K grid computing
%K Large-scale systems
%K NP-hard
%K optimisation
%K Polynomials
%K Publish-subscribe
%K publish-subscribe systems
%K Query optimization
%K Query processing
%K sensor networks
%K Steiner tree problem
%K Tree graphs
%K trees (mathematics)
%X Increasing prevalence of large-scale distributed monitoring and computing environments such as sensor networks, scientific federations, Grids etc., has led to a renewed interest in the area of distributed query processing and optimization. In this paper we address a general, distributed multi-query processing problem motivated by the need to minimize the communication cost in these environments. Specifically we address the problem of optimally sharing data movement across the communication edges in a distributed communication network given a set of overlapping queries and query plans for them (specifying the operations to be executed). Most of the problem variations of our general problem can be shown to be NP-Hard by a reduction from the Steiner tree problem. However, we show that the problem can be solved optimally if the communication network is a tree, and present a novel algorithm for finding an optimal data movement plan. For general communication networks, we present efficient approximation algorithms for several variations of the problem. Finally, we present an experimental study over synthetic datasets showing both the need for exploiting the sharing of data movement and the effectiveness of our algorithms at finding such plans.
%B IEEE 25th International Conference on Data Engineering, 2009. ICDE '09
%I IEEE
%P 772 - 783
%8 2009/04/29/March
%@ 978-1-4244-3422-0
%G eng
%R 10.1109/ICDE.2009.85
%0 Conference Paper
%B IEEE 25th International Conference on Data Engineering, 2009. ICDE '09
%D 2009
%T Web Monitoring 2.0: Crossing Streams to Satisfy Complex Data Needs
%A Roitman,H.
%A Gal,A.
%A Raschid, Louiqa
%K Bandwidth
%K complex client information need
%K Data Delivery
%K Data engineering
%K database management systems
%K Educational institutions
%K Internet
%K Mashups
%K mashups generation
%K Monitoring
%K multiple information source
%K offline algorithmic solution
%K Portals
%K PROBES
%K Profiles
%K Query processing
%K scalability
%K scheduling
%K volatile information stream
%K Web 2.0
%K Web Monitoring
%X Web monitoring 2.0 supports the complex information needs of clients who probe multiple information sources and generate mashups by integrating across these volatile streams. A proxy that aims at satisfying multiple customized client profiles will face a scalability challenge in trying to maximize the number of clients served while at the same time fully satisfying complex client needs. In this paper, we introduce an abstraction of complex execution intervals, a combination of time intervals and information streams, to capture complex client needs. Given some budgetary constraints (e.g., bandwidth), we present offline algorithmic solutions for the problem of maximizing completeness of capturing complex profiles.
%B IEEE 25th International Conference on Data Engineering, 2009. ICDE '09
%I IEEE
%P 1215 - 1218
%8 2009/04/29/March
%@ 978-1-4244-3422-0
%G eng
%R 10.1109/ICDE.2009.204