TY  - CONF
T1  - ApproxRank: Estimating Rank for a Subgraph
T2  - IEEE 25th International Conference on Data Engineering, 2009. ICDE '09
Y1  - 2009
A1  - Yao Wu
A1  - Raschid, Louiqa
KW  - Application software
KW  - ApproxRank
KW  - Computer applications
KW  - Crawlers
KW  - customized semantic query answering
KW  - Data engineering
KW  - Educational institutions
KW  - Explosions
KW  - focused crawlers
KW  - global Web graph
KW  - graph theory
KW  - IdealRank algorithm
KW  - Internet
KW  - localized search engines
KW  - PageRank-style
KW  - personalized search
KW  - Query processing
KW  - Runtime
KW  - Search engines
KW  - Stochastic processes
KW  - Web pages
AB  - Customized semantic query answering, personalized search, focused crawlers and localized search engines frequently focus on ranking the pages contained within a subgraph of the global Web graph. The challenge for these applications is to compute PageRank-style scores efficiently on the subgraph, i.e., the ranking must reflect the global link structure of the Web graph but it must do so without paying the high overhead associated with a global computation. We propose a framework of an exact solution and an approximate solution for computing ranking on a subgraph. The IdealRank algorithm is an exact solution with the assumption that the scores of external pages are known. We prove that the IdealRank scores for pages in the subgraph converge. Since the PageRank-style scores of external pages may not typically be available, we propose the ApproxRank algorithm to estimate scores for the subgraph. Both IdealRank and ApproxRank represent the set of external pages with an external node L1 and extend the subgraph with links to L1. They also modify the PageRank-style transition matrix with respect to L1. We analyze the L1 distance between IdealRank scores and ApproxRank scores of the subgraph and show that it is within a constant factor of the L1 distance of the external pages (e.g., the true PageRank scores and uniform scores assumed by ApproxRank). We compare ApproxRank and a stochastic complementation approach (SC), a current best solution for this problem, on different types of subgraphs. ApproxRank has similar or superior performance to SC and typically improves on the runtime performance of SC by an order of magnitude or better. We demonstrate that ApproxRank provides a good approximation to PageRank for a variety of subgraphs.
JA  - IEEE 25th International Conference on Data Engineering, 2009. ICDE '09
PB  - IEEE
SN  - 978-1-4244-3422-0
M3  - 10.1109/ICDE.2009.108
ER  - 

TY  - CONF
T1  - Ef?cient Query Evaluation over Temporally Correlated Probabilistic Streams
T2  - IEEE 25th International Conference on Data Engineering, 2009. ICDE '09
Y1  - 2009
A1  - Kanagal,B.
A1  - Deshpande, Amol
KW  - Birds
KW  - Computerized monitoring
KW  - correlated probabilistic streams
KW  - correlation structure
KW  - Data engineering
KW  - data mining
KW  - Databases
KW  - Event detection
KW  - Graphical models
KW  - inference mechanisms
KW  - Markov processes
KW  - polynomial time
KW  - probabilistic database
KW  - probabilistic graphical model
KW  - probabilistic query evaluation
KW  - query evaluation
KW  - query planning algorithm
KW  - query plans
KW  - Query processing
KW  - Random variables
KW  - stream processing operator
KW  - Streaming media
AB  - In this paper, we address the problem of efficient query evaluation over highly correlated probabilistic streams. We observe that although probabilistic streams tend to be strongly correlated in space and time, the correlations are usually quite structured (i.e., the same set of dependencies and independences repeat across time) and Markovian (i.e., the state at time "t+1" is independent of the states at previous times given the state at time "t"). We exploit this observation to compactly encode probabilistic streams by decoupling the correlation structure (the set of dependencies) from the actual probability values. We develop novel stream processing operators that can efficiently and incrementally process new data items; our operators are based on the previously proposed framework of viewing probabilistic query evaluation as inference over probabilistic graphical models (PGMs) [P. Sen and A. Deshpande, 2007]. We develop a query planning algorithm that constructs efficient query plans that are executable in polynomial-time whenever possible, and we characterize queries for which such plans are not possible. Finally we conduct an extensive experimental evaluation that illustrates the advantages of exploiting the structured nature of correlations in probabilistic streams.
JA  - IEEE 25th International Conference on Data Engineering, 2009. ICDE '09
PB  - IEEE
SN  - 978-1-4244-3422-0
M3  - 10.1109/ICDE.2009.229
ER  - 

TY  - CONF
T1  - Minimizing Communication Cost in Distributed Multi-query Processing
T2  - IEEE 25th International Conference on Data Engineering, 2009. ICDE '09
Y1  - 2009
A1  - Li,Jian
A1  - Deshpande, Amol
A1  - Khuller, Samir
KW  - Approximation algorithms
KW  - Communication networks
KW  - Computer science
KW  - Cost function
KW  - Data engineering
KW  - distributed communication network
KW  - distributed databases
KW  - distributed multi-query processing
KW  - grid computing
KW  - Large-scale systems
KW  - NP-hard
KW  - optimisation
KW  - Polynomials
KW  - Publish-subscribe
KW  - publish-subscribe systems
KW  - Query optimization
KW  - Query processing
KW  - sensor networks
KW  - Steiner tree problem
KW  - Tree graphs
KW  - trees (mathematics)
AB  - Increasing prevalence of large-scale distributed monitoring and computing environments such as sensor networks, scientific federations, Grids etc., has led to a renewed interest in the area of distributed query processing and optimization. In this paper we address a general, distributed multi-query processing problem motivated by the need to minimize the communication cost in these environments. Specifically we address the problem of optimally sharing data movement across the communication edges in a distributed communication network given a set of overlapping queries and query plans for them (specifying the operations to be executed). Most of the problem variations of our general problem can be shown to be NP-Hard by a reduction from the Steiner tree problem. However, we show that the problem can be solved optimally if the communication network is a tree, and present a novel algorithm for finding an optimal data movement plan. For general communication networks, we present efficient approximation algorithms for several variations of the problem. Finally, we present an experimental study over synthetic datasets showing both the need for exploiting the sharing of data movement and the effectiveness of our algorithms at finding such plans.
JA  - IEEE 25th International Conference on Data Engineering, 2009. ICDE '09
PB  - IEEE
SN  - 978-1-4244-3422-0
M3  - 10.1109/ICDE.2009.85
ER  - 

TY  - CONF
T1  - Web Monitoring 2.0: Crossing Streams to Satisfy Complex Data Needs
T2  - IEEE 25th International Conference on Data Engineering, 2009. ICDE '09
Y1  - 2009
A1  - Roitman,H.
A1  - Gal,A.
A1  - Raschid, Louiqa
KW  - Bandwidth
KW  - complex client information need
KW  - Data Delivery
KW  - Data engineering
KW  - database management systems
KW  - Educational institutions
KW  - Internet
KW  - Mashups
KW  - mashups generation
KW  - Monitoring
KW  - multiple information source
KW  - offline algorithmic solution
KW  - Portals
KW  - PROBES
KW  - Profiles
KW  - Query processing
KW  - scalability
KW  - scheduling
KW  - volatile information stream
KW  - Web 2.0
KW  - Web Monitoring
AB  - Web monitoring 2.0 supports the complex information needs of clients who probe multiple information sources and generate mashups by integrating across these volatile streams. A proxy that aims at satisfying multiple customized client profiles will face a scalability challenge in trying to maximize the number of clients served while at the same time fully satisfying complex client needs. In this paper, we introduce an abstraction of complex execution intervals, a combination of time intervals and information streams, to capture complex client needs. Given some budgetary constraints (e.g., bandwidth), we present offline algorithmic solutions for the problem of maximizing completeness of capturing complex profiles.
JA  - IEEE 25th International Conference on Data Engineering, 2009. ICDE '09
PB  - IEEE
SN  - 978-1-4244-3422-0
M3  - 10.1109/ICDE.2009.204
ER  -