Exploiting Correlated Attributes in Acquisitional Query Processing

Title: Exploiting Correlated Attributes in Acquisitional Query Processing
Publication Type: Conference Papers
Year of Publication: 2005
Authors: Deshpande A, Guestrin C, Hong W, Madden S
Conference Name: 21st International Conference on Data Engineering, 2005. ICDE 2005. Proceedings
Date Published: 2005/04/05/08
ISBN Number: 0-7695-2285-8
Keywords: acquisitional query processing, Computer networks, Costs, data acquisition, Delay, Distributed computing, distributed information system, Distributed information systems, distributed processing, exponential time algorithm, optimization techniques, polynomial-time heuristic, Polynomials, probability, Query processing, real-time systems, real-world sensor-network, Runtime, Sensor phenomena and characterization, Sensor systems

Sensor networks and other distributed information systems (such as the Web) must frequently access data that has a high per-attribute acquisition cost, in terms of energy, latency, or computational resources. When executing queries that contain several predicates over such expensive attributes, we observe that it can be beneficial to use correlations to automatically introduce low-cost attributes whose observation allows the query processor to better estimate the selectivity of these expensive predicates. In particular, we show how to build conditional plans that branch into one or more sub-plans, each with a different ordering of the expensive query predicates, based on the runtime observation of low-cost attributes. We frame the construction of the optimal conditional plan for a given user query and set of candidate low-cost attributes as an optimization problem. We describe an exponential-time algorithm for finding such optimal plans and a polynomial-time heuristic for identifying conditional plans that perform well in practice. We also show how to compactly model the conditional probability distributions needed to identify correlations and build these plans. We evaluate our algorithms on several real-world sensor-network data sets, showing performance improvements of several times over traditional optimization techniques for a variety of queries.
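
The idea of a conditional plan can be illustrated with a small sketch. This is not the paper's algorithm: it is a brute-force toy that assumes a single cheap attribute, tries every predicate ordering, and estimates costs from a sample of historical readings. All names (`period`, `light_low`, `temp_high`), the costs, and the eight sample rows are hypothetical values chosen for illustration.

```python
from itertools import permutations


def expected_cost(order, rows, cost):
    """Average acquisition cost of testing predicates in `order`,
    stopping at the first predicate that fails (short-circuit AND)."""
    if not rows:
        return 0.0
    total = 0.0
    for row in rows:
        for pred in order:
            total += cost[pred]      # pay to acquire this attribute
            if not row[pred]:        # predicate failed: skip the rest
                break
    return total / len(rows)


def best_order(preds, rows, cost):
    """Brute force over all orderings (factorial in the number of predicates)."""
    return min(permutations(preds), key=lambda o: expected_cost(o, rows, cost))


def conditional_plan(preds, cheap_attr, rows, cost, cheap_cost):
    """Branch on one cheap attribute and pick the best ordering per branch."""
    groups = {}
    for row in rows:
        groups.setdefault(row[cheap_attr], []).append(row)
    plan = {value: best_order(preds, g, cost) for value, g in groups.items()}
    exp = cheap_cost + sum(
        len(g) / len(rows) * expected_cost(plan[value], g, cost)
        for value, g in groups.items()
    )
    return plan, exp


# Hypothetical history: each row records the cheap attribute and whether
# each expensive predicate held for that reading.
rows = (
    [{"period": "night", "light_low": True, "temp_high": False}] * 3
    + [{"period": "night", "light_low": True, "temp_high": True}]
    + [{"period": "day", "light_low": False, "temp_high": True}] * 3
    + [{"period": "day", "light_low": True, "temp_high": True}]
)
preds = ("light_low", "temp_high")
cost = {"light_low": 10.0, "temp_high": 10.0}   # assumed acquisition costs

fixed_order = best_order(preds, rows, cost)
fixed_cost = expected_cost(fixed_order, rows, cost)
plan, cond_cost = conditional_plan(preds, "period", rows, cost, cheap_cost=0.1)

print("fixed plan:", fixed_order, fixed_cost)
print("conditional plan:", plan, cond_cost)
```

On this toy sample, either fixed ordering costs 16.25 per reading on average, while branching on `period` (temperature first at night, light first during the day) brings the expected cost down to 12.6, including the 0.1 paid to observe the cheap attribute. The gain comes entirely from the correlation between the cheap attribute and the selectivity of the expensive predicates.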