Handling estimation errors in database query processing

TitleHandling estimation errors in database query processing
Publication TypeTheses
Year of Publication2004
AuthorsDeshpande A
Date Published2004///
UniversityUniversity of California at Berkeley
CityBerkeley, CA, USA

A query optimizer is among the core pieces of a modern database management system, responsible for choosing a query execution plan for a user-provided declarative query. Query optimizers typically employ a cost estimation procedure to compare costs of different execution plans. Errors in this estimation process are quite common and arise due to reasons such as incomplete and insufficient statistical information about the data, and highly variable runtime environments that can affect the plan costs in unpredictable manners. Two approaches have been previously proposed for handling such estimation errors: (1) building sophisticated synopsis techniques that succintly summarize the data in the database and thus provide more statistical information to the query optimizer, and (2) aggressive reoptimization schemes that attempt to change the execution plans chosen to execute queries, on-the-fly. In the first part of this dissertation, we focus on building and using sophisticated synopsis techniques in the context of a traditional query optimizer. We propose a class of synopsis techniques called DEPENDENCY-B ASED Histograms that use statistical interaction models to exploit the correlations in the data, and to estimate selectivities efficiently. We also develop an efficient algorithm to search through the class of statistical models that we employ. Using sophisticated synopsis techniques such as these in the context of a traditional query optimizer poses interesting computational challenges; a naive approach to doing this could make the query optimization process so expensive as to be ineffective. This naturally leads to an “estimation planning” problem that asks for the best strategy to compute all the estimates required by an optimizer using the synopses at its disposal. We analyze this problem, its solution space, and propose algorithms to efficiently find good estimation plans. There are many scenarios where sophisticated synopsis techniques may not be applicable; examples include wide area and web based data sources, data streams and complex data domains. In the second part of this dissertation, we explore a highly-adaptive query processing technique called eddies that treats query processing as routing of tuples through operators, and adapts to changing data and runtime characteristics by continuously changing the order in which tuples are routed. We analyze the eddies architecture and identify a fundamental flaw in the basic design of the architecture: the burden of history in routing. (Abstract shortened by UMI.)