University of Maryland
Web Query Optimization Project

Architectures of wrappers and mediators aim to provide seamless access to data stored in a wide variety of repositories, including Web-accessible WebSources (enabled by HTTP, XML, HTML), in a wide-area environment. Web query optimization addresses the task of query planning and source selection for wide-area environments and WebSources of limited capability. A query must be accepted without exact knowledge of the available sources. Because the sources are heterogeneous, they vary widely in processing capability and cost, which complicates this decision. With hundreds of sources, we can also expect replication of contents. Replication should be exploited to provide least-cost answers in a dynamic environment where sources may become available or unavailable.
Mediators have been developed as an extension of the Predator object-relational database system. We have developed a Web Query Optimizer (WQO) within the mediator. The WQO has two components: a CBR (capability-based rewriting) Tool and an enhanced randomized relational optimizer. In a pre-optimization phase, the CBR Tool produces one or more pre-plans for a mediator query. A pre-plan consists of (possibly ordered) subgoals to be executed in the WebSources and the mediator. For each mediator subgoal, the pre-plan identifies one or more relevant WebSource Implementations (WSIs, that is, wrapper calls), as well as the restrictions and orderings imposed by the WebSource capabilities. The WQO uses the pre-plan to drive the relational optimizer. The WQO first chooses a "good" WSI. During optimization, the subgoal orderings and subgoal restrictions identified in the pre-plan are passed to the relational optimizer, which respects them while producing a good plan for the subgoals in the query.
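As a rough illustration of the pre-plan idea, the sketch below models a pre-plan as a set of subgoals, each with candidate WSIs and estimated costs, plus ordering constraints between subgoals. It is not the WQO implementation: the class and function names, the subgoal and WSI names, and the cost figures are all hypothetical, and a plain topological sort stands in for the randomized relational optimizer simply to show one ordering that respects the constraints.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class PrePlan:
    """Hypothetical pre-plan: candidate WSIs per subgoal plus ordering constraints."""
    # subgoal name -> {WSI (wrapper call) name: estimated cost}
    candidates: dict[str, dict[str, float]]
    # subgoal name -> subgoals that must be executed before it
    must_follow: dict[str, set[str]] = field(default_factory=dict)


def choose_wsi(pre_plan: PrePlan) -> dict[str, str]:
    """Pick the cheapest candidate WSI for every subgoal (assumed cost metric)."""
    return {
        subgoal: min(costs, key=costs.get)
        for subgoal, costs in pre_plan.candidates.items()
    }


def order_subgoals(pre_plan: PrePlan) -> list[str]:
    """Return one subgoal order that respects the pre-plan's ordering constraints."""
    graph = {
        subgoal: pre_plan.must_follow.get(subgoal, set())
        for subgoal in pre_plan.candidates
    }
    # TopologicalSorter expects node -> predecessors, which matches must_follow.
    return list(TopologicalSorter(graph).static_order())


# Illustrative pre-plan: the paper lookup needs a binding produced by the
# author lookup, so the source capability forces an ordering.
example = PrePlan(
    candidates={
        "author_lookup": {"wsi_by_name": 2.0, "wsi_full_scan": 9.0},
        "paper_lookup": {"wsi_by_author_id": 3.0},
    },
    must_follow={"paper_lookup": {"author_lookup"}},
)

chosen = choose_wsi(example)
order = order_subgoals(example)
print(chosen)
print(order)
```

A real optimizer would explore many orderings and join methods within these constraints; the sketch only shows how the constraints narrow the search space before optimization starts.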
A WebWrapper cost model provides a number of metrics that the WQO can use in choosing a good WSI and in choosing a good plan. These metrics are obtained using query feedback, since WebSources are typically autonomous and do not provide either access costs or statistics. We have developed WebPT, a tool that learns from query feedback and predicts the response time for accessing a WebSource across a wide area network. A prototype of the Web Query Optimizer has been implemented and tested against a number of WebSources, including the ACM Digital Library.
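To make the notion of learning from query feedback concrete, here is a minimal sketch of a predictor that records observed response times and predicts from their averages. It is not the WebPT algorithm, which is described in the papers listed below; the class name, the choice of hour of day as the only feature, and the fallback rule are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Optional


class FeedbackPredictor:
    """Hypothetical response-time predictor built from query feedback."""

    def __init__(self) -> None:
        # (source, hour of day) -> running sum and count of observed seconds
        self._sum = defaultdict(float)
        self._count = defaultdict(int)

    def record(self, source: str, hour: int, seconds: float) -> None:
        """Store one observed response time for a source at a given hour."""
        key = (source, hour)
        self._sum[key] += seconds
        self._count[key] += 1

    def predict(self, source: str, hour: int) -> Optional[float]:
        """Predict the response time, or return None if the source is unseen."""
        key = (source, hour)
        if self._count.get(key):
            return self._sum[key] / self._count[key]
        # Fall back to the mean over all hours observed for this source.
        keys = [k for k in self._count if k[0] == source]
        total = sum(self._count[k] for k in keys)
        if total == 0:
            return None
        return sum(self._sum[k] for k in keys) / total


predictor = FeedbackPredictor()
predictor.record("acm_dl", 9, 2.0)
predictor.record("acm_dl", 9, 4.0)
predictor.record("acm_dl", 20, 1.0)
print(predictor.predict("acm_dl", 9))
```

An optimizer could consult such estimates when ranking candidate WSIs, and refresh them after every wrapper call so that predictions track changing network and server load.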

Members of the Dynamic Query Optimization Project

Recent papers

A paper that describes the Web Query Optimizer, "Efficient Evaluation of Queries in a Mediator for WebSources", will appear in the SIGMOD 2002 Proceedings.
A related paper, "Query Optimization to Meet Performance Targets for Wide Area Applications", will appear in the ICDCS 2002 Proceedings.
Please for more recent unpublished papers on the Web Query Optimizer.
An overview of our research is in this KEYNOTE presentation (ps or pdf), presented at the 1999 Russian National Conference on Digital Libraries, St. Petersburg, October 1999.

We have constructed a tool - WebPT - to predict response times from Web accessible sources.
Details on the tool are available here as ps or pdf.
This paper in ps or pdf describes a comparison of the WebPT tool with a Neural Network.

A Meta-Wrapper for Scaling up to Multiple Autonomous Distributed Information Sources (ps) appeared in the CoopIS 1998 Proceedings. A longer journal version is here.

Optimization of Wrappers and Mediators for Web Accessible Data Sources (WebSources) to be presented at the CIKM'98 Workshop on Web Information and Data Management (WIDM'98)

A Report from a 1996 Workshop on Mediator Models

A Proposal on Scaling I3 Technology to 100's of Heterogeneous Sources

Scaling Heterogeneous Databases and the Design of DISCO.
Tomasic, Anthony and Raschid, Louiqa and Valduriez, Patrick. Proceedings of the International Conference on Distributed Computing Systems, 1996. Nominated for Best Paper Award. A long version appears in IEEE Transactions on Knowledge and Data Engineering, Volume 10, Number 4, July 1998.