Architectures of wrappers and mediators aim to provide seamless access to
data stored in a wide variety of repositories, including Web-accessible
WebSources (enabled by HTTP, XML, HTML), in a wide-area environment.
Web query optimization addresses the task of query planning and
source selection for wide-area environments and
WebSources of limited capability.
A query must be accepted without exact knowledge of
the available sources. Heterogeneity implies that these sources vary widely
in their processing capability and cost, which complicates source selection.
With hundreds of sources, we can also expect replication of contents.
Replication should be exploited to provide least-cost answers, in a
dynamic environment of (un)available sources.
Mediators have been developed as an extension of the Predator
object-relational database system.
We have developed a Web Query Optimizer (WQO) within the mediator.
The WQO has two components.
The first is a capability-based rewriting (CBR) Tool and the second is
an enhanced randomized relational optimizer.
In a pre-optimization phase, the CBR Tool produces (multiple) pre-plan(s)
for a mediator query. A pre-plan consists of (possibly ordered) subgoals
to be executed in the WebSources and the mediator. The pre-plan
identifies one (or more) relevant WebSource Implementations (WSIs)
(wrapper calls) for a mediator subgoal, as well as restrictions and
orderings imposed by the WebSource capabilities. The WQO uses the
pre-plan to drive the relational optimizer. The WQO first chooses
a "good" WSI. During optimization, subgoal orderings and subgoal
restrictions identified in the pre-plan are provided to the relational
optimizer, and it respects them while producing a good plan for the
subgoals in the query.
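
The flow described above can be illustrated with a minimal sketch. It is
not the WQO implementation: the PrePlan structure, the subgoal and WSI
names, the per-WSI cost figures, and the toy plan cost are all assumptions
made for illustration. The sketch first picks a cheapest WSI per subgoal,
then samples random subgoal orders and keeps only those that respect the
orderings recorded in the pre-plan.

```python
import random
from dataclasses import dataclass, field


@dataclass
class PrePlan:
    # Hypothetical structure: subgoal -> {WSI name: estimated cost}
    candidates: dict
    # (before, after) pairs imposed by WebSource capabilities
    orderings: set = field(default_factory=set)


def choose_wsis(pre_plan):
    """Pick the lowest-cost candidate WSI for each subgoal."""
    return {sg: min(wsis, key=wsis.get)
            for sg, wsis in pre_plan.candidates.items()}


def respects(order, orderings):
    """True if every (before, after) pair holds in the given order."""
    pos = {sg: i for i, sg in enumerate(order)}
    return all(pos[a] < pos[b] for a, b in orderings)


def plan_cost(order, chosen, pre_plan):
    # Toy stand-in for a real cost model: earlier subgoals weigh more.
    n = len(order)
    return sum((n - i) * pre_plan.candidates[sg][chosen[sg]]
               for i, sg in enumerate(order))


def randomized_search(pre_plan, trials=200, seed=0):
    """Sample random orders, discard invalid ones, keep the cheapest."""
    rng = random.Random(seed)
    chosen = choose_wsis(pre_plan)
    subgoals = list(pre_plan.candidates)
    best = None
    for _ in range(trials):
        order = subgoals[:]
        rng.shuffle(order)
        if not respects(order, pre_plan.orderings):
            continue  # violates a pre-plan ordering restriction
        cost = plan_cost(order, chosen, pre_plan)
        if best is None or cost < best[0]:
            best = (cost, order)
    return chosen, best


pre_plan = PrePlan(
    candidates={
        "authors": {"wsi_a1": 3.0, "wsi_a2": 5.0},
        "papers": {"wsi_p1": 8.0},
        "venues": {"wsi_v1": 2.0, "wsi_v2": 1.0},
    },
    # "authors" must run first, e.g. because it supplies a binding that
    # the "papers" source requires as input
    orderings={("authors", "papers")},
)
chosen, best = randomized_search(pre_plan)
```

Rejecting invalid orders after shuffling is the simplest way to honor the
restrictions; an optimizer working on larger queries would more likely
generate only valid orders in the first place.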
A WebWrapper cost model provides a number of metrics that can be used
by the WQO in choosing a good WSI and in choosing a good plan. These
metrics are obtained using query feedback, since WebSources typically
are autonomous and do not provide either access costs or statistics.
We have developed WebPT, a tool that can be used to learn from query
feedback and predict the response time for accessing a WebSource across
a wide area network.
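
As a rough illustration of learning from query feedback, the sketch below
keeps running averages of observed response times and uses them to predict
the next access. It is not the WebPT design: the class name, the choice of
time of day as the only feature, and the six-hour buckets are assumptions
made for this example.

```python
from collections import defaultdict


class ResponseTimePredictor:
    """Hypothetical feedback-based predictor (not the actual WebPT)."""

    def __init__(self):
        self._sum = defaultdict(float)
        self._count = defaultdict(int)

    @staticmethod
    def _bucket(hour):
        # Assumption: group the day into four six-hour buckets
        return hour // 6

    def record(self, source, hour, seconds):
        """Store one observed response time as query feedback."""
        # Update both the time-of-day bucket and the per-source total
        for key in ((source, self._bucket(hour)), (source, None)):
            self._sum[key] += seconds
            self._count[key] += 1

    def predict(self, source, hour):
        """Predict from the matching bucket, else the source average."""
        for key in ((source, self._bucket(hour)), (source, None)):
            n = self._count.get(key, 0)
            if n:
                return self._sum[key] / n
        return None  # no feedback for this source yet


predictor = ResponseTimePredictor()
predictor.record("acm_dl", hour=9, seconds=2.0)
predictor.record("acm_dl", hour=10, seconds=4.0)
predictor.record("acm_dl", hour=22, seconds=10.0)

morning = predictor.predict("acm_dl", hour=8)     # same bucket as 9 and 10
night = predictor.predict("acm_dl", hour=2)       # falls back to the average
unknown = predictor.predict("other_src", hour=2)  # never observed
```

Falling back to the per-source average when a bucket is empty lets the
optimizer obtain an estimate as soon as any feedback for that source exists.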
A prototype of the Web Query Optimizer has been implemented and it
has been tested against a number of WebSources including the
ACM Digital Library.