The purpose of this exercise is to gain some "hands-on" experience in the process of evaluating information retrieval systems. You will be assessing documents that are retrieved in response to two "topics" (statements of information needs). The two search engines we'll be comparing are Google and Bing.
gardening wet soil conditions
What special considerations must be made when planting a garden in very wet soil conditions? Are there plants that will not work and/or that will work particularly well? Are there any special techniques that will help reduce the amount of moisture? Only pages that deal with gardening are relevant.
[Google results]
[Bing results page 1]
[Bing results page 2]
[Bing results page 3]
oil vs. propane furnace
Looking for pages that list the tradeoffs for using a propane furnace rather than a fuel oil furnace. The pages should provide comparison and are only relevant if they talk about both. They can talk about other types of heating (e.g., electric), provided they talk about oil and propane. Propane is also called LP or LPG (liquid propane [gas]).
[Google results]
[Bing results page 1]
[Bing results page 2]
[Bing results page 3]
To ensure that everyone evaluates the same hits, results from each search engine have been cached for you; follow the links above. You will evaluate the relevance of a subset of these hits. If your social security number ends in an even digit, evaluate the first 15 hits of each query. If your social security number ends in an odd digit, evaluate the last 15 hits of each query. All Google results are on one page; the Bing results are spread across three pages (10 per page). Do not follow the next-page links at the bottom of a page; come back here to get to the next page on Bing. Query 1 has some sponsored ads; if sponsored ads are present at the top and/or bottom of a page, they should not be counted. Sponsored ads on the side of the page, and additional links that a search engine shows indented below a primary link, should also be ignored. See the URLs in the spreadsheet if you have any questions about which link is being counted in each position.
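The even/odd rule can be sketched in a few lines of Python. This is only an illustration: the function name `hits_to_evaluate` and the stand-in list of 30 ranked positions are assumptions made for the example (30 matches the three Bing pages of 10; the Google page is assumed to hold the same number).

```python
def hits_to_evaluate(hits, last_digit, n=15):
    """Pick the hits to judge from a ranked list.

    An even last digit selects the first n hits; an odd last digit
    selects the last n hits. `hits` is ordered by rank, best first.
    """
    if not 0 <= last_digit <= 9:
        raise ValueError("last_digit must be a single digit (0-9)")
    if last_digit % 2 == 0:
        return hits[:n]
    return hits[-n:]


# Hypothetical ranked list: positions 1..30 for one engine and one query.
positions = list(range(1, 31))

even_subset = hits_to_evaluate(positions, 4)  # positions 1 through 15
odd_subset = hits_to_evaluate(positions, 7)   # positions 16 through 30
```

Sponsored ads and indented sub-links are not part of the ranked list, so they should already be excluded before positions are counted.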
Use this Excel spreadsheet to keep track of your relevance judgments. In the column marked "Relevance", enter "1" (one) if you think the document is relevant. Enter "0" (zero) if you think the document is not relevant. Rename the spreadsheet to your last name (keeping the .xls extension) so that we don't wind up with a dozen files with the same filename, then add it to your homework Web page.
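The assignment does not say how the 0/1 judgments will be scored. One common summary, sketched here as an assumption, is the fraction of judged hits marked relevant for each engine. The function name `fraction_relevant` and the judgment lists below are made up for illustration (shortened to five hits for brevity); they are not real results.

```python
def fraction_relevant(judgments):
    """Return the share of judged hits marked relevant.

    `judgments` is a list of 1 (relevant) and 0 (not relevant),
    matching the values entered in the "Relevance" column.
    """
    if not judgments:
        raise ValueError("no judgments to summarize")
    if any(j not in (0, 1) for j in judgments):
        raise ValueError("judgments must be 0 or 1")
    return sum(judgments) / len(judgments)


# Made-up judgments for one query, for illustration only.
google_judgments = [1, 1, 0, 1, 0]
bing_judgments = [1, 0, 0, 1, 0]

google_score = fraction_relevant(google_judgments)  # 3 of 5 relevant
bing_score = fraction_relevant(bing_judgments)      # 2 of 5 relevant
```

Comparing the two scores for the same query and the same set of rank positions gives a rough sense of which engine returned more relevant hits.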
This assignment was adapted from James Allan's CMPSCI 646 course (Fall 2004) at the University of Massachusetts Amherst.