LBSC 796/INFM 718R: Homework 2

Gathering Relevance Judgments

Last update: January 24, 2011.

The purpose of this exercise is to gain some "hands-on" experience in the process of evaluating information retrieval systems. You will be assessing documents that are retrieved in response to two "topics" (statements of information needs). The two search engines we'll be comparing are Google and Bing.

  1. gardening wet soil conditions
    What special considerations must be made when planting a garden in very wet soil conditions? Are there plants that will not work and/or that will work particularly well? Are there any special techniques that will help reduce the amount of moisture? Only pages that deal with gardening are relevant.
    [Google results] [Bing results page 1] [Bing results page 2] [Bing results page 3]

  2. oil vs. propane furnace
    Looking for pages that list the tradeoffs of using a propane furnace rather than a fuel oil furnace. The pages should provide a comparison and are only relevant if they talk about both. They can talk about other types of heating (e.g., electric), provided they talk about oil and propane. Propane is also called LP or LPG (liquid propane [gas]).
    [Google results] [Bing results page 1] [Bing results page 2] [Bing results page 3]

To ensure that everyone evaluates the same hits, results from each search engine have been cached for you; follow the links above. You will evaluate the relevance of a subset of these hits. If your social security number ends in an even digit, evaluate the first 15 hits of each query. If your social security number ends in an odd digit, evaluate the last 15 hits of each query. All Google results are on one page; the Bing results are spread across three pages (10 per page). Do not follow the next-page links at the bottom of a Bing page; come back here to get to the next page. For Query 1, some sponsored ads are present at the top and/or bottom of the page; they should not be counted. Sponsored ads on the side of the page, and additional links that a search engine shows indented below a primary link, should also be ignored. See the URLs in the spreadsheet if you have any questions about which link is counted in each position.
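The even/odd rule above can be sketched in a few lines of Python. This is only an illustration, not part of the assignment: the function names are hypothetical, and the total of 30 hits per query is an assumption based on the three Bing pages of 10 results each (the number of cached Google hits is not stated here).

```python
def positions_to_evaluate(last_digit: int, total_hits: int = 30, count: int = 15) -> list[int]:
    """Return the 1-based result positions to judge for one query.

    total_hits=30 is an assumption (3 Bing pages x 10 hits per page).
    """
    if not 0 <= last_digit <= 9:
        raise ValueError("last_digit must be a single digit 0-9")
    if last_digit % 2 == 0:
        # Even final digit: the first `count` hits.
        return list(range(1, count + 1))
    # Odd final digit: the last `count` hits.
    return list(range(total_hits - count + 1, total_hits + 1))


def bing_page(position: int, per_page: int = 10) -> int:
    """Return which cached Bing results page (1-based) holds a given position."""
    return (position - 1) // per_page + 1
```

Under these assumptions, a final digit of 7 gives positions 16 through 30, so position 16 is the sixth result on the second Bing page.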

Use this Excel spreadsheet to keep track of your relevance judgments. In the column marked "Relevance", enter "1" (one) if you think the document is relevant, or "0" (zero) if you think it is not. Add the spreadsheet to your homework Web page. Rename the spreadsheet to your last name (keeping the .xls extension) so that we don't wind up with a dozen files with the same filename!
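The assignment does not say how the judgments will be scored. As one possible illustration, the sketch below checks that a column of judgments contains only 0 and 1 and then computes the fraction judged relevant (precision over the judged hits). The function name and the sample values are hypothetical, not taken from the spreadsheet.

```python
def precision(judgments: list[int]) -> float:
    """Fraction of judged hits marked relevant (1) in a list of 0/1 judgments."""
    if not judgments:
        raise ValueError("no judgments given")
    for j in judgments:
        if j not in (0, 1):
            raise ValueError(f"judgment must be 0 or 1, got {j!r}")
    return sum(judgments) / len(judgments)


# Made-up example values, for illustration only.
sample = [1, 0, 1, 1, 0]
print(precision(sample))
```

Computing this separately for the Google and Bing judgments of the same topic gives one simple way to compare the two engines.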

This assignment was adapted from James Allan's CMPSCI 646 course (Fall 2004) at the University of Massachusetts Amherst.