
LBSC 878
Information Retrieval Systems
Spring 2005
Information Retrieval Resources


Available Text Retrieval Systems

The following systems are available for use in this course. Those with links can be downloaded freely. For access to other systems, please check with the teaching assistant.
Cheshire II
Research software implementing a logistic regression model. It is freely available by FTP from the University of California at Berkeley. Getting it working requires some facility with Z39.50.
Glimpse
Freely available software from the University of Arizona that is designed for efficient indexing (at some cost in retrieval efficiency). Glimpse is not configured for TREC-style evaluations, so running them will take some extra work.
InQuery
Commercial software based on inference networks that has a very flexible query language. We have a research and teaching license for this system from the University of Massachusetts, and use it regularly. InQuery includes a fairly nice X-Windows interface and it is configured to run TREC-style evaluations, but the source code is not available.
IRF
A Java toolkit for building IR systems for small applications. The strength of IRF is that the object-oriented framework greatly simplifies tasks that require working with the source code. But because Java is designed for platform independence rather than efficiency, the size of the collections that can be handled is quite limited. Click here to see the steps for installing IRF.
Lucene
A freely available Java IR system, probably the easiest system to get up and running, and the most easily modified. This is now so widely used that it is becoming a de facto standard.
Lemur
A toolkit for building language modeling systems for information retrieval. The current version of Lemur, Indri, is available on the same site.
MG
Research software from the Royal Melbourne Institute of Technology that is designed to maximize storage efficiency on very large collections. It is available under the GNU General Public License. We installed this several years ago and it wasn't too difficult. Click here to download the tarfile.
PRISE
Public domain vector space research software developed at NIST. We regularly use this system for TDT evaluations. PRISE includes a very nice Z39.50 interface, but it takes some facility with that standard to get the interactive part running. PRISE is configured to run TREC-style evaluations and the source code is available.
SMART
Vector space research software that is freely available by FTP from Cornell University. We have extensive experience using SMART. SMART includes only a VT-100 interface, but it is configured to run TREC-style evaluations and the source code is available.
Terrier
From the marketing pitch: "Terrier is a modular platform for the rapid development of large-scale Information Retrieval applications."
Xapian
An open source IR system that is designed to run under Linux. Xapian is a descendant of Omseek, which itself is a descendant of Open Muscat. Xapian is designed to handle several Western European languages, and thus might be a good choice if you want to work with languages other than English.
Zettair
The successor to MG from the search engine group at RMIT. Even faster, and quite a bit easier to modify.
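
Several of the entries above mention whether a system is configured for TREC-style evaluations. As a rough illustration of what such an evaluation computes, here is a minimal Python sketch that scores a ranked run against relevance judgments using uninterpolated average precision. It is not the tooling shipped with any of these systems. The line layouts are assumed to follow the common trec_eval conventions (`qid iter docno rel` for judgments, `qid Q0 docno rank score tag` for runs), and the function names and sample data are hypothetical.

```python
from collections import defaultdict


def parse_qrels(lines):
    """Read judgment lines 'qid iter docno rel' into {qid: set of relevant docnos}."""
    relevant = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        qid, _iteration, docno, rel = parts
        if int(rel) > 0:
            relevant[qid].add(docno)
    return relevant


def parse_run(lines):
    """Read run lines 'qid Q0 docno rank score tag' into {qid: docnos, best score first}."""
    scored = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        qid, _q0, docno, _rank, score, _tag = parts
        scored[qid].append((float(score), docno))
    # Assumption: order by descending score only; ties keep their input order.
    return {
        qid: [docno for _, docno in sorted(pairs, key=lambda p: -p[0])]
        for qid, pairs in scored.items()
    }


def average_precision(ranking, relevant):
    """Uninterpolated average precision of one ranked list for one topic."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, docno in enumerate(ranking, start=1):
        if docno in relevant:
            hits += 1
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(run, qrels):
    """Mean of average precision over topics with at least one relevant document."""
    topics = [qid for qid in qrels if qrels[qid]]
    if not topics:
        return 0.0
    total = sum(average_precision(run.get(qid, []), qrels[qid]) for qid in topics)
    return total / len(topics)


if __name__ == "__main__":
    # Hypothetical two-topic example, for illustration only.
    sample_qrels = ["1 0 d1 1", "1 0 d2 0", "1 0 d3 1", "2 0 d4 1"]
    sample_run = [
        "1 Q0 d1 1 3.0 demo",
        "1 Q0 d2 2 2.0 demo",
        "1 Q0 d3 3 1.0 demo",
        "2 Q0 d5 1 2.0 demo",
        "2 Q0 d4 2 1.0 demo",
    ]
    print(mean_average_precision(parse_run(sample_run), parse_qrels(sample_qrels)))
```

For a system that is not already configured for this kind of evaluation, most of the extra work is usually in writing its ranked output in the run layout above so that a standard scorer can read it.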

Sources of online IR papers

An Incomplete List of IR Research Groups Around the World

IR Resource Pages

Other Useful Resources


Doug Oard
Last modified: Jan Dec 18 20:41:04 2005