LBSC 878
Information Retrieval Systems
Spring 2005
Information Retrieval Resources
Available Text Retrieval Systems
The following systems are available for use in this course. Those
with links can be downloaded freely. For access to other systems,
please check with the teaching assistant.
- Cheshire II
- Research software implementing a logistic regression model. It
is freely available by FTP from the University of California at
Berkeley. Getting it working requires some facility with Z39.50.
- Glimpse
- Freely available software from the University of Arizona that
is designed for efficient indexing (at some cost in retrieval
efficiency). Glimpse is not configured for TREC-style
evaluations, so that will take some extra work.
- InQuery
- Commercial software based on inference networks that has a very
flexible query language. We have a research and teaching
license for this system from the University of Massachusetts,
and use it regularly. InQuery includes a fairly nice
X-Windows interface and it is configured to run TREC-style
evaluations, but the source code is not available.
- IRF
- A Java toolkit for building IR systems for small applications.
The strength of IRF is that the object-oriented framework greatly
simplifies tasks that require working with the source code. But
because Java is designed for platform independence rather than
efficiency, the size of the collections that can be handled is quite
limited. Click here to see the steps for installing IRF.
- Lucene
- A freely available Java IR system, probably the easiest system
to get up and running, and the most easily modified. This is
now so widely used that it is becoming a de facto standard.
- Lemur
- A toolkit for building language modeling systems for
information retrieval. The current version of Lemur, Indri, is
available on the same site.
- MG
- Research software from the Royal Melbourne Institute of
Technology that is designed to maximize storage efficiency on
very large collections. It is available under the GNU General Public
License. We installed this several years ago and it
wasn't too difficult.
Click here to download the tarfile.
- PRISE
- Public domain vector space research software developed at NIST.
We regularly use this system for TDT evaluations. PRISE
includes a very nice Z39.50 interface, but it takes some
facility with that standard to get the interactive part
running. PRISE is configured to run TREC-style evaluations and
the source code is available.
- SMART
- Vector space research software that is freely available by FTP
from Cornell University. We have extensive experience using
SMART. SMART includes only a VT-100 interface, but it is
configured to run TREC-style evaluations and the source code is
available.
- Terrier
- From the marketing pitch: "Terrier is a modular platform for
the rapid development of large-scale Information Retrieval
applications."
- Xapian
- An open source IR system that is designed to run under Linux.
Xapian is a descendant of Omseek, which itself is a descendant
of Open Muscat. Xapian is designed to handle several Western
European languages, and thus might be a good choice if you want to
work with languages other than English.
- Zettair
- The successor to MG from the search engine group at RMIT. Even
faster, and quite a bit easier to modify.
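Several of the entries above note whether a system is "configured to run TREC-style evaluations." For systems that are not (Glimpse, for example), the extra work amounts to writing ranked results in the usual TREC run format and scoring them against relevance judgments. The Python sketch below illustrates the scoring half under stated assumptions: it expects six-column run lines (topic, Q0, docno, rank, score, tag) and four-column qrels lines (topic, iteration, docno, relevance), and it computes uninterpolated mean average precision. The function names are hypothetical and are not part of any system listed here. It is a simplified stand-in for a real evaluation tool, not a replacement for one; in particular it breaks score ties by input order.

```python
from collections import defaultdict


def parse_qrels(lines):
    """Map each topic to its set of relevant docnos.

    Assumes lines of the form: topic iteration docno relevance
    """
    relevant = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        topic, _iteration, docno, rel = parts
        if int(rel) > 0:
            relevant[topic].add(docno)
    return dict(relevant)


def parse_run(lines):
    """Map each topic to its docnos, ordered by descending score.

    Assumes lines of the form: topic Q0 docno rank score tag
    """
    scored = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        topic, _q0, docno, _rank, score, _tag = parts
        scored[topic].append((float(score), docno))
    # Python's sort is stable, so tied scores keep their input order.
    return {
        topic: [docno for _, docno in sorted(pairs, key=lambda p: -p[0])]
        for topic, pairs in scored.items()
    }


def average_precision(ranking, relevant):
    """Uninterpolated average precision for one topic."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, docno in enumerate(ranking, start=1):
        if docno in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(run, qrels):
    """Mean of average precision over topics with at least one relevant doc."""
    topics = [t for t, docs in qrels.items() if docs]
    if not topics:
        return 0.0
    total = sum(average_precision(run.get(t, []), qrels[t]) for t in topics)
    return total / len(topics)
```

To use it, read a system's output file and the judgment file into lists of lines, pass them to `parse_run` and `parse_qrels`, and call `mean_average_precision` on the results.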
Sources of online IR papers
An Incomplete List of IR Research Groups Around the World
IR Resource Pages
Other Useful Resources
Doug Oard
Last modified: Jan Dec 18 20:41:04 2005