UMCP: CLIS: LBSC 796/INFM 718R: FALL 2007: Resources

LBSC 796/INFM 718R
Information Retrieval Systems
Fall 2007
Information Retrieval Resources

Available Text Retrieval Systems

The following systems are available for use in this course. Those with links can be downloaded freely and used anywhere. For access to other systems, please check with the teaching assistant. The four you are most likely to want to use are listed first; the others are listed in alphabetical order for completeness.

URA: An integrated Java package that includes components from Lucene, Terrier, and the Stanford NLP tools.
Lucene: A freely available Java IR system, probably the easiest system to get up and running, and the most easily modified.
Indri: Indri is optimized for efficiency, and thus is probably the best choice if you have a very large collection. It is built on top of the Lemur toolkit for building language modeling systems for information retrieval.
Zettair: Zettair is optimized for both efficiency and modifiability. It therefore occupies a part of the design space between Lucene and Indri.
Cheshire 3: Freely available research software from the University of California at Berkeley that implements a logistic regression model. Getting it working may require some facility with Z39.50.
Glimpse: Freely available software from the University of Arizona that is designed for efficient indexing (at some cost in retrieval efficiency). Glimpse is not configured for TREC-style evaluations, so that would take some extra work.
InQuery: Commercial software based on inference networks that has a very flexible query language. We have a research and teaching license for this system from the University of Massachusetts, and we still use it occasionally. InQuery includes a fairly nice X Windows interface and is configured to run TREC-style evaluations, but the source code is not available.
IRF: A Java toolkit for building IR systems for small applications. The strength of IRF is that its object-oriented framework greatly simplifies tasks that require working with the source code. But because Java is designed for platform independence rather than efficiency, the size of the collections that can be handled is quite limited.
MG: Research software from the Royal Melbourne Institute of Technology that is designed to maximize storage efficiency on very large collections. It is available under the GNU General Public License. We installed it once several years ago and it wasn't too difficult. A tarfile is available for download.
PRISE: Public domain vector space research software developed at NIST. We regularly use this system for TDT evaluations. PRISE includes a very nice Z39.50 interface, but it takes some facility with that standard to get the interactive part running. PRISE is configured to run TREC-style evaluations and the source code is available.
SMART: Vector space research software that is freely available by FTP from Cornell University. We have extensive experience using SMART. SMART includes only a VT-100 interface, but it is configured to run TREC-style evaluations and the source code is available.
Terrier: An information retrieval system from the University of Glasgow that is optimized for efficiency. Terrier implements the divergence from randomness framework for ranked retrieval.
Xapian: An open source IR system that is designed to run under Linux. Xapian is a descendant of Omseek, which is itself a descendant of Open Muscat. Xapian is designed to handle several Western European languages, and thus might be a good choice if you want to work with languages other than English.

IR Books

Christopher D. Manning, Prabhakar Raghavan and Hinrich Schuetze, Introduction to Information Retrieval, Web Draft.
C. J. van Rijsbergen, Information Retrieval, Butterworths, London, 1979.
Other IR Books (and much more)

Online Sources for IR Papers

ACM Digital Library (available on campus and through ResearchPort)
- ACM SIGIR Conference Proceedings
- Joint Conference on Digital Libraries Proceedings
- ACM Multimedia Conference Proceedings
- ACM Transactions on Information Systems (a core information retrieval journal)
- Communications of the ACM (a journal)
Other Core Information Retrieval Journals (available through ResearchPort)
Information Retrieval Evaluation Venues (all with online proceedings)
- Text REtrieval Conference (TREC) (USA)
- NII Test Collection Information Retrieval Project (NTCIR) (Pacific Rim)
- Cross-Language Evaluation Forum (CLEF) (Europe)
- Forum for Information Retrieval Evaluation (FIRE) (India) [new, not much there yet.]
Technical Reports (all focused on the Computer Science side of the field)
- CiteSeer (with cached copies of preprint versions of many papers; a forerunner of Google Scholar)
- Computing Research Repository
- Cornell Computer Science Technical Reports (an excellent source for some of the best pre-TREC experimental work on IR)
- Maryland Computer Science Technical Reports (to see what other local folks have been up to)
Springer Lecture Notes in Computer Science (many conference proceedings, available on campus and through ResearchPort; the final versions of the CLEF proceedings appear here about a year after each workshop)
D-Lib Magazine

A Very Incomplete List of IR Research Groups

IR Resource Pages

Other Useful Resources

Federal STI Managers Group (CENDI)

Doug Oard

Last modified: Aug 18 2007
