Downloads

I've collected here various bits of code that I've hacked up over the years, along with other resources. They are all released under the GNU General Public License. As usual, there are no warranties of any sort associated with these packages, so use them at your own risk; your mileage may vary. Otherwise, enjoy!

Raw Nugget Pyramids Data

Released: April 13, 2006
Last update: September 9, 2006

Raw data for the experiments described in: Jimmy Lin and Dina Demner-Fushman. Will Pyramids Built of Nuggets Topple Over? Proceedings of the 2006 Human Language Technology Conference and the North American Chapter of the Association for Computational Linguistics Annual Meeting (HLT/NAACL 2006), pages 383-390, June 2006, New York City, New York.

Download: nugget-pyramids.tar.gz (646k)
Download: combine_judgments.pl (Perl script for building nugget pyramids)

Pourpre scoring script for automatically evaluating complex questions

Download: pourpre-1.1.tar.gz: Release 06/13/2007 (404k) [README]
Older version: pourpre-1.0.tar.gz: Release 05/29/2005 (376k) [README]

Relevant publications:

URA

Released: September 14, 2006
Last update: October 10, 2006

This is my Java "kitchen-sink" standoff annotation architecture that integrates a variety of IR and NLP packages, notably Lucene, Terrier, and the Stanford NLP tools.

Download: ura-v1.01.tar.gz (23962k)

The Aranea question answering system

Released: June 11, 2005

Aranea is a Web-based factoid question answering system that uses a combination of data redundancy and database techniques. Its performance in TREC 2002, TREC 2003, and TREC 2004 was competitive. The predecessor to Aranea is the askMSR system that colleagues at Microsoft Research and I developed in 2001. Details:

Jimmy Lin. An Exploration of the Principles Underlying Redundancy-Based Factoid Question Answering. ACM Transactions on Information Systems, 27(2):1-55, 2007.

Download: Aranea-r1.00.tar.gz (52221k)

QA test collection

Released: June 9, 2005

The question answering test collection as described in: Jimmy Lin and Boris Katz. Building a Reusable Test Collection for Question Answering. Journal of the American Society for Information Science and Technology, 57(7):851-861, 2006.

Download: qa-test-collection.tar.gz (32k)

Java version of Brill's Part-of-Speech Tagger

Released: December 27, 2004

Eric Brill's part-of-speech tagger, ported to Java via the Java Native Interface (JNI). More precisely, it's based on Benjamin Han's ePost package, which is a cleaned-up version of Brill's original code. It has been tested on both Linux and Windows (under Cygwin).

Documentation: javadoc
Download: brill-java-1.0.tar.gz (9352 KB)

LPost: Perl version of Brill's Part-of-Speech Tagger

Released: December 27, 2004

Eric Brill's part-of-speech tagger as a Perl module. Just like the Java version, it's based on Benjamin Han's ePost package. It has been tested on both Linux and Windows (under Cygwin with ActiveState Perl).

Documentation: LPost POD
Download: LPost-1.0.tar.gz (593 KB)

