I've collected here various useful bits of code that I've hacked up over the years, along with other resources. They are all released under the GNU General Public License. As usual, there are no warranties of any sort associated with these packages, so use them at your own risk, and of course, your mileage may vary. Otherwise, enjoy!
Raw Nugget Pyramids Data
Released: April 13, 2006
Last update: September 9, 2006
Raw data for the experiments described in: Jimmy Lin and Dina Demner-Fushman. Will Pyramids Built of Nuggets Topple Over? Proceedings of the 2006 Human Language Technology Conference and the North American Chapter of the Association for Computational Linguistics Annual Meeting (HLT/NAACL 2006), pages 383-390, June 2006, New York City, New York.
Download:
nugget-pyramids.tar.gz (646k)
Download: combine_judgments.pl (Perl script for building nugget pyramids)
Pourpre scoring script for automatically evaluating complex questions
Download:
pourpre-1.1.tar.gz: Release 06/13/2007 (404k) [README]
Older version: pourpre-1.0.tar.gz: Release 05/29/2005 (376k) [README]
Relevant publications:
- Jimmy Lin and Dina Demner-Fushman. Methods for Automatically Evaluating Answers to Complex Questions. Information Retrieval, 9(5):565-587, 2006. [DOI:10.1007/s10791-006-9003-7]
- Jimmy Lin and Dina Demner-Fushman. Automatically Evaluating Answers to Definition Questions. Proceedings of the 2005 Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP 2005), pages 931-938, October 2005, Vancouver, Canada.
URA
Released: September 14, 2006, updated: October 10, 2006
This is my Java "kitchen-sink" standoff annotation architecture that integrates a variety of IR and NLP packages, notably Lucene, Terrier, and the Stanford NLP tools.
Download:
ura-v1.01.tar.gz (23962k)
The Aranea question answering system
Released: June 11, 2005
Aranea is a Web-based factoid question answering system that uses a combination of data redundancy and database techniques. Its performance in TREC 2002, TREC 2003, and TREC 2004 was competitive. The predecessor to Aranea is the askMSR system that colleagues at Microsoft Research and I developed in 2001. Details:
Jimmy Lin. An Exploration of the Principles Underlying Redundancy-Based Factoid Question Answering. ACM Transactions on Information Systems, 27(2):1-55, 2007.
Download:
Aranea-r1.00.tar.gz (52221k)
QA test collection
Released: June 9, 2005
The question answering test collection as described in: Jimmy Lin and Boris Katz. Building a Reusable Test Collection for Question Answering. Journal of the American Society for Information Science and Technology, 57(7):851-861, 2006.
Download:
qa-test-collection.tar.gz (32k)
Java version of Brill's Part-of-Speech Tagger
Released: December 27, 2004
Eric Brill's part-of-speech tagger ported to Java via the Java Native Interface (JNI). In actuality, it's based on Benjamin Han's ePost package, which is a cleaned-up version of Brill's original code. Has been tested on both Linux and Windows (under Cygwin).
Documentation: javadoc
Download: brill-java-1.0.tar.gz (9352 KB)
LPost: Perl version of Brill's Part-of-Speech Tagger
Released: December 27, 2004
Eric Brill's part-of-speech tagger as a Perl Module. Just like the Java version, it's based on Benjamin Han's ePost package. Has been tested on both Linux and Windows (under Cygwin with ActiveState Perl).
Documentation: LPost POD
Download: LPost-1.0.tar.gz (593 KB)