
I am a Ph.D. candidate in the Department of Computer Science at the University of Maryland, College Park. I am also a graduate research assistant in the Computational Linguistics and Information Processing Laboratory at the Institute for Advanced Computer Studies, where I work with my advisor, Bonnie Dorr.
In general, my research focuses on building computational models of human language that can be validated by how much they enhance human understanding of, and experience with, language. Specifically, my dissertation explores the intersection of, and the interaction between, statistical machine translation and automatic paraphrase generation in order to build a computational model of paraphrasing. More details can be found in my research statement.
I am also particularly interested in Computer Science education. I have had a unique set of experiences in teaching computer science that span both my undergraduate and graduate careers. More details can be found in my teaching statement.
I will be graduating in Spring 2010 and am looking for an academic position that focuses on both CS teaching and research.
| Generating Phrasal & Sentential Paraphrases: A Survey of Data-Driven Methods (Journal Article). In submission to Computational Linguistics. Second round of review (Oct 2009). |
| Machine Translation Evaluation and Optimization (Book Chapter). In Preparation. |
| A Pythonic Exploration of Vector-Space Methods for Semantic Similarity (Magazine Article). In Preparation. |
| Active Learning for Mention Detection: A Comparison of Sentence Selection Strategies. (Available as arXiv:0911.1965v1 from the arXiv Computing Research Repository (CoRR)) |
| Emily: A Tool for Visual Poetry Analysis. |
| Using Paraphrases for Parameter Tuning in Statistical Machine Translation. 2007. Annual Technical Presentation at the Technical Meeting for Global Autonomous Language Exploitation. Nitin Madnani, Necip Fazil Ayan, Philip Resnik and Bonnie J. Dorr. |
| Expectation Maximization. 2004. Advanced NLP Seminar, University of Maryland. |
| Decoding in Statistical Machine Translation. 2006. StatMT Reading Group, University of Maryland. |
| A timeline of inter-annotator agreement measures in Computational Linguistics based on Inter-Coder Agreement for Computational Linguistics by Ron Artstein and Massimo Poesio. Linguistics seminar on Corpus-based Social Science, University of Maryland. |
| Python & Perl wrappers for SRILM: Wrappers that let you read and query an SRI language model directly from your Python and Perl code. | |
| clusterinfo: A Python script that displays current usage of a PBS-based cluster in a more condensed and easier-to-read format. | |
| LM Server: A Python-based XML-RPC server for an SRILM language model. Allows multiple clients to query the same language model that's loaded in memory in server mode. | |
| UMIACS Word Alignment Interface: A Java-based tool for creating and viewing word alignments between language pairs. It has been widely used across the community to create alignments for many language pairs including Welsh-English, Swahili-English, Czech-English and Chinese-English. | |
| TER-plus (TERp): TERp is an automatic evaluation metric for Machine Translation that takes as input a set of reference translations and a set of machine translation outputs for the same data. TERp utilizes automatically generated paraphrases, stemming, synonyms, relaxed shifting constraints and other improvements. (Note: Work done in collaboration with Matt Snover, who is the main developer of TERp.) |