The CLIP Colloquium Series presents...


Language and Statistics II: Beautiful Models, Chance Gods, and Stoic Seers

Noah Smith, Carnegie Mellon University
Wednesday, November 4, 11 a.m. to noon, AVW 3258

Imagine a trusted assistant who follows the news and blogs you wish you had time to read, wades through your mutual funds' prospectuses, and watches for important technical articles relevant to your research. The assistant would then give you a customized summary, answer your questions, translate whatever wasn't in your dialect, and offer advice about what's likely to happen in upcoming meetings, conferences, or elections. That perfect assistant will one day be an application running on your laptop or smartphone, all thanks to natural language processing (NLP).

Accurate NLP relies, to varying extents, on human knowledge about our own languages. Decades ago, this meant hand-written, rule-based systems; today we use expert-annotated text datasets and machine learning. I'll discuss some of my group's recent successes with statistical models for NLP, exposing some of the challenges that arise when linguistic concepts aren't computationally convenient or even well formalized.

I will then turn to a second wave in statistical NLP research based on raw data and learners biased by linguistic hypotheses. The case study will be dependency syntax learning without trees, a kind of grammar induction. We begin with the classic Expectation Maximization (EM) algorithm. I'll present several ways to alter EM for improved performance using simply represented beliefs about linguistic universals. By encoding basic linguistic hypotheses within the learning objective function -- a far more compact representation than the many decisions made in building annotated datasets -- we see considerable accuracy improvements in unsupervised parsing for diverse languages. This research path opens up new computational challenges and new ways of engaging with linguists and domain experts.

I'll close with a glimpse of some early results for a new NLP application: forecasting specific kinds of future events from relevant text. I'll describe some findings about language grounded in the financial and political world.

This talk includes discussion of joint work with Shay Cohen, William Cohen, Dipanjan Das, Jason Eisner, Kevin Gimpel, Michael Heilman, Shimon Kogan, Dimitry Levin, Andre Martins, Brendan O'Connor, Bryan Routledge, Jacob Sagi, Nathan Schneider, Eric Xing, and Tae Yano.

About the Speaker

Noah Smith is an assistant professor in the School of Computer Science at Carnegie Mellon University. He received his Ph.D. in Computer Science as a Hertz Foundation Fellow from Johns Hopkins University in 2006 and his B.S. in Computer Science and B.A. in Linguistics from the University of Maryland in 2001. His research interests include statistical NLP, especially unsupervised methods, machine learning for structured data, and applications of NLP to the social sciences. He serves on the editorial board of the Computational Linguistics journal and received a best paper award at ACL 2009.


This talk is part of the CLIP Colloquium Series. For the complete schedule, please visit http://www.umiacs.umd.edu/research/CLIP/colloq/.