Natural language processing technology has made tremendous progress, from the early rule-based systems to the more robust statistical methods of today. However, even the best systems still have difficulty analyzing sentences that humans easily comprehend. One of the main reasons is that humans have an abundance of background knowledge that they can readily access to resolve ambiguities and make accurate inferences. To advance the state of the art in language processing, we need methods both to construct such knowledge bases automatically and to exploit them to enhance language understanding.
In this talk, I will describe machine learning methods to build databases from unstructured text. I will show how more powerful knowledge representations can improve language processing systems, and will present novel probabilistic learning and inference algorithms necessitated by these more complex representations. In particular, we represent domain knowledge with weighted logical formulae, and design learning and inference solutions based on Metropolis-Hastings sampling. I will present results on a number of language processing problems, including named-entity recognition, relation extraction, coreference resolution, and text mining.
Aron Culotta is completing his Ph.D. in Computer Science at the University of Massachusetts, Amherst. He is a member of the Information Extraction and Synthesis Lab and is advised by Professor Andrew McCallum. His research focuses on statistical machine learning approaches to natural language processing. He obtained his B.S. in Computer Science from Tulane University and his M.S. from the University of Massachusetts, and is supported by a Live Labs Fellowship from Microsoft.
This talk is part of the CLIP Colloquium Series, organized by Jimmy Lin (jimmylin -at- umd .dot. edu). For the complete schedule, please visit http://www.umiacs.umd.edu/research/CLIP/colloq/.