The CLIP Colloquium Series presents...


Words and Networks

Dragomir Radev (University of Michigan)
February 20, 2008, 11:00am, Location AVW 2460

Textual data is everywhere, in email and scientific papers, in online newspapers and e-commerce sites. The Web contains more than 200 terabytes of text, not even counting the contents of dynamic textual databases. This enormous source of knowledge is seriously underexploited. Textual documents on the Web are very hard to model computationally: they are mostly unstructured, time-dependent, collectively authored, multilingual, and of uneven importance. Traditional grammar-based techniques don't scale up to address such problems. Novel representations and analytical tools are needed.

I will discuss several recent contributions of network-based text mining to the following problems:

I will also present my vision of where text mining is heading and share some information about the recent North American Computational Linguistics Olympiad.

About the Speaker

Dragomir R. Radev is an Associate Professor of Information, Electrical Engineering and Computer Science, and Linguistics at the University of Michigan, Ann Arbor. He holds a Ph.D. in Computer Science from Columbia University. Before joining Michigan, he was a Research Staff Member at IBM's T.J. Watson Research Center in Hawthorne, NY. He is the author of more than 60 refereed papers, and his work has been covered by Reuters, NPR, Wired, and many other media outlets.

Dr. Radev's current research, supported by NSF, ONR, and NIH, covers probabilistic and link-based methods for exploiting very large textual repositories, graph-based methods for natural language processing, representing and acquiring knowledge of genome regulation, and semantic entity and relation extraction from Web-scale text document collections. He is secretary of the Association for Computational Linguistics, an associate editor of JAIR, and a member of the editorial boards of Information Retrieval and the Journal of Natural Language Engineering. He is also a four-time finalist at the ACM programming finals (as a contestant in 1993 and as a coach in 1995-1997). In 2007 he coached the US national teams at the International Linguistics Olympiad in Russia, leading one of the two US teams to first place (tied with Russia) and one of the US individual contestants to first place in the individual competition.

Dragomir received a graduate teaching award at Columbia and, more recently, the University of Michigan award for Outstanding Research Mentorship (UROP). He has worked in various capacities for AT&T, IBM, Bellcore, MITRE, Microsoft, and Yahoo! Dragomir is also a recipient of the 2006 Gosnell Prize for Excellence in Political Methodology.


This talk is part of the CLIP Colloquium Series, organized by Jimmy Lin (jimmylin -at- umd .dot. edu). For the complete schedule, please visit http://www.umiacs.umd.edu/research/CLIP/colloq/.