Semi-Supervised Approaches for Information Analysis

 

Rebecca Hwa

University of Pittsburgh


UMIACS Computational Linguistics Colloquium

February 10, 2004, 11:00am, AVW Room 2460


ABSTRACT

 

Increasingly, information gathering and analysis have become an integral part of our daily activities.  Product reviews written by other customers help us decide what to buy; online forums and blogs heighten our awareness of multiple perspectives on world events; powerful web search engines and community-maintained wikis allow us to learn about a topic easily and quickly, no matter how esoteric.  Given the range and scale of the available data, however, it is difficult for us to pick out the relevant parts from the sea of information.  Thus, developing automatic methods to preprocess these data is a major challenge.  Machine learning approaches offer a way to enable a system to process a wide range of information input, but acquiring a sufficient amount of annotated training data is another concern.

 

In this talk, I discuss two approaches for addressing these challenges.  One way to help users find relevant information more efficiently is to differentiate facts from opinions.  I present a system for determining the strength of the subjectivity of complex text.  Another concern is that the information the user wants may be in a form unintelligible to the user, such as a foreign language.  I present a framework that quickly develops the resources necessary for machine translation.  Both approaches use semi-supervised learning methods to reduce the systems' reliance on annotated training data.

 


 About the Speaker:

 

Rebecca Hwa joined the University of Pittsburgh's Computer Science Department as an assistant professor in the fall of 2003.  She received her BS in Computer Science and Engineering from UCLA in 1993, and her PhD in Computer Science from Harvard University in 2001.  Her research is in the area of artificial intelligence.  Her recent work investigates problems at the intersection of machine learning and natural language processing.  She has published articles on a variety of topics, including active learning, co-training, opinion detection, statistical parsing, and machine translation.  She currently serves as a member of the editorial board of Computational Linguistics.

For the colloquium series schedule, see http://www.umiacs.umd.edu/research/CLIP/colloq/.  If you are interested in meeting with the speaker, please contact Doug Oard (http://www.glue.umd.edu/~oard/, oard@umiacs.umd.edu).