UMIACS Computational Linguistics Colloquium, August 30, 2001

Latent Maximum Entropy Principle: A Unified Probabilistic Framework for Statistical Language Modeling


Shaojun Wang


CMU


August 30, 2001,
11am, AVW Room 2120


What is the right representation for a natural language? A Markov chain? A stochastic branching process? A contingency table? Each such model describes a specific linguistic phenomenon, but over the last forty years we have lacked a unified probabilistic framework for encoding language that simultaneously takes into account the local information inherent in Markov chain models, the hierarchical syntactic structure of sentences captured by stochastic branching processes, and the semantic content of documents captured by bag-of-words categorical mixture log-linear models. We recently proposed a latent maximum entropy principle that provides just such a tool for statistical language modeling. In this talk, I will describe the latent maximum entropy principle, which extends Jaynes' original maximum entropy principle to accommodate latent variables. I will give the problem formulation, its solution, and certain convergence properties. I will then show how to use this machine learning technique for statistical language modeling in a principled way, with mixtures of exponential families that have rich expressive power. Finally, I will draw some conclusions and point out future research directions.

About the speaker:

Shaojun Wang received his Ph.D. in Electrical and Computer Engineering from the University of Illinois. His research interests include statistical modeling and signal processing for speech, audio, language, and text; statistical and computational learning theory; multimedia; and human-machine communications. He is presently a postdoctoral researcher at the Center for Automated Learning and Discovery at CMU.


For the colloquium series schedule, see the UMD Computational Linguistics Colloquium Series web page at http://umiacs.umd.edu/~resnik/cl_colloquium/. If you are interested in meeting with the speaker, please contact Philip Resnik (resnik@umiacs.umd.edu).