Data-driven Techniques for Topic Segmentation and Categorization in the MALACH Project

Martin Franz


IBM T. J. Watson Research Center


UMIACS Computational Linguistics Colloquium

October 1, 2003,
11:00am, AVW Room 2120


The presentation will focus on two areas investigated as part of the Multilingual Access to Large spoken ArCHives (MALACH) project: Topic Segmentation (identifying intervals of topically homogeneous text data) and Segment Categorization (assigning text segments to topic-based categories). After introducing the MALACH project, we will describe the problems addressed, the techniques used, the current status of the results, and some of the issues remaining to be solved.

About the speaker:

Martin Franz holds an M.S. (CS/EE) degree from Czech Technical University in Prague. He joined the Speech Group at the IBM T. J. Watson Research Center in 1992 to work on language modeling for speech.


For the colloquium series schedule, see the UMD Computational Linguistics Colloquium Series web page at http://umiacs.umd.edu/~resnik/cl_colloquium/. If you are interested in meeting with the speaker, please contact Doug Oard (oard@umiacs.umd.edu).