Data-driven Techniques for Topic Segmentation and
Categorization in the MALACH Project
Martin Franz
IBM T.J. Watson
The presentation will focus on two areas investigated as part of the Multilingual Access to Large Spoken ArCHives (MALACH) project: Topic Segmentation (identifying intervals of topically homogeneous text data) and Segment Categorization (assigning text segments to topic-based categories). After introducing the MALACH project, we will describe the problems addressed, the techniques used, the current status of the results, and some of the issues remaining to be solved.
Martin Franz holds an M.S. (CS/EE) degree from
For the colloquium series schedule, see the UMD Computational Linguistics Colloquium Series web page at http://umiacs.umd.edu/~resnik/cl_colloquium/. If you are interested in meeting with the speaker, please contact Doug Oard (oard@umiacs.umd.edu).