What would the neighbors say? K-nearest neighbors vs. language modeling in multi-classification tasks

G. Craig Murray

University of Maryland
College of Information Studies


UMIACS Computational Linguistics Colloquium

October 22, 2003,
11:00am, AVW Room 2120


 

In this talk I will present results from my summer work at the IBM T.J. Watson Research Center. I will show comparative results from two different approaches to text categorization in a setting where multiple categories can apply to a single document. Perplexity-based language modeling (LM) is compared to a more traditional IR technique, k-nearest neighbors (kNN), in which Okapi weights are used to find "close" training examples. Background smoothing and correction proved to be essential components of the LM-based technique. The robustness of kNN and the essential differences between the two techniques will also be discussed.

 

For the colloquium series schedule, see the UMD Computational Linguistics Colloquium Series web page at http://umiacs.umd.edu/~resnik/cl_colloquium/. If you are interested in meeting with the speaker, please contact Doug Oard (oard@umiacs.umd.edu).