What would the neighbors say? K-nearest neighbors vs. language modeling
in multi-classification tasks
G. Craig Murray
University of Maryland
College of Information Studies
UMIACS Computational Linguistics Colloquium
October 22, 2003, 11:00am, AVW Room 2120
In this talk I will present results from my summer work at the IBM T.J. Watson
Research Center. I show comparative results from two different approaches to
text categorization in a setting in which multiple categories can apply.
Perplexity-based language modeling is compared to a more traditional IR
technique in which Okapi weights are used to find "close" training examples.
Background smoothing and correction proved to be an essential component of the
LM-based technique. The robustness of kNN and the essential differences between
the techniques will also be discussed.
For the colloquium series schedule, see the UMD Computational
Linguistics Colloquium Series web page at http://umiacs.umd.edu/~resnik/cl_colloquium/.
If you are interested in meeting with the speaker, please contact Doug Oard (oard@umiacs.umd.edu).