I am working on an approach for deriving content descriptions for single texts as well as for clusters and collections of texts. A content description in this context is a set of concepts found to be salient in the given unit, i.e. a text, a cluster, or a collection. A salient concept may be a specific concept directly mentioned in the given unit or a more general concept derived from others. The salient concepts are presented in an order guided by the hierarchical relations defined in WordNet. Content descriptions are not static; they can be tailored dynamically to an information need by adjusting the general level of detail and by selectively adding or omitting details.
The generation of content descriptions relies on statistical measures and on WordNet's definition of synonym sets and their relationships. Furthermore, it requires linguistic preprocessing of the input texts, which includes part-of-speech tagging, selection of the appropriate word senses, and syntactic parsing of noun phrases.
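As a rough illustration only, and not the method presented in the talk, the following sketch shows one way that frequency counts and a hypernym hierarchy could be combined to pick salient concepts at an adjustable level of detail. The toy `HYPERNYM` table, the function names, and the `min_share` threshold are all hypothetical; a real system would use WordNet synsets and sense-disambiguated input rather than plain strings.

```python
from collections import Counter

# Hypothetical toy hierarchy (child -> parent), standing in for
# WordNet hypernym links between synsets.
HYPERNYM = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "animal",
    "cat": "feline",
    "feline": "animal",
    "animal": "entity",
    "car": "vehicle",
    "vehicle": "entity",
}


def propagate(counts):
    """Credit each concept's count to itself and to all of its ancestors."""
    weights = Counter()
    for concept, n in counts.items():
        node = concept
        while node is not None:
            weights[node] += n
            node = HYPERNYM.get(node)
    return weights


def salient_concepts(counts, min_share):
    """Return the most specific concepts covering at least min_share of all mentions.

    A lower min_share yields a more detailed description; a higher one
    yields a more general description.
    """
    weights = propagate(counts)
    total = sum(counts.values())
    qualifying = {c for c, w in weights.items() if w / total >= min_share}
    # Drop any concept that has a qualifying child: the child is more specific.
    covered = {HYPERNYM[c] for c in qualifying if c in HYPERNYM}
    frontier = qualifying - covered
    # Order by descending weight, then alphabetically for stable output.
    return sorted(frontier, key=lambda c: (-weights[c], c))


# Assumed concept counts for one text (after sense selection).
counts = {"dog": 4, "wolf": 2, "cat": 1, "car": 3}
print(salient_concepts(counts, 0.5))   # ['canine']
print(salient_concepts(counts, 0.25))  # ['dog', 'car']
```

With the higher threshold, no single mentioned concept is frequent enough, so the description falls back on the derived concept "canine"; lowering the threshold brings the specific concepts "dog" and "car" into view.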
In my talk, I will give an overview of the approach and report on the results of initial experiments.
For the colloquium series schedule, see the UMD Computational Linguistics Colloquium Series web page at http://umiacs.umd.edu/~resnik/cl_colloquium/. If you are interested in meeting with the speaker, please contact Mari Broman Olsen (molsen@umiacs.umd.edu) or Philip Resnik (resnik@umiacs.umd.edu).