UMIACS Computational Linguistics Colloquium, April 24, 2002

The Language of Time: Mining Text Corpora for Temporal Information


Inderjeet Mani


MITRE/Georgetown


April 24, 2002
10am-11:30am, AVW Room 2120


The creation of text corpora annotated with linguistic information has been a driving force behind progress in natural language processing. Until recently, however, the annotation of temporal information (including time expressions, tense and aspect, and event structure) has been left out of this picture. Such annotations can benefit applications such as information extraction, question answering, summarization, and machine translation. In 1999, to help fill this gap, MITRE and the U.S. government began designing a scheme for annotating time expressions in a corpus with a canonicalized representation of the times they refer to. Using this scheme, two English corpora were annotated last summer at Georgetown and MITRE, by humans as well as by machine. More recently, the scheme has been extended to other languages, including Spanish, French, and Korean, with various corpus-based methods being used to develop temporal taggers for these languages. I will begin with a brief assessment of the annotation scheme, which represents time points, time intervals, and sets of time points, and takes into account several specific kinds of vagueness characteristic of "time talk". The discussion of the multilingual tagging will include a presentation by Clara Cabezas on her work with Philip Resnik on projecting temporal annotations using parallel corpora. I will also present some results from a corpus-based investigation of methods for temporally ordering events in news.


For the colloquium series schedule, see the UMD Computational Linguistics Colloquium Series web page at http://umiacs.umd.edu/~resnik/cl_colloquium/. If you are interested in meeting with the speaker, please contact Philip Resnik (resnik@umiacs.umd.edu).