Indirect Supervised Learning of Content Selection Rules
Pablo Duboue
Department of
Abstract:
As online data becomes more and more abundant, there
is a growing need to filter information for users and therefore reduce the
information overload. This situation is
akin to the Content Selection (CS) problem faced by a Natural Language
Generation (NLG) system when starting to build a new text. In a generation system, the CS module decides
which pieces of information to include in the final generated text. It has been argued that CS is central to the
user acceptance of a generation system (as users may tolerate other types of
errors as long as the information is readily available in the output). Moreover, the CS problem is quite domain
dependent; major changes in CS knowledge are needed when moving a system to a
different domain.
In this talk, I will present my work on the automatic
acquisition of CS rules, as a way to provide a domain-independent
solution to the CS problem.
As training material, I employ an aligned Text-Data corpus, a type of resource
that is increasingly popular for learning in NLG (such corpora are readily
available and do not require expensive hand labelling). However, aligned Text-Data corpora
provide only indirect information about whether a piece of information was
selected by the human writer for inclusion in the text. Indirect Supervised Learning is my proposed
solution to this problem. It has two steps. In the first step, the
Text-Data corpus is transformed into a dataset with classification labels.
In the second step, supervised learning machinery acquires the CS rules from
this dataset. I evaluate the approach by
comparing the output of my system with the information selected by human
authors in unseen texts, obtaining an F* of 0.67 with high recall.
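
The two-step idea can be illustrated with a minimal Python sketch. This is not the system presented in the talk; it only shows the general shape of the approach under loud assumptions: a data attribute counts as "selected" when its value appears verbatim in the aligned text, the "supervised learning machinery" is replaced by a simple per-attribute frequency threshold, and all names (label_records, learn_rules, evaluate) and the toy records are invented for illustration.

```python
from collections import defaultdict


def label_records(data, text):
    """Step 1: turn one aligned (data, text) pair into labelled examples.

    Assumption: an attribute is labelled 'selected' if its value appears
    verbatim in the text. Realistic matching would need to be more robust.
    """
    lowered = text.lower()
    return [(attr, str(value).lower() in lowered) for attr, value in data.items()]


def learn_rules(corpus, threshold=0.5):
    """Step 2: learn one rule per attribute from the labelled dataset.

    Stand-in learner (an assumption): keep an attribute if it was verbalised
    in at least `threshold` of the training pairs in which it occurred.
    """
    counts = defaultdict(lambda: [0, 0])  # attr -> [times selected, times seen]
    for data, text in corpus:
        for attr, selected in label_records(data, text):
            counts[attr][0] += int(selected)
            counts[attr][1] += 1
    return {attr for attr, (sel, seen) in counts.items() if sel / seen >= threshold}


def evaluate(rules, data, text):
    """Compare the rules' selection with what the human author included."""
    gold = {attr for attr, selected in label_records(data, text) if selected}
    predicted = {attr for attr in data if attr in rules}
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Toy aligned Text-Data corpus (invented records, for illustration only).
train = [
    ({"name": "Jane Doe", "born": "1901", "height": "165cm"},
     "Jane Doe was born in 1901."),
    ({"name": "Sam Poe", "born": "1912", "height": "178cm"},
     "Sam Poe, born 1912, was a mathematician."),
]
rules = learn_rules(train)

# Held-out pair: the rules are scored against the author's actual choices.
held_out = ({"name": "John Roe", "born": "1950", "height": "180cm"},
            "John Roe was an engineer.")
precision, recall, f_measure = evaluate(rules, *held_out)
```

In this toy setting the learned rules select "name" and "born" but never "height", and the evaluation step mirrors the comparison against human-authored unseen texts described above. The balanced F-measure used here is an assumption; the abstract does not specify how F* is defined.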
About the Speaker:
Pablo Duboue <http://www.cs.columbia.edu/~pablo/>
is a PhD candidate in the Computer Science Department at
For the colloquium
series schedule, see the UMD Computational http://www.umiacs.umd.edu/research/CLIP/colloq/. If you are interested in meeting with the
speaker, please contact Doug Oard <http://www.glue.umd.edu/~oard/> (oard@umiacs.umd.edu).