Objective:
We are investigating the automatic generation of headlines for news stories, currently focusing on English text. Although news stories already have human-generated headlines, these pre-existing “abstracts” are often not descriptive enough for our purposes. We have observed three types of very short abstracts: eye-catchers, indicative abstracts, and informative abstracts. Eye-catchers (e.g., “‘Under God’ Under Fire”) are designed to entice the reader into the article but do not say what it is about. Indicative abstracts (e.g., “Pledge of Allegiance, U.S. Court, constitutionality”) indicate the topic but do not say what happened. Informative abstracts (e.g., “U.S. Court Declares Pledge of Allegiance Unconstitutional”) state what happened in the story. Our goal is to produce informative abstracts.
The long-term goal of our effort is to apply our approach to noisy input, such as multilingual text and speech broadcasts, where the need is clearer because these inputs lack viable (or any) human-generated headlines.
Our method forms headlines by selecting headline words from the story words of the newspaper article. As a first approximation, headline words are selected in the order in which they appear in the story. Morphological variants of story words may also appear as headline words. Consider the following excerpt from a news story:
Story Words: After months of debate following the Sept. 11 terrorist hijackings, the Transportation Department has decided that airline pilots will not be allowed to have guns in the cockpits.
Generated Headline: Pilots not allowed to have guns in cockpits.
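The in-order selection constraint can be checked mechanically. Below is a minimal Python sketch, assuming simple lowercase tokenization with punctuation dropped; the helper names `tokenize` and `selection_indices` are hypothetical, and exact matching stands in for the morphological-variant matching described above, which this sketch does not attempt.

```python
import re
from typing import List, Optional


def tokenize(text: str) -> List[str]:
    # Lowercase and keep runs of letters/digits, dropping punctuation.
    return re.findall(r"[a-z0-9]+", text.lower())


def selection_indices(headline: str, story: str) -> Optional[List[int]]:
    """Return the story positions of the headline words, or None.

    Each headline word must match a story word that comes after the one
    matched for the previous headline word (greedy left-to-right scan).
    Matching is exact after lowercasing; morphological variants are not handled.
    """
    story_words = tokenize(story)
    indices: List[int] = []
    pos = 0
    for word in tokenize(headline):
        # Advance until this headline word is found later in the story.
        while pos < len(story_words) and story_words[pos] != word:
            pos += 1
        if pos == len(story_words):
            return None  # word missing, or only present out of order
        indices.append(pos)
        pos += 1
    return indices


story = (
    "After months of debate following the Sept. 11 terrorist hijackings, "
    "the Transportation Department has decided that airline pilots will not "
    "be allowed to have guns in the cockpits."
)
print(selection_indices("Pilots not allowed to have guns in cockpits.", story))
# [17, 19, 21, 22, 23, 24, 25, 27]
print(selection_indices("Guns not allowed in cockpits", story))
# None ("not" occurs only before "guns" in the story)
```

The generated headline above passes this check, while a reordered headline does not, even though every one of its words occurs somewhere in the story.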
In our technique, each story is treated as the output of a Hidden Markov Model (HMM) in which some states emit headline words and others emit non-headline words. Many paths through the HMM can generate a given story, and each path corresponds to a possible headline made up of the story words emitted by headline states. The Viterbi algorithm is used to find the most likely headline for a given story.
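The decoding step can be illustrated with a toy two-state version of the model: a state `H` emits headline words, a state `G` emits all other story words, and Viterbi recovers the most probable state path. This is a simplified sketch, not the project's implementation: the two-state layout, the function name `viterbi_headline`, the unknown-word floor `unk`, and all probabilities are hypothetical and chosen by hand rather than estimated from data.

```python
import math
import re
from typing import Dict, List

# Hypothetical two-state model: "H" emits headline words, "G" emits the rest.
STATES = ("H", "G")


def viterbi_headline(
    story: List[str],
    start: Dict[str, float],
    trans: Dict[str, Dict[str, float]],
    emit: Dict[str, Dict[str, float]],
    unk: float = 1e-6,
) -> List[str]:
    """Return the story words emitted by H on the most likely state path."""
    if not story:
        return []

    def log_emit(state: str, word: str) -> float:
        # Words missing from a state's table get a small floor probability.
        return math.log(emit[state].get(word, unk))

    # best[s]: log-probability of the best path ending in state s so far.
    best = {s: math.log(start[s]) + log_emit(s, story[0]) for s in STATES}
    backptrs: List[Dict[str, str]] = []
    for word in story[1:]:
        ptr: Dict[str, str] = {}
        nxt: Dict[str, float] = {}
        for s in STATES:
            # Best predecessor state for arriving in s at this word.
            prev = max(STATES, key=lambda p: best[p] + math.log(trans[p][s]))
            ptr[s] = prev
            nxt[s] = best[prev] + math.log(trans[prev][s]) + log_emit(s, word)
        backptrs.append(ptr)
        best = nxt

    # Trace back from the best final state to recover the full path.
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return [w for w, s in zip(story, path) if s == "H"]


story = re.findall(r"[a-z0-9]+", (
    "After months of debate following the Sept. 11 terrorist hijackings, "
    "the Transportation Department has decided that airline pilots will not "
    "be allowed to have guns in the cockpits."
).lower())

vocab = set(story)
headline_like = ["pilots", "not", "allowed", "to", "have", "guns", "in", "cockpits"]
emit = {
    "H": {w: 0.1 for w in headline_like},     # toy table, not a trained model
    "G": {w: 1 / len(vocab) for w in vocab},  # uniform over story vocabulary
}
start = {"H": 0.5, "G": 0.5}
trans = {"H": {"H": 0.6, "G": 0.4}, "G": {"H": 0.4, "G": 0.6}}

print(" ".join(viterbi_headline(story, start, trans, emit)))
# pilots not allowed to have guns in cockpits
```

With these toy numbers, making the transitions stickier (0.7 for staying in a state, 0.3 for switching) raises the cost of entering and leaving `H`, so the isolated words "pilots" and "not" drop out and the decoded headline becomes "allowed to have guns in cockpits". In a model of this shape, the transition probabilities are what trade per-word emission evidence against switching between headline and non-headline states.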
For more information, read the paper or see the PowerPoint slides from DUC 2002.
Researchers:
Bonnie Dorr and David Zajic
Partners and Sponsors:
Rich Schwartz, BBN