Information Status Distinctions and Referring Expressions: An Empirical Study of References to People in News Summaries

Advaith Siddharthan, Ani Nenkova, Kathleen McKeown

Research output: Contribution to journal > Article

16 Citations (Scopus)

Abstract

While there has been much theoretical work on using various information status distinctions to explain the form of references in written text, there have been few studies that attempt to automatically learn these distinctions for generating references in the context of computer regenerated text. In this paper, we present a model for generating references to people in news summaries that incorporates insights from both theory and a corpus analysis of human written summaries. In particular, our model captures how two properties of a person referred to in the summary -- familiarity to the reader and global salience in the news story -- affect the content and form of the initial reference to that person in a summary. We demonstrate that these two distinctions can be learnt from a typical input for multi-document summarisation and that they can be used to make regeneration decisions that improve the quality of extractive summaries.
Original language: English
Pages (from-to): 811-842
Number of pages: 31
Journal: Computational Linguistics
Volume: 37
Issue number: 4
Early online date: 28 Nov 2011
DOI: 10.1162/COLI_a_00077
Publication status: Published - Dec 2011

Fingerprint

news
human being
News
Referring Expressions
Empirical Study
Summary
Information Status
Person

Cite this

Information Status Distinctions and Referring Expressions: An Empirical Study of References to People in News Summaries. / Siddharthan, Advaith; Nenkova, Ani; McKeown, Kathleen.

In: Computational Linguistics, Vol. 37, No. 4, 12.2011, p. 811-842.

@article{3a4f90436ea34823ad1acd8e85464b54,
title = "Information Status Distinctions and Referring Expressions: An Empirical Study of References to People in News Summaries",
author = "Advaith Siddharthan and Ani Nenkova and Kathleen McKeown",
year = "2011",
month = "12",
doi = "10.1162/COLI_a_00077",
language = "English",
volume = "37",
pages = "811--842",
journal = "Computational Linguistics",
issn = "0891-2017",
publisher = "MIT Press Journals",
number = "4"
}

TY - JOUR

T1 - Information Status Distinctions and Referring Expressions

T2 - An Empirical Study of References to People in News Summaries

AU - Siddharthan, Advaith

AU - Nenkova, Ani

AU - McKeown, Kathleen

PY - 2011/12

Y1 - 2011/12

U2 - 10.1162/COLI_a_00077

DO - 10.1162/COLI_a_00077

M3 - Article

VL - 37

SP - 811

EP - 842

JO - Computational Linguistics

JF - Computational Linguistics

SN - 0891-2017

IS - 4

ER -