Using a randomised controlled clinical trial to evaluate an NLG system

Ehud Baruch Reiter, R Robertson, A S Lennox, L Osman

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

The STOP system, which generates personalised smoking-cessation letters, was evaluated by a randomised controlled clinical trial. We believe this is the largest and perhaps most rigorous task effectiveness evaluation ever performed on an NLG system. The detailed results of the clinical trial have been presented elsewhere, in the medical literature. In this paper we discuss the clinical trial itself: its structure and cost, what we did and did not learn from it (especially considering that the trial showed that STOP was not effective), and how it compares to other NLG evaluation techniques.

Original language: English
Title of host publication: 39TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE
Place of Publication: SOMERSET
Publisher: Association for Computational Linguistics
Pages: 434-441
Number of pages: 8
ISBN (Print): 1-55860-767-6
Publication status: Published - 2001

Keywords

  • GENERATION

Cite this

Reiter, E. B., Robertson, R., Lennox, A. S., & Osman, L. (2001). Using a randomised controlled clinical trial to evaluate an NLG system. In 39TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE (pp. 434-441). SOMERSET: Association for Computational Linguistics.
