Generation Challenges: Results of the Accuracy Evaluation Shared Task

Craig Thomson; Ehud Reiter

Research output: Chapter in Book/Report/Conference proceeding › Published conference contribution

Abstract

The Shared Task on Evaluating Accuracy focused on techniques (both manual and automatic) for evaluating the factual accuracy of texts produced by neural NLG systems, in a sports-reporting domain. Four teams submitted evaluation techniques for this task, using very different approaches and techniques. The best-performing submissions did encouragingly well at this difficult task. However, all automatic submissions struggled to detect factual errors which are semantically or pragmatically complex (for example, based on incorrect computation or inference).

Original language	English
Title of host publication	The 14th International Conference on Natural Language Generation
Subtitle of host publication	Proceedings of the Conference
Pages	240–248
Number of pages	9
Publication status	Published - 31 Aug 2021
Event	The 14th International Conference on Natural Language Generation - Virtual, Aberdeen, United Kingdom
Duration	20 Sept 2021 → 24 Sept 2021
Conference number	14

Conference

Conference	The 14th International Conference on Natural Language Generation
Country/Territory	United Kingdom
City	Aberdeen
Period	20/09/21 → 24/09/21
Internet address	https://inlg2021.github.io/index.html

Access to Document

Thomson_etal_INLG_Generation_Challenges_Results_VoR
Materials published in or after 2016 are licensed on a Creative Commons Attribution 4.0 International License. https://creativecommons.org/licenses/by/4.0/
Final published version, 185 KB
Licence: CC BY

https://aclanthology.org/2021.inlg-1.23/
Licence: CC BY

Cite this

@inproceedings{ee4068d458cf49a3a0214d2c793895ec,
  title = "Generation Challenges: Results of the Accuracy Evaluation Shared Task",
  abstract = "The Shared Task on Evaluating Accuracy focused on techniques (both manual and automatic) for evaluating the factual accuracy of texts produced by neural NLG systems, in a sports-reporting domain. Four teams submitted evaluation techniques for this task, using very different approaches and techniques. The best-performing submissions did encouragingly well at this difficult task. However, all automatic submissions struggled to detect factual errors which are semantically or pragmatically complex (for example, based on incorrect computation or inference).",
  author = "Craig Thomson and Ehud Reiter",
  year = "2021",
  month = aug,
  day = "31",
  language = "English",
  isbn = "978-1-954085-51-0",
  pages = "240--248",
  booktitle = "The 14th International Conference on Natural Language Generation",
  note = "The 14th International Conference on Natural Language Generation; Conference date: 20-09-2021 through 24-09-2021",
  url = "https://inlg2021.github.io/index.html",
}

TY  - GEN
T1  - Generation Challenges: Results of the Accuracy Evaluation Shared Task
T2  - The 14th International Conference on Natural Language Generation
AU  - Thomson, Craig
AU  - Reiter, Ehud
N1  - Conference code: 14
PY  - 2021/8/31
Y1  - 2021/8/31
N2  - The Shared Task on Evaluating Accuracy focused on techniques (both manual and automatic) for evaluating the factual accuracy of texts produced by neural NLG systems, in a sports-reporting domain. Four teams submitted evaluation techniques for this task, using very different approaches and techniques. The best-performing submissions did encouragingly well at this difficult task. However, all automatic submissions struggled to detect factual errors which are semantically or pragmatically complex (for example, based on incorrect computation or inference).
AB  - The Shared Task on Evaluating Accuracy focused on techniques (both manual and automatic) for evaluating the factual accuracy of texts produced by neural NLG systems, in a sports-reporting domain. Four teams submitted evaluation techniques for this task, using very different approaches and techniques. The best-performing submissions did encouragingly well at this difficult task. However, all automatic submissions struggled to detect factual errors which are semantically or pragmatically complex (for example, based on incorrect computation or inference).
UR  - https://inlg2021.github.io/index.html
M3  - Published conference contribution
SN  - 978-1-954085-51-0
SP  - 240
EP  - 248
BT  - The 14th International Conference on Natural Language Generation
Y2  - 20 September 2021 through 24 September 2021
ER  - 
