Abstract
Most Natural Language Generation systems need to produce accurate texts. We propose a methodology for high-quality human evaluation of the accuracy of generated texts, which is intended to serve as a gold standard for accuracy evaluations of data-to-text systems. We use our methodology to evaluate the accuracy of computer-generated basketball summaries. We then show how our gold-standard evaluation can be used to validate automated metrics.
Original language | English |
---|---|
Pages | 158-168 |
Number of pages | 11 |
Publication status | Published - Dec 2020 |
Event | 13th International Conference on Natural Language Generation (INLG 2020) - Held online by Dublin City University, Dublin, Ireland. Duration: 15 Dec 2020 → 18 Dec 2020. Conference number: 13. https://www.inlg2020.org/ |
Conference
Conference | 13th International Conference on Natural Language Generation |
---|---|
Abbreviated title | INLG 2020 |
Country/Territory | Ireland |
City | Dublin |
Period | 15/12/20 → 18/12/20 |
Internet address | https://www.inlg2020.org/ |