Abstract
Data2Text Natural Language Generation is a complex and varied task. We investigate the data requirements for the difficult real-world problem of generating statistic-focused summaries of basketball games. This has recently been tackled using the Rotowire and Rotowire-FG datasets of paired data and text. It can, however, be difficult to filter, query, and maintain such large volumes of data. In this resource paper, we introduce the SportSett:Basketball database. This easy-to-use resource allows for simple scripts to be written which generate data in suitable formats for a variety of systems. Building upon the existing data, we provide more attributes, across multiple dimensions, increasing the overlap of content between data and text. We also highlight and resolve issues of training, validation and test partition contamination in these previous datasets.
Original language | English |
---|---|
Publication status | Accepted/In press - 17 Aug 2020 |
Event | IntelLanG : Intelligent Information Processing and Natural Language Generation - Santiago de Compostela, Spain Duration: 7 Sep 2020 → 7 Sep 2020 https://intellang.github.io/ |
Conference
Conference | IntelLanG |
---|---|
Country/Territory | Spain |
City | Santiago de Compostela |
Period | 7/09/20 → 7/09/20 |