Towards objectively evaluating the quality of generated medical summaries

Francesco Moramarco, Aleksandar Savkov, Damir Juric, Ehud Reiter

Research output: Chapter in Book/Report/Conference proceeding › Published conference contribution

Abstract

We propose a method for evaluating the quality of generated text by asking evaluators to count facts, and computing precision, recall, f-score, and accuracy from the raw counts. We believe this approach leads to a more objective and easier to reproduce evaluation. We apply this to the task of medical report summarisation, where measuring objective quality and accuracy is of paramount importance.
Original language: English
Title of host publication: Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval)
Subtitle of host publication: EACL 2021
Editors: Anya Belz, Shubham Agarwal, Yvette Graham, Ehud Reiter, Anastasia Shimorina
Publisher: ACL Anthology
Pages: 56-61
Number of pages: 6
ISBN (Electronic): 978-1-954085-10-7
Publication status: Published - 19 Apr 2021
Event: Workshop on Human Evaluation of NLP Systems - virtual
Duration: 19 Apr 2021 to 19 Apr 2021
https://www.virtual2021.eacl.org/workshop_WS-5.html

