Making test corpora for question answering more representative

Andrew Walker, Andrew Starkey, Jeff Z. Pan, Advaith Siddharthan

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

1 Citation (Scopus)

Abstract

Despite two high-profile series of challenges devoted to question answering technologies, there remains no formal study of how representative question corpora are of real end-user inputs. We examine the corpora used presently and historically in the TREC and QALD challenges alongside two more corpora drawn from natural sources, and identify a degree of disjointedness between the artificial and natural groups. We analyse these differences in depth before discussing a candidate approach to question corpus generation and assessing its own representativeness in the same way. We conclude that these artificial corpora have good overall coverage of grammatical structures but that their distribution is skewed, meaning performance measures may be inaccurate.

Original language: English
Title of host publication: Information Access Evaluation. Multilinguality, Multimodality, and Interaction
Subtitle of host publication: CLEF 2014
Editors: Evangelos Kanoulas, Mihai Lupu, Paul Clough, Mark Sanderson, Mark Hall, Allan Hanbury, Elaine Toms
Publisher: Springer-Verlag
Pages: 1-6
Number of pages: 6
ISBN (Electronic): 9783319113821
ISBN (Print): 9783319113814
DOI: https://doi.org/10.1007/978-3-319-11382-1_1
Publication status: Published, 2014
Event: 5th International Conference of the CLEF Initiative, CLEF 2014, Sheffield, United Kingdom
Duration: 15 Sep 2014 to 18 Sep 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8685 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 5th International Conference of the CLEF Initiative, CLEF 2014
Country: United Kingdom
City: Sheffield
Period: 15/09/14 to 18/09/14

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science


Cite this

    Walker, A., Starkey, A., Pan, J. Z., & Siddharthan, A. (2014). Making test corpora for question answering more representative. In E. Kanoulas, M. Lupu, P. Clough, M. Sanderson, M. Hall, A. Hanbury, & E. Toms (Eds.), Information Access Evaluation. Multilinguality, Multimodality, and Interaction: CLEF 2014 (pp. 1-6). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8685 LNCS). Springer-Verlag. https://doi.org/10.1007/978-3-319-11382-1_1