Making test corpora for question answering more representative

Andrew Walker, Andrew Starkey, Jeff Z. Pan, Advaith Siddharthan

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Despite two high-profile series of challenges devoted to question answering technologies, there remains no formal study of how representative question corpora are of real end-user inputs. We examine the corpora used presently and historically in the TREC and QALD challenges alongside two corpora from natural sources and identify a degree of disjointedness between the two groups. We analyse these differences in depth before discussing a candidate approach to question corpus generation and assessing its representativeness in the same way. We conclude that these artificial corpora have good overall coverage of grammatical structures but that the distribution is skewed, meaning performance measures may be inaccurate.

Original language: English
Title of host publication: Information Access Evaluation. Multilinguality, Multimodality, and Interaction
Subtitle of host publication: CLEF 2014
Editors: Evangelos Kanoulas, Mihai Lupu, Paul Clough, Mark Sanderson, Mark Hall, Allan Hanbury, Elaine Toms
Publisher: Springer-Verlag
Pages: 1-6
Number of pages: 6
ISBN (Electronic): 9783319113821
ISBN (Print): 9783319113814
DOI: https://doi.org/10.1007/978-3-319-11382-1_1
Publication status: Published - 2014
Event: 5th International Conference of the CLEF Initiative, CLEF 2014 - Sheffield, United Kingdom
Duration: 15 Sep 2014 to 18 Sep 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8685 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 5th International Conference of the CLEF Initiative, CLEF 2014
Country: United Kingdom
City: Sheffield
Period: 15/09/14 to 18/09/14

Fingerprint

Question Answering
Inaccurate
Performance Measures
Coverage
Series
Corpus

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Walker, A., Starkey, A., Pan, J. Z., & Siddharthan, A. (2014). Making test corpora for question answering more representative. In E. Kanoulas, M. Lupu, P. Clough, M. Sanderson, M. Hall, A. Hanbury, & E. Toms (Eds.), Information Access Evaluation. Multilinguality, Multimodality, and Interaction: CLEF 2014 (pp. 1-6). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8685 LNCS). Springer-Verlag. https://doi.org/10.1007/978-3-319-11382-1_1

Making test corpora for question answering more representative. / Walker, Andrew; Starkey, Andrew; Pan, Jeff Z.; Siddharthan, Advaith.

Information Access Evaluation. Multilinguality, Multimodality, and Interaction: CLEF 2014. ed. / Evangelos Kanoulas; Mihai Lupu; Paul Clough; Mark Sanderson; Mark Hall; Allan Hanbury; Elaine Toms. Springer-Verlag, 2014. p. 1-6 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8685 LNCS).

Walker, A, Starkey, A, Pan, JZ & Siddharthan, A 2014, Making test corpora for question answering more representative. in E Kanoulas, M Lupu, P Clough, M Sanderson, M Hall, A Hanbury & E Toms (eds), Information Access Evaluation. Multilinguality, Multimodality, and Interaction: CLEF 2014. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 8685 LNCS, Springer-Verlag, pp. 1-6, 5th International Conference of the CLEF Initiative, CLEF 2014, Sheffield, United Kingdom, 15/09/14. https://doi.org/10.1007/978-3-319-11382-1_1

Walker A, Starkey A, Pan JZ, Siddharthan A. Making test corpora for question answering more representative. In Kanoulas E, Lupu M, Clough P, Sanderson M, Hall M, Hanbury A, Toms E, editors, Information Access Evaluation. Multilinguality, Multimodality, and Interaction: CLEF 2014. Springer-Verlag. 2014. p. 1-6. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-319-11382-1_1

Walker, Andrew; Starkey, Andrew; Pan, Jeff Z.; Siddharthan, Advaith. / Making test corpora for question answering more representative. Information Access Evaluation. Multilinguality, Multimodality, and Interaction: CLEF 2014. editor / Evangelos Kanoulas; Mihai Lupu; Paul Clough; Mark Sanderson; Mark Hall; Allan Hanbury; Elaine Toms. Springer-Verlag, 2014. pp. 1-6 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{bd9afb598af84fe484d5327b940d4835,
title = "Making test corpora for question answering more representative",
abstract = "Despite two high profile series of challenges devoted to question answering technologies there remains no formal study into the representativeness that question corpora bear to real end-user inputs. We examine the corpora used presently and historically in the TREC and QALD challenges in juxtaposition with two more from natural sources and identify a degree of disjointedness between the two. We analyse these differences in depth before discussing a candidate approach to question corpora generation and provide a juxtaposition on its own representativeness. We conclude that these artificial corpora have good overall coverage of grammatical structures but the distribution is skewed, meaning performance measures may be inaccurate.",
author = "Andrew Walker and Andrew Starkey and Jeff Z. Pan and Advaith Siddharthan",
year = "2014",
doi = "10.1007/978-3-319-11382-1_1",
language = "English",
isbn = "9783319113814",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer-Verlag",
pages = "1--6",
editor = "Evangelos Kanoulas and Mihai Lupu and Paul Clough and Mark Sanderson and Mark Hall and Allan Hanbury and Elaine Toms",
booktitle = "Information Access Evaluation. Multilinguality, Multimodality, and Interaction",
volume = "8685",
}

TY - GEN

T1 - Making test corpora for question answering more representative

AU - Walker, Andrew

AU - Starkey, Andrew

AU - Pan, Jeff Z.

AU - Siddharthan, Advaith

PY - 2014

Y1 - 2014

N2 - Despite two high profile series of challenges devoted to question answering technologies there remains no formal study into the representativeness that question corpora bear to real end-user inputs. We examine the corpora used presently and historically in the TREC and QALD challenges in juxtaposition with two more from natural sources and identify a degree of disjointedness between the two. We analyse these differences in depth before discussing a candidate approach to question corpora generation and provide a juxtaposition on its own representativeness. We conclude that these artificial corpora have good overall coverage of grammatical structures but the distribution is skewed, meaning performance measures may be inaccurate.

AB - Despite two high profile series of challenges devoted to question answering technologies there remains no formal study into the representativeness that question corpora bear to real end-user inputs. We examine the corpora used presently and historically in the TREC and QALD challenges in juxtaposition with two more from natural sources and identify a degree of disjointedness between the two. We analyse these differences in depth before discussing a candidate approach to question corpora generation and provide a juxtaposition on its own representativeness. We conclude that these artificial corpora have good overall coverage of grammatical structures but the distribution is skewed, meaning performance measures may be inaccurate.

UR - http://www.scopus.com/inward/record.url?scp=84906776564&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-11382-1_1

DO - 10.1007/978-3-319-11382-1_1

M3 - Conference contribution

SN - 9783319113814

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 1

EP - 6

BT - Information Access Evaluation. Multilinguality, Multimodality, and Interaction

A2 - Kanoulas, Evangelos

A2 - Lupu, Mihai

A2 - Clough, Paul

A2 - Sanderson, Mark

A2 - Hall, Mark

A2 - Hanbury, Allan

A2 - Toms, Elaine

PB - Springer-Verlag

ER -