Making test corpora for question answering more representative

Andrew Walker; Andrew Starkey; Jeff Z. Pan; Advaith Siddharthan

doi:10.1007/978-3-319-11382-1_1

Research output: Chapter in Book/Report/Conference proceeding › Published conference contribution

1 Citation (Scopus)

Abstract

Despite two high-profile series of challenges devoted to question answering technologies, there remains no formal study of how representative question corpora are of real end-user inputs. We examine the corpora used presently and historically in the TREC and QALD challenges alongside two corpora drawn from natural sources and identify a degree of disjointedness between the two groups. We analyse these differences in depth before discussing a candidate approach to question corpora generation and evaluating its representativeness in turn. We conclude that these artificial corpora have good overall coverage of grammatical structures but the distribution is skewed, meaning performance measures may be inaccurate.

Original language	English
Title of host publication	Information Access Evaluation. Multilinguality, Multimodality, and Interaction
Subtitle of host publication	CLEF 2014.
Editors	Evangelos Kanoulas, Mihai Lupu, Paul Clough, Mark Sanderson, Mark Hall, Allan Hanbury, Elaine Toms
Publisher	Springer-Verlag
Pages	1-6
Number of pages	6
ISBN (Electronic)	9783319113821
ISBN (Print)	9783319113814
DOIs	https://doi.org/10.1007/978-3-319-11382-1_1
Publication status	Published - 2014
Event	5th International Conference of the CLEF Initiative, CLEF 2014, Sheffield, United Kingdom. Duration: 15 Sept 2014 → 18 Sept 2014

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	8685 LNCS
ISSN (Print)	03029743
ISSN (Electronic)	16113349

Conference

Conference	5th International Conference of the CLEF Initiative, CLEF 2014
Country/Territory	United Kingdom
City	Sheffield
Period	15/09/14 → 18/09/14

Access to Document

https://doi.org/10.1007/978-3-319-11382-1_1

Licence: Unspecified

Cite this

Walker, A., Starkey, A., Pan, J. Z., & Siddharthan, A. (2014). Making test corpora for question answering more representative. In E. Kanoulas, M. Lupu, P. Clough, M. Sanderson, M. Hall, A. Hanbury, & E. Toms (Eds.), Information Access Evaluation. Multilinguality, Multimodality, and Interaction: CLEF 2014 (pp. 1-6). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8685 LNCS). Springer-Verlag. https://doi.org/10.1007/978-3-319-11382-1_1

@inproceedings{bd9afb598af84fe484d5327b940d4835,

title = "Making test corpora for question answering more representative",

abstract = "Despite two high profile series of challenges devoted to question answering technologies there remains no formal study into the representativeness that question corpora bear to real end-user inputs. We examine the corpora used presently and historically in the TREC and QALD challenges in juxtaposition with two more from natural sources and identify a degree of disjointedness between the two. We analyse these differences in depth before discussing a candidate approach to question corpora generation and provide a juxtaposition on its own representativeness. We conclude that these artificial corpora have good overall coverage of grammatical structures but the distribution is skewed, meaning performance measures may be inaccurate.",

author = "Andrew Walker and Andrew Starkey and Pan, {Jeff Z.} and Advaith Siddharthan",

year = "2014",

doi = "10.1007/978-3-319-11382-1_1",

language = "English",

isbn = "9783319113814",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer-Verlag",

pages = "1--6",

editor = "Evangelos Kanoulas and Mihai Lupu and Paul Clough and Mark Sanderson and Mark Hall and Allan Hanbury and Elaine Toms",

booktitle = "Information Access Evaluation. Multilinguality, Multimodality, and Interaction",

note = "5th International Conference of the CLEF Initiative, CLEF 2014 ; Conference date: 15-09-2014 Through 18-09-2014",

}

TY - GEN

T1 - Making test corpora for question answering more representative

AU - Walker, Andrew

AU - Starkey, Andrew

AU - Pan, Jeff Z.

AU - Siddharthan, Advaith

PY - 2014

Y1 - 2014

N2 - Despite two high profile series of challenges devoted to question answering technologies there remains no formal study into the representativeness that question corpora bear to real end-user inputs. We examine the corpora used presently and historically in the TREC and QALD challenges in juxtaposition with two more from natural sources and identify a degree of disjointedness between the two. We analyse these differences in depth before discussing a candidate approach to question corpora generation and provide a juxtaposition on its own representativeness. We conclude that these artificial corpora have good overall coverage of grammatical structures but the distribution is skewed, meaning performance measures may be inaccurate.

UR - http://www.scopus.com/inward/record.url?scp=84906776564&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-11382-1_1

DO - 10.1007/978-3-319-11382-1_1

M3 - Published conference contribution

AN - SCOPUS:84906776564

SN - 9783319113814

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 1

EP - 6

BT - Information Access Evaluation. Multilinguality, Multimodality, and Interaction

A2 - Kanoulas, Evangelos

A2 - Lupu, Mihai

A2 - Clough, Paul

A2 - Sanderson, Mark

A2 - Hall, Mark

A2 - Hanbury, Allan

A2 - Toms, Elaine

PB - Springer-Verlag

T2 - 5th International Conference of the CLEF Initiative, CLEF 2014

Y2 - 15 September 2014 through 18 September 2014

ER -
