A Framework for Evaluating Snippet Generation for Dataset Search

Xiaxia Wang; Jinchi Chen; Shuxin Li; Gong Cheng; Jeff Z. Pan; Evgeny Kharlamov; Yuzhong Qu

Computing Science

Research output: Working paper

6 Citations (Scopus)

Abstract

Reusing existing datasets is of considerable significance to researchers and developers. Dataset search engines help a user find relevant datasets for reuse. They can present a snippet for each retrieved dataset to explain its relevance to the user's data needs. This emerging problem of snippet generation for dataset search has not received much research attention. To provide a basis for future research, we introduce a framework for quantitatively evaluating the quality of a dataset snippet. The proposed metrics assess the extent to which a snippet matches the query intent and covers the main content of the dataset. To establish a baseline, we adapt four state-of-the-art methods from related fields to our problem, and perform an empirical evaluation based on real-world datasets and queries. We also conduct a user study to verify our findings. The results demonstrate the effectiveness of our evaluation framework, and suggest directions for future research.

Original language	English
Publisher	ArXiv
Publication status	Submitted - 2 Jul 2019

Publication series

Name	arXiv

Bibliographical note

17 pages, to appear at the research track of the 18th International Semantic Web Conference (ISWC 2019)

Keywords

cs.IR
cs.DB

Access to Document

https://arxiv.org/abs/1907.01183
Licence: CC BY

Cite this

@techreport{020244e6980747778fc2e5d5c00e44a4,
  title = "A Framework for Evaluating Snippet Generation for Dataset Search",
  abstract = "Reusing existing datasets is of considerable significance to researchers and developers. Dataset search engines help a user find relevant datasets for reuse. They can present a snippet for each retrieved dataset to explain its relevance to the user's data needs. This emerging problem of snippet generation for dataset search has not received much research attention. To provide a basis for future research, we introduce a framework for quantitatively evaluating the quality of a dataset snippet. The proposed metrics assess the extent to which a snippet matches the query intent and covers the main content of the dataset. To establish a baseline, we adapt four state-of-the-art methods from related fields to our problem, and perform an empirical evaluation based on real-world datasets and queries. We also conduct a user study to verify our findings. The results demonstrate the effectiveness of our evaluation framework, and suggest directions for future research.",
  keywords = "cs.IR, cs.DB",
  author = "Xiaxia Wang and Jinchi Chen and Shuxin Li and Gong Cheng and Pan, {Jeff Z.} and Evgeny Kharlamov and Yuzhong Qu",
  note = "17 pages, to appear at the research track of the 18th International Semantic Web Conference (ISWC 2019)",
  year = "2019",
  month = jul,
  day = "2",
  language = "English",
  series = "arXiv",
  publisher = "ArXiv",
  type = "WorkingPaper",
  institution = "ArXiv",
}

TY  - UNPB
T1  - A Framework for Evaluating Snippet Generation for Dataset Search
AU  - Wang, Xiaxia
AU  - Chen, Jinchi
AU  - Li, Shuxin
AU  - Cheng, Gong
AU  - Pan, Jeff Z.
AU  - Kharlamov, Evgeny
AU  - Qu, Yuzhong
N1  - 17 pages, to appear at the research track of the 18th International Semantic Web Conference (ISWC 2019)
PY  - 2019/7/2
Y1  - 2019/7/2
N2  - Reusing existing datasets is of considerable significance to researchers and developers. Dataset search engines help a user find relevant datasets for reuse. They can present a snippet for each retrieved dataset to explain its relevance to the user's data needs. This emerging problem of snippet generation for dataset search has not received much research attention. To provide a basis for future research, we introduce a framework for quantitatively evaluating the quality of a dataset snippet. The proposed metrics assess the extent to which a snippet matches the query intent and covers the main content of the dataset. To establish a baseline, we adapt four state-of-the-art methods from related fields to our problem, and perform an empirical evaluation based on real-world datasets and queries. We also conduct a user study to verify our findings. The results demonstrate the effectiveness of our evaluation framework, and suggest directions for future research.
AB  - Reusing existing datasets is of considerable significance to researchers and developers. Dataset search engines help a user find relevant datasets for reuse. They can present a snippet for each retrieved dataset to explain its relevance to the user's data needs. This emerging problem of snippet generation for dataset search has not received much research attention. To provide a basis for future research, we introduce a framework for quantitatively evaluating the quality of a dataset snippet. The proposed metrics assess the extent to which a snippet matches the query intent and covers the main content of the dataset. To establish a baseline, we adapt four state-of-the-art methods from related fields to our problem, and perform an empirical evaluation based on real-world datasets and queries. We also conduct a user study to verify our findings. The results demonstrate the effectiveness of our evaluation framework, and suggest directions for future research.
KW  - cs.IR
KW  - cs.DB
M3  - Working paper
T3  - arXiv
BT  - A Framework for Evaluating Snippet Generation for Dataset Search
PB  - ArXiv
ER  - 