How redundant is it? An empirical analysis on linked datasets

Honghan Wu, Boris Villazon-Terrazas, Jeff Z. Pan, Jose Manuel Gomez-Perez

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Data redundancy resides in most, if not all, information systems. Linked Data is no exception. Existing approaches try to avoid data redundancies by proposing compression techniques or succinct data structures. However, data redundancies in Linked Data are useful sometimes, e.g., ontology based data access can make use of A-Box redundancies to avoid unnecessary query rewritings. Either you want to avoid it or make use of it, a good understanding about data redundancies will facilitate your task, e.g., identify the exact redundant parts which could be utilised or choose most effective techniques to compress a particular dataset. Unfortunately, little effort has been put on making the data redundancy explicit to data users. In this paper, we introduce a systematic categorisation for Linked Data redundancy, and propose a graph pattern based approach for efficient analysis. Analysis results on representative datasets lead to a main conclusion, that is redundant-aware techniques are demanded.

Original language: English
Title of host publication: COLD 2014, Consuming Linked Data
Subtitle of host publication: Proceedings of the 5th International Workshop on Consuming Linked Data
Editors: Olaf Hartig, Aidan Hogan, Juan Sequeda
Publisher: CEUR-WS
Number of pages: 12
Publication status: Published - 7 Oct 2014
Event: 5th International Workshop on Consuming Linked Data - Riva del Garda, Italy
Duration: 20 Oct 2014 → …

Publication series

Name: CEUR Workshop Proceedings
Publisher: CEUR-WS
Volume: 1264
ISSN (Electronic): 1613-0073

Workshop

Workshop: 5th International Workshop on Consuming Linked Data
Abbreviated title: (COLD 2014)
Country: Italy
City: Riva del Garda
Period: 20/10/14 → …

Fingerprint

Redundancy
Ontology
Data structures
Information systems

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Wu, H., Villazon-Terrazas, B., Pan, J. Z., & Gomez-Perez, J. M. (2014). How redundant is it? An empirical analysis on linked datasets. In O. Hartig, A. Hogan, & J. Sequeda (Eds.), COLD 2014, Consuming Linked Data: Proceedings of the 5th International Workshop on Consuming Linked Data (CEUR Workshop Proceedings; Vol. 1264). CEUR-WS.

How redundant is it? An empirical analysis on linked datasets. / Wu, Honghan; Villazon-Terrazas, Boris; Pan, Jeff Z.; Gomez-Perez, Jose Manuel.

COLD 2014, Consuming Linked Data: Proceedings of the 5th International Workshop on Consuming Linked Data. ed. / Olaf Hartig; Aidan Hogan; Juan Sequeda. CEUR-WS, 2014. (CEUR Workshop Proceedings; Vol. 1264).

Wu, H, Villazon-Terrazas, B, Pan, JZ & Gomez-Perez, JM 2014, How redundant is it? An empirical analysis on linked datasets. in O Hartig, A Hogan & J Sequeda (eds), COLD 2014, Consuming Linked Data: Proceedings of the 5th International Workshop on Consuming Linked Data. CEUR Workshop Proceedings, vol. 1264, CEUR-WS, 5th International Workshop on Consuming Linked Data, Riva del Garda, Italy, 20/10/14.

Wu H, Villazon-Terrazas B, Pan JZ, Gomez-Perez JM. How redundant is it? An empirical analysis on linked datasets. In Hartig O, Hogan A, Sequeda J, editors, COLD 2014, Consuming Linked Data: Proceedings of the 5th International Workshop on Consuming Linked Data. CEUR-WS. 2014. (CEUR Workshop Proceedings).

Wu, Honghan ; Villazon-Terrazas, Boris ; Pan, Jeff Z. ; Gomez-Perez, Jose Manuel. / How redundant is it? An empirical analysis on linked datasets. COLD 2014, Consuming Linked Data: Proceedings of the 5th International Workshop on Consuming Linked Data. editor / Olaf Hartig ; Aidan Hogan ; Juan Sequeda. CEUR-WS, 2014. (CEUR Workshop Proceedings).

@inproceedings{5ede529ff6744fd2bae5f977f078397c,
title = "How redundant is it?: An empirical analysis on linked datasets",
abstract = "Data redundancy resides in most, if not all, information systems. Linked Data is no exception. Existing approaches try to avoid data redundancies by proposing compression techniques or succinct data structures. However, data redundancies in Linked Data are useful sometimes, e.g., ontology based data access can make use of A-Box redundancies to avoid unnecessary query rewritings. Either you want to avoid it or make use of it, a good understanding about data redundancies will facilitate your task, e.g., identify the exact redundant parts which could be utilised or choose most effective techniques to compress a particular dataset. Unfortunately, little effort has been put on making the data redundancy explicit to data users. In this paper, we introduce a systematic categorisation for Linked Data redundancy, and propose a graph pattern based approach for efficient analysis. Analysis results on representative datasets lead to a main conclusion, that is redundant-aware techniques are demanded.",
author = "Honghan Wu and Boris Villazon-Terrazas and Pan, {Jeff Z.} and Gomez-Perez, {Jose Manuel}",
year = "2014",
month = "10",
day = "7",
language = "English",
series = "CEUR Workshop Proceedings",
publisher = "CEUR-WS",
editor = "Olaf Hartig and Aidan Hogan and Juan Sequeda",
booktitle = "COLD 2014, Consuming Linked Data",
}

TY  - GEN
T1  - How redundant is it?
T2  - An empirical analysis on linked datasets
AU  - Wu, Honghan
AU  - Villazon-Terrazas, Boris
AU  - Pan, Jeff Z.
AU  - Gomez-Perez, Jose Manuel
PY  - 2014/10/7
Y1  - 2014/10/7
N2  - Data redundancy resides in most, if not all, information systems. Linked Data is no exception. Existing approaches try to avoid data redundancies by proposing compression techniques or succinct data structures. However, data redundancies in Linked Data are useful sometimes, e.g., ontology based data access can make use of A-Box redundancies to avoid unnecessary query rewritings. Either you want to avoid it or make use of it, a good understanding about data redundancies will facilitate your task, e.g., identify the exact redundant parts which could be utilised or choose most effective techniques to compress a particular dataset. Unfortunately, little effort has been put on making the data redundancy explicit to data users. In this paper, we introduce a systematic categorisation for Linked Data redundancy, and propose a graph pattern based approach for efficient analysis. Analysis results on representative datasets lead to a main conclusion, that is redundant-aware techniques are demanded.
AB  - Data redundancy resides in most, if not all, information systems. Linked Data is no exception. Existing approaches try to avoid data redundancies by proposing compression techniques or succinct data structures. However, data redundancies in Linked Data are useful sometimes, e.g., ontology based data access can make use of A-Box redundancies to avoid unnecessary query rewritings. Either you want to avoid it or make use of it, a good understanding about data redundancies will facilitate your task, e.g., identify the exact redundant parts which could be utilised or choose most effective techniques to compress a particular dataset. Unfortunately, little effort has been put on making the data redundancy explicit to data users. In this paper, we introduce a systematic categorisation for Linked Data redundancy, and propose a graph pattern based approach for efficient analysis. Analysis results on representative datasets lead to a main conclusion, that is redundant-aware techniques are demanded.
UR  - http://www.scopus.com/inward/record.url?scp=84908691394&partnerID=8YFLogxK
M3  - Conference contribution
T3  - CEUR Workshop Proceedings
BT  - COLD 2014, Consuming Linked Data
A2  - Hartig, Olaf
A2  - Hogan, Aidan
A2  - Sequeda, Juan
PB  - CEUR-WS
ER  -