How redundant is it? An empirical analysis on linked datasets

Honghan Wu^*, Boris Villazon-Terrazas, Jeff Z. Pan, Jose Manuel Gomez-Perez

^*Corresponding author for this work

Computing Science

ISOCO, Intelligent Software Components S.A

Research output: Chapter in Book/Report/Conference proceeding › Published conference contribution

3 Citations (Scopus)

10 Downloads (Pure)

Abstract

Data redundancy resides in most, if not all, information systems, and Linked Data is no exception. Existing approaches try to avoid data redundancies by proposing compression techniques or succinct data structures. However, data redundancies in Linked Data are sometimes useful; e.g., ontology-based data access can make use of A-Box redundancies to avoid unnecessary query rewritings. Whether you want to avoid redundancy or make use of it, a good understanding of data redundancies will facilitate your task, e.g., identifying the exact redundant parts that could be utilised or choosing the most effective techniques to compress a particular dataset. Unfortunately, little effort has been put into making data redundancy explicit to data users. In this paper, we introduce a systematic categorisation of Linked Data redundancy and propose a graph-pattern-based approach for efficient analysis. Analysis results on representative datasets lead to one main conclusion: redundancy-aware techniques are in demand.

Original language	English
Title of host publication	Proceedings of the 5th International Workshop on Consuming Linked Data (COLD 2014) co-located with the 13th International Semantic Web Conference (ISWC 2014) Riva del Garda, Italy, October 20, 2014.
Editors	Olaf Hartig, Aidan Hogan, Juan Sequeda
Publisher	CEUR-WS
Number of pages	12
Publication status	Published - 7 Oct 2014
Event	5th International Workshop on Consuming Linked Data - Riva del Garda, Italy Duration: 20 Oct 2014 → 20 Oct 2014

Publication series

Name	CEUR Workshop Proceedings
Publisher	CEUR-WS
Volume	1264
ISSN (Electronic)	1613-0073

Workshop

Workshop	5th International Workshop on Consuming Linked Data
Abbreviated title	(COLD 2014)
Country/Territory	Italy
City	Riva del Garda
Period	20/10/14 → 20/10/14

Access to Document

How redundant is it: Final published version, 921 KB. Licence: Other

https://ceur-ws.org/Vol-1264/cold2014_WuVPG.pdf (Licence: Other)

Cite this

Wu, H., Villazon-Terrazas, B., Pan, J. Z., & Gomez-Perez, J. M. (2014). How redundant is it? An empirical analysis on linked datasets. In O. Hartig, A. Hogan, & J. Sequeda (Eds.), Proceedings of the 5th International Workshop on Consuming Linked Data (COLD 2014) co-located with the 13th International Semantic Web Conference (ISWC 2014) Riva del Garda, Italy, October 20, 2014. (CEUR Workshop Proceedings; Vol. 1264). CEUR-WS. https://ceur-ws.org/Vol-1264/cold2014_WuVPG.pdf

How redundant is it? An empirical analysis on linked datasets. / Wu, Honghan; Villazon-Terrazas, Boris; Pan, Jeff Z. et al.
Proceedings of the 5th International Workshop on Consuming Linked Data (COLD 2014) co-located with the 13th International Semantic Web Conference (ISWC 2014) Riva del Garda, Italy, October 20, 2014. ed. / Olaf Hartig; Aidan Hogan; Juan Sequeda. CEUR-WS, 2014. (CEUR Workshop Proceedings; Vol. 1264).

Wu, H, Villazon-Terrazas, B, Pan, JZ & Gomez-Perez, JM 2014, How redundant is it? An empirical analysis on linked datasets. in O Hartig, A Hogan & J Sequeda (eds), Proceedings of the 5th International Workshop on Consuming Linked Data (COLD 2014) co-located with the 13th International Semantic Web Conference (ISWC 2014) Riva del Garda, Italy, October 20, 2014. CEUR Workshop Proceedings, vol. 1264, CEUR-WS, 5th International Workshop on Consuming Linked Data, Riva del Garda, Italy, 20/10/14. <https://ceur-ws.org/Vol-1264/cold2014_WuVPG.pdf>

Wu H, Villazon-Terrazas B, Pan JZ, Gomez-Perez JM. How redundant is it? An empirical analysis on linked datasets. In Hartig O, Hogan A, Sequeda J, editors, Proceedings of the 5th International Workshop on Consuming Linked Data (COLD 2014) co-located with the 13th International Semantic Web Conference (ISWC 2014) Riva del Garda, Italy, October 20, 2014. CEUR-WS. 2014. (CEUR Workshop Proceedings).

Wu, Honghan ; Villazon-Terrazas, Boris ; Pan, Jeff Z. et al. / How redundant is it? An empirical analysis on linked datasets. Proceedings of the 5th International Workshop on Consuming Linked Data (COLD 2014) co-located with the 13th International Semantic Web Conference (ISWC 2014) Riva del Garda, Italy, October 20, 2014. editor / Olaf Hartig ; Aidan Hogan ; Juan Sequeda. CEUR-WS, 2014. (CEUR Workshop Proceedings).

@inproceedings{5ede529ff6744fd2bae5f977f078397c,

title = "How redundant is it?: An empirical analysis on linked datasets",

abstract = "Data redundancy resides in most, if not all, information systems. Linked Data is no exception. Existing approaches try to avoid data redundancies by proposing compression techniques or succinct data structures. However, data redundancies in Linked Data are useful sometimes, e.g., ontology based data access can make use of A-Box redundancies to avoid unnecessary query rewritings. Either you want to avoid it or make use of it, a good understanding about data redundancies will facilitate your task, e.g., identify the exact redundant parts which could be utilised or choose most effective techniques to compress a particular dataset. Unfortunately, little effort has been put on making the data redundancy explicit to data users. In this paper, we introduce a systematic categorisation for Linked Data redundancy, and propose a graph pattern based approach for efficient analysis. Analysis results on representative datasets lead to a main conclusion, that is redundant-aware techniques are demanded.",

author = "Honghan Wu and Boris Villazon-Terrazas and Pan, {Jeff Z.} and Gomez-Perez, {Jose Manuel}",

year = "2014",

month = oct,

day = "7",

language = "English",

series = "CEUR Workshop Proceedings",

publisher = "CEUR-WS",

editor = "Olaf Hartig and Aidan Hogan and Juan Sequeda",

booktitle = "Proceedings of the 5th International Workshop on Consuming Linked Data (COLD 2014) co-located with the 13th International Semantic Web Conference (ISWC 2014) Riva del Garda, Italy, October 20, 2014.",

note = "5th International Workshop on Consuming Linked Data, (COLD 2014) ; Conference date: 20-10-2014 Through 20-10-2014",

}

TY - GEN

T1 - How redundant is it?

T2 - 5th International Workshop on Consuming Linked Data

AU - Wu, Honghan

AU - Villazon-Terrazas, Boris

AU - Pan, Jeff Z.

AU - Gomez-Perez, Jose Manuel

PY - 2014/10/7

Y1 - 2014/10/7

AB - Data redundancy resides in most, if not all, information systems. Linked Data is no exception. Existing approaches try to avoid data redundancies by proposing compression techniques or succinct data structures. However, data redundancies in Linked Data are useful sometimes, e.g., ontology based data access can make use of A-Box redundancies to avoid unnecessary query rewritings. Either you want to avoid it or make use of it, a good understanding about data redundancies will facilitate your task, e.g., identify the exact redundant parts which could be utilised or choose most effective techniques to compress a particular dataset. Unfortunately, little effort has been put on making the data redundancy explicit to data users. In this paper, we introduce a systematic categorisation for Linked Data redundancy, and propose a graph pattern based approach for efficient analysis. Analysis results on representative datasets lead to a main conclusion, that is redundant-aware techniques are demanded.

UR - http://www.scopus.com/inward/record.url?scp=84908691394&partnerID=8YFLogxK

UR - https://www.researchgate.net/publication/289653017_How_redundant_is_it-An_empirical_analysis_on_linked_datasets

M3 - Published conference contribution

AN - SCOPUS:84908691394

T3 - CEUR Workshop Proceedings

BT - Proceedings of the 5th International Workshop on Consuming Linked Data (COLD 2014) co-located with the 13th International Semantic Web Conference (ISWC 2014) Riva del Garda, Italy, October 20, 2014.

A2 - Hartig, Olaf

A2 - Hogan, Aidan

A2 - Sequeda, Juan

PB - CEUR-WS

Y2 - 20 October 2014 through 20 October 2014

ER -