Predicate invention based RDF data compression

Man Zhu*, Weixin Wu, Jeff Z. Pan, Jingyu Han, Pengfei Huang, Qian Liu

*Corresponding author for this work

Research output: Conference contribution (Chapter in Book/Report/Conference proceeding)

Abstract

RDF is a data representation format for schema-free structured information that is gaining ground in the Semantic Web, the life sciences, and other domains. With the continuing proliferation of structured data, RDF compression is becoming increasingly important. In this study, we introduce PIC (Predicate Invention based Compression), a novel lossless compression technique for RDF datasets (triples). By generating informative predicates and constructing an effective mapping to the original predicates, PIC only needs to store a dramatically reduced number of triples that use the newly created predicates, and it restores the original triples efficiently using the mapping. The new predicates are generated automatically by a decomposable forward-backward procedure, which in turn supports very fast parallel bitwise computation. As a semantic compression method for structured data, PIC not only reduces syntactic verbosity and data redundancy but also exploits the semantics of the RDF datasets. Experiments on various datasets show competitive compression ratios.
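The abstract describes predicate invention only at a high level. As a rough illustration, and explicitly not the authors' PIC algorithm (whose forward-backward procedure is not reproduced here), the Python sketch below shows one simple form of the idea under stated assumptions: every original predicate gets a bit position, all predicates that link the same subject-object pair are OR-ed into one bitmask, and each distinct bitmask becomes an invented predicate. The function names `compress` and `decompress` and the `q<mask>` naming scheme are hypothetical. Decompression expands each invented predicate back into its original predicates, so the round trip is lossless.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def compress(triples: List[Triple]) -> Tuple[Set[Triple], Dict[str, int], List[str]]:
    """Hypothetical sketch: replace all predicates linking the same (subject, object)
    pair with one invented predicate whose meaning is a bitmask over original predicates."""
    # Assign each original predicate a bit position (sorted for determinism).
    predicates = sorted({p for _, p, _ in triples})
    bit = {p: i for i, p in enumerate(predicates)}

    # OR together the bits of every predicate seen for each (s, o) pair.
    masks: Dict[Tuple[str, str], int] = defaultdict(int)
    for s, p, o in triples:
        masks[(s, o)] |= 1 << bit[p]

    # Each distinct bitmask becomes one invented predicate named "q<mask>".
    mapping: Dict[str, int] = {}
    compressed: Set[Triple] = set()
    for (s, o), mask in masks.items():
        q = f"q{mask}"
        mapping[q] = mask
        compressed.add((s, q, o))
    return compressed, mapping, predicates


def decompress(compressed: Set[Triple], mapping: Dict[str, int],
               predicates: List[str]) -> Set[Triple]:
    """Expand each invented predicate back into the original predicates it encodes."""
    restored: Set[Triple] = set()
    for s, q, o in compressed:
        mask = mapping[q]
        for i, p in enumerate(predicates):
            if (mask >> i) & 1:  # bit i set means predicate i held for (s, o)
                restored.add((s, p, o))
    return restored


# Usage example on a toy graph (hypothetical ex: IRIs).
triples = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:worksWith", "ex:bob"),
    ("ex:carol", "ex:knows", "ex:dave"),
    ("ex:carol", "ex:worksWith", "ex:dave"),
    ("ex:erin", "ex:knows", "ex:frank"),
]
compressed, mapping, predicates = compress(triples)
assert decompress(compressed, mapping, predicates) == set(triples)
print(len(triples), "->", len(compressed))  # prints: 5 -> 3
```

Here five triples shrink to three, because `ex:knows` and `ex:worksWith` always co-occur on two pairs and are stored once as `q3`. A real compressor must also store the mapping table and would only invent predicates where that saves space; treating RDF graphs as sets (duplicate triples collapse) matches RDF semantics.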

Original language: English
Title of host publication: Semantic Technology - 8th Joint International Conference, JIST 2018, Proceedings
Editors: R Ichise, F Lecue, T Kawamura, D Zhao, S Muggleton, K Kozaki
Publisher: Springer Verlag
Pages: 153-161
Number of pages: 9
ISBN (Electronic): 9783030042844
ISBN (Print): 9783030042837
DOI: https://doi.org/10.1007/978-3-030-04284-4_11
Publication status: Published - 14 Nov 2018
Event: 8th Joint International Semantic Technology Conference, JIST 2018 - Awaji, Japan
Duration: 26 Nov 2018 to 28 Nov 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11341 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 8th Joint International Semantic Technology Conference, JIST 2018
Country: Japan
City: Awaji
Period: 26/11/18 to 28/11/18


ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

Cite this

Zhu, M., Wu, W., Pan, J. Z., Han, J., Huang, P., & Liu, Q. (2018). Predicate invention based RDF data compression. In R. Ichise, F. Lecue, T. Kawamura, D. Zhao, S. Muggleton, & K. Kozaki (Eds.), Semantic Technology - 8th Joint International Conference, JIST 2018, Proceedings (pp. 153-161). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11341 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-030-04284-4_11
