TY - GEN
T1 - Predicate invention based RDF data compression
AU - Zhu, Man
AU - Wu, Weixin
AU - Pan, Jeff Z.
AU - Han, Jingyu
AU - Huang, Pengfei
AU - Liu, Qian
PY - 2018/11/14
Y1 - 2018/11/14
N2 - RDF is a data representation format for schema-free structured information that is gaining momentum in the context of the semantic web, life science, and other domains. With the continuing proliferation of structured data, the demand for RDF compression is growing. In this study, we introduce a novel lossless compression technique for RDF datasets (triples), called PIC (Predicate Invention based Compression). By generating informative predicates and constructing an effective mapping to the original predicates, PIC needs to store only a dramatically reduced number of triples with the newly created predicates, and can restore the original triples efficiently using the mapping. These predicates are automatically generated by a decomposable forward-backward procedure, which consequently supports very fast parallel bit computation. As a semantic compression method for structured data, besides reducing syntactic verbosity and data redundancy, we also invoke the semantics in the RDF datasets. Experiments on various datasets show competitive results in terms of compression ratio.
AB - RDF is a data representation format for schema-free structured information that is gaining momentum in the context of the semantic web, life science, and other domains. With the continuing proliferation of structured data, the demand for RDF compression is growing. In this study, we introduce a novel lossless compression technique for RDF datasets (triples), called PIC (Predicate Invention based Compression). By generating informative predicates and constructing an effective mapping to the original predicates, PIC needs to store only a dramatically reduced number of triples with the newly created predicates, and can restore the original triples efficiently using the mapping. These predicates are automatically generated by a decomposable forward-backward procedure, which consequently supports very fast parallel bit computation. As a semantic compression method for structured data, besides reducing syntactic verbosity and data redundancy, we also invoke the semantics in the RDF datasets. Experiments on various datasets show competitive results in terms of compression ratio.
UR - http://www.scopus.com/inward/record.url?scp=85057287484&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-04284-4_11
DO - 10.1007/978-3-030-04284-4_11
M3 - Published conference contribution
AN - SCOPUS:85057287484
SN - 9783030042837
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 153
EP - 161
BT - Semantic Technology - 8th Joint International Conference, JIST 2018, Proceedings
A2 - Ichise, R
A2 - Lecue, F
A2 - Kawamura, T
A2 - Zhao, D
A2 - Muggleton, S
A2 - Kozaki, K
PB - Springer Verlag
T2 - 8th Joint International Semantic Technology Conference, JIST 2018
Y2 - 26 November 2018 through 28 November 2018
ER -