Distributed stream consistency checking

Shen Gao, Daniele Dell’Aglio, Jeff Z. Pan, Abraham Bernstein

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Dealing with noisy data is one of the big issues in stream processing. While noise has been widely studied in settings where streams have simple schemas, e.g. time series, few solutions have focused on streams characterized by complex data structures. This paper studies how to check consistency over large amounts of complex streams. Our proposed methods exploit reasoning to assess whether portions of the streams are compliant with a reference conceptual model. To achieve scalability, our methods run on state-of-the-art distributed stream processing platforms, e.g. Apache Storm or Twitter Heron. Our first method computes the closure of Negative Inclusions (NIs) for DL-Lite ontologies and registers the NIs as queries. The second method compiles the ontology into a processing pipeline to evenly distribute the workload. Experiments compare the two methods and show that the second one improves throughput by up to 139% with the LUBM ontology and 330% with the NPD ontology.
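As a rough illustration of the first method's idea, the sketch below computes a closure of negative inclusions and then checks one window of stream assertions against it. It is not the authors' implementation: it assumes a heavily simplified fragment with atomic concepts only (no roles or existential restrictions), it runs on a single machine rather than on Storm or Heron, and the function names, class names, and sample window are all hypothetical.

```python
from itertools import product


def ni_closure(positive, negative):
    """Derive all disjointness pairs entailed by the given axioms.

    positive: set of (sub, sup) pairs, read as sub ⊑ sup.
    negative: set of (a, b) pairs, read as a ⊑ ¬b.
    Returns a set of frozensets; frozenset({x, y}) means x and y are disjoint.
    Simplifying assumption: atomic concepts only.
    """
    concepts = {c for pair in positive | negative for c in pair}
    # subs[c] collects every concept subsumed by c (including c itself).
    subs = {c: {c} for c in concepts}
    changed = True
    while changed:  # naive fixpoint over the positive inclusions
        changed = False
        for sub, sup in positive:
            missing = subs[sub] - subs[sup]
            if missing:
                subs[sup] |= missing
                changed = True
    closure = set()
    for a, b in negative:
        # Every subconcept of a is disjoint from every subconcept of b.
        for x, y in product(subs[a], subs[b]):
            closure.add(frozenset((x, y)))
    return closure


def inconsistent_individuals(window, closure):
    """Return the individuals in the window that violate a derived NI.

    window: iterable of (individual, concept) assertions.
    """
    types = {}
    for individual, concept in window:
        types.setdefault(individual, set()).add(concept)
    return sorted(
        ind for ind, asserted in types.items()
        if any(pair <= asserted for pair in closure)
    )


# Hypothetical axioms, loosely in the style of a university ontology.
POSITIVE = {("GradStudent", "Student"), ("Professor", "Faculty")}
NEGATIVE = {("Student", "Faculty")}

# Hypothetical window of stream assertions.
WINDOW = [
    ("alice", "GradStudent"),
    ("alice", "Professor"),
    ("bob", "Student"),
]

closure = ni_closure(POSITIVE, NEGATIVE)
print(inconsistent_individuals(WINDOW, closure))
```

Because the closure is computed once up front, each derived pair can act as an independent check over the asserted data, which is roughly what registering the NIs as queries suggests; no per-window inference over the positive inclusions is needed in this simplified setting.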

Original language: English
Title of host publication: Web Engineering
Subtitle of host publication: ICWE 2018
Editors: T. Mikkonen, R. Klamma, J. Hernández
Place of Publication: Cham
Publisher: Springer Verlag
Pages: 387-403
Number of pages: 17
ISBN (Electronic): 9783319916620
ISBN (Print): 9783319916613
DOIs: https://doi.org/10.1007/978-3-319-91662-0_32
Publication status: Published - 2018
Event: 18th International Conference on Web Engineering, ICWE 2018 - Caceres, Spain
Duration: 5 Jun 2018 → 8 Jun 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10845
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 18th International Conference on Web Engineering, ICWE 2018
Country: Spain
City: Caceres
Period: 5/06/18 → 8/06/18

Fingerprint

Ontology
Stream Processing
Processing
Inclusion
Distributed Processing
Series Solution
Data structures
Scalability
Time series
Reference Model
Noisy Data
Conceptual Model
Complex Structure
Pipelines
Schema
Throughput
Workload
Closure
Reasoning

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

Cite this

Gao, S., Dell’Aglio, D., Pan, J. Z., & Bernstein, A. (2018). Distributed stream consistency checking. In T. Mikkonen, R. Klamma, & J. Hernández (Eds.), Web Engineering: ICWE 2018 (pp. 387-403). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10845). Cham: Springer Verlag. https://doi.org/10.1007/978-3-319-91662-0_32

@inproceedings{ccd87a753d4c43b6b89c289e56e71a4c,
title = "Distributed stream consistency checking",
abstract = "Dealing with noisy data is one of the big issues in stream processing. While noise has been widely studied in settings where streams have simple schemas, e.g. time series, few solutions have focused on streams characterized by complex data structures. This paper studies how to check consistency over large amounts of complex streams. Our proposed methods exploit reasoning to assess whether portions of the streams are compliant with a reference conceptual model. To achieve scalability, our methods run on state-of-the-art distributed stream processing platforms, e.g. Apache Storm or Twitter Heron. Our first method computes the closure of Negative Inclusions (NIs) for DL-Lite ontologies and registers the NIs as queries. The second method compiles the ontology into a processing pipeline to evenly distribute the workload. Experiments compare the two methods and show that the second one improves throughput by up to 139{\%} with the LUBM ontology and 330{\%} with the NPD ontology.",
author = "Shen Gao and Daniele Dell’Aglio and Pan, {Jeff Z.} and Abraham Bernstein",
year = "2018",
doi = "10.1007/978-3-319-91662-0_32",
language = "English",
isbn = "9783319916613",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "387--403",
editor = "T. Mikkonen and R. Klamma and J. Hern{\'a}ndez",
booktitle = "Web Engineering",
address = "Cham",
}

TY  - GEN
T1  - Distributed stream consistency checking
AU  - Gao, Shen
AU  - Dell’Aglio, Daniele
AU  - Pan, Jeff Z.
AU  - Bernstein, Abraham
PY  - 2018
Y1  - 2018
AB  - Dealing with noisy data is one of the big issues in stream processing. While noise has been widely studied in settings where streams have simple schemas, e.g. time series, few solutions have focused on streams characterized by complex data structures. This paper studies how to check consistency over large amounts of complex streams. Our proposed methods exploit reasoning to assess whether portions of the streams are compliant with a reference conceptual model. To achieve scalability, our methods run on state-of-the-art distributed stream processing platforms, e.g. Apache Storm or Twitter Heron. Our first method computes the closure of Negative Inclusions (NIs) for DL-Lite ontologies and registers the NIs as queries. The second method compiles the ontology into a processing pipeline to evenly distribute the workload. Experiments compare the two methods and show that the second one improves throughput by up to 139% with the LUBM ontology and 330% with the NPD ontology.
UR  - http://www.scopus.com/inward/record.url?scp=85048005932&partnerID=8YFLogxK
U2  - 10.1007/978-3-319-91662-0_32
DO  - 10.1007/978-3-319-91662-0_32
M3  - Conference contribution
SN  - 9783319916613
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 387
EP  - 403
BT  - Web Engineering
A2  - Mikkonen, T.
A2  - Klamma, R.
A2  - Hernández, J.
PB  - Springer Verlag
CY  - Cham
ER  -