Parallel and Streaming Truth Discovery in Large-Scale Quantitative Crowdsourcing

Robin Wentao Ouyang; Lance Kaplan; Alice Toniolo; Mani Srivastava; Timothy J. Norman

doi:10.1109/TPDS.2016.2515092


Computing Science

Research output: Contribution to journal › Article › peer-review

Abstract

To enable reliable crowdsourcing applications, it is of great importance to develop algorithms that can automatically discover the truths from possibly noisy and conflicting claims provided by various information sources. In order to handle crowdsourcing applications involving big or streaming data, a desirable truth discovery algorithm should not only be effective, but also be scalable. However, with respect to quantitative crowdsourcing applications such as object counting and percentage annotation, existing truth discovery algorithms are not simultaneously effective and scalable. They either address truth discovery in categorical crowdsourcing or perform batch processing that does not scale. In this paper, we propose new parallel and streaming truth discovery algorithms for quantitative crowdsourcing applications. Through extensive experiments on real-world and synthetic datasets, we demonstrate that 1) both of them are quite effective, 2) the parallel algorithm can efficiently perform truth discovery on large datasets, and 3) the streaming algorithm processes data incrementally, and can efficiently perform truth discovery both on large datasets and in data streams.
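The abstract does not spell out the model used by the proposed algorithms. The Python below is therefore only a generic illustration of what quantitative truth discovery computes, not the authors' parallel or streaming algorithm. It assumes a simple Gaussian error model in which each source has an unknown error variance.

The batch function alternates two steps: it estimates each object's truth as an inverse-variance weighted mean of the claims, then re-estimates each source's variance from its residuals. This is a common iterative scheme in the truth discovery literature. A single-pass streaming variant keeps only per-source running sums. The names `truth_discovery`, `StreamingTruthDiscovery`, `prior_var` and `prior_count`, and the synthetic data, are all hypothetical.

```python
from collections import defaultdict


def truth_discovery(claims, n_iter=20, prior_var=1.0, prior_count=1.0):
    """Batch sketch (hypothetical): alternate truth and source-variance estimates.

    claims: list of (source, obj, value) tuples with numeric values.
    Returns (truths, var): estimated truth per object and
    estimated error variance per source.
    """
    by_obj = defaultdict(list)
    for s, o, v in claims:
        by_obj[o].append((s, v))
    sources = {s for s, _, _ in claims}
    var = {s: prior_var for s in sources}  # start with equal reliability
    truths = {}
    for _ in range(n_iter):
        # Truth step: inverse-variance weighted mean of the claims on each object.
        for o, sv in by_obj.items():
            num = sum(v / var[s] for s, v in sv)
            den = sum(1.0 / var[s] for s, _ in sv)
            truths[o] = num / den
        # Reliability step: mean squared residual per source, smoothed by a prior.
        sq, cnt = defaultdict(float), defaultdict(int)
        for s, o, v in claims:
            sq[s] += (v - truths[o]) ** 2
            cnt[s] += 1
        var = {s: (sq[s] + prior_var * prior_count) / (cnt[s] + prior_count)
               for s in sources}
    return truths, var


class StreamingTruthDiscovery:
    """Streaming sketch (hypothetical): one pass over mini-batches.

    Keeps O(#sources) state and assumes that every claim about an
    object arrives in the same mini-batch.
    """

    def __init__(self, prior_var=1.0, prior_count=1.0):
        self.sq = defaultdict(float)  # cumulative squared residual per source
        self.cnt = defaultdict(int)   # cumulative claim count per source
        self.prior_var = prior_var
        self.prior_count = prior_count

    def variance(self, s):
        # Unseen sources fall back to prior_var.
        return ((self.sq[s] + self.prior_var * self.prior_count)
                / (self.cnt[s] + self.prior_count))

    def process_batch(self, batch):
        by_obj = defaultdict(list)
        for s, o, v in batch:
            by_obj[o].append((s, v))
        truths = {}
        for o, sv in by_obj.items():
            weights = [(1.0 / self.variance(s), v) for s, v in sv]
            truths[o] = (sum(w * v for w, v in weights)
                         / sum(w for w, _ in weights))
        # Fold this batch's residuals into the running per-source statistics.
        for s, o, v in batch:
            self.sq[s] += (v - truths[o]) ** 2
            self.cnt[s] += 1
        return truths


# Synthetic example: sources "a" and "b" are accurate, "c" is noisy.
claims = []
for o in range(10):
    t = 10.0 * o
    claims += [("a", o, t + 0.1), ("b", o, t - 0.1), ("c", o, t + 5.0 * (-1) ** o)]

truths, var = truth_discovery(claims)

stream = StreamingTruthDiscovery()
for start in range(0, 10, 2):  # mini-batches of two objects each
    batch = [c for c in claims if start <= c[1] < start + 2]
    last = stream.process_batch(batch)
```

In this sketch the first streaming mini-batch has no source history, so it falls back to a plain average. Later batches sharpen as residual statistics accumulate. The per-object truth step and the per-source sums are independent across objects, which makes this style of iteration easy to partition across workers. The parallel case itself is not shown.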

Original language	English
Pages (from-to)	2984-2997
Number of pages	14
Journal	IEEE Transactions on Parallel and Distributed Systems
Volume	27
Issue number	10
Early online date	6 Jan 2016
DOIs	https://doi.org/10.1109/TPDS.2016.2515092
Publication status	Published - 1 Oct 2016

Bibliographical note

ACKNOWLEDGMENTS
This research is based upon work supported in part by the U.S. ARL and U.K. Ministry of Defense under Agreement Number W911NF-06-3-0001, and by the NSF under award CNS-1213140. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views or represent the official policies of the NSF, the U.S. ARL, the U.S. Government, the U.K. Ministry of Defense or the U.K. Government. The U.S. and U.K. Governments are authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon. This work was done when R. W. Ouyang was a postdoc at the University of California, Los Angeles, CA.

Keywords

Crowdsourcing
truth discovery
quantitative task
big data
parallel algorithm
streaming algorithm

Access to Document

10.1109/TPDS.2016.2515092 (Licence: Unspecified)

Accepted Manuscript
© 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. Final published version of this work available at DOI: 10.1109/TPDS.2016.2515092
Accepted author manuscript, 859 KB (Licence: Unspecified)

Cite this

@article{80c4f9782a674d9281a6308917ec9c12,
  title     = "Parallel and Streaming Truth Discovery in Large-Scale Quantitative Crowdsourcing",
  abstract  = "To enable reliable crowdsourcing applications, it is of great importance to develop algorithms that can automatically discover the truths from possibly noisy and conflicting claims provided by various information sources. In order to handle crowdsourcing applications involving big or streaming data, a desirable truth discovery algorithm should not only be effective, but also be scalable. However, with respect to quantitative crowdsourcing applications such as object counting and percentage annotation, existing truth discovery algorithms are not simultaneously effective and scalable. They either address truth discovery in categorical crowdsourcing or perform batch processing that does not scale. In this paper, we propose new parallel and streaming truth discovery algorithms for quantitative crowdsourcing applications. Through extensive experiments on real-world and synthetic datasets, we demonstrate that 1) both of them are quite effective, 2) the parallel algorithm can efficiently perform truth discovery on large datasets, and 3) the streaming algorithm processes data incrementally, and can efficiently perform truth discovery both on large datasets and in data streams.",
  keywords  = "Crowdsourcing, truth discovery, quantitative task, big data, parallel algorithm, streaming algorithm",
  author    = "Ouyang, {Robin Wentao} and Lance Kaplan and Alice Toniolo and Mani Srivastava and Norman, {Timothy J.}",
  note      = "ACKNOWLEDGMENTS This research is based upon work supported in part by the U.S. ARL and U.K. Ministry of Defense under Agreement Number W911NF-06-3-0001, and by the NSF under award CNS-1213140. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views or represent the official policies of the NSF, the U.S. ARL, the U.S. Government, the U.K. Ministry of Defense or the U.K. Government. The U.S. and U.K. Governments are authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon. This work was done when R. W. Ouyang was a postdoc at the University of California, Los Angeles, CA.",
  year      = "2016",
  month     = oct,
  day       = "1",
  doi       = "10.1109/TPDS.2016.2515092",
  language  = "English",
  volume    = "27",
  pages     = "2984--2997",
  journal   = "IEEE Transactions on Parallel and Distributed Systems",
  issn      = "1045-9219",
  publisher = "IEEE Computer Society",
  number    = "10",
}

TY  - JOUR
T1  - Parallel and Streaming Truth Discovery in Large-Scale Quantitative Crowdsourcing
AU  - Ouyang, Robin Wentao
AU  - Kaplan, Lance
AU  - Toniolo, Alice
AU  - Srivastava, Mani
AU  - Norman, Timothy J.
N1  - ACKNOWLEDGMENTS This research is based upon work supported in part by the U.S. ARL and U.K. Ministry of Defense under Agreement Number W911NF-06-3-0001, and by the NSF under award CNS-1213140. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views or represent the official policies of the NSF, the U.S. ARL, the U.S. Government, the U.K. Ministry of Defense or the U.K. Government. The U.S. and U.K. Governments are authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon. This work was done when R. W. Ouyang was a postdoc at the University of California, Los Angeles, CA.
PY  - 2016/10/1
Y1  - 2016/10/1
N2  - To enable reliable crowdsourcing applications, it is of great importance to develop algorithms that can automatically discover the truths from possibly noisy and conflicting claims provided by various information sources. In order to handle crowdsourcing applications involving big or streaming data, a desirable truth discovery algorithm should not only be effective, but also be scalable. However, with respect to quantitative crowdsourcing applications such as object counting and percentage annotation, existing truth discovery algorithms are not simultaneously effective and scalable. They either address truth discovery in categorical crowdsourcing or perform batch processing that does not scale. In this paper, we propose new parallel and streaming truth discovery algorithms for quantitative crowdsourcing applications. Through extensive experiments on real-world and synthetic datasets, we demonstrate that 1) both of them are quite effective, 2) the parallel algorithm can efficiently perform truth discovery on large datasets, and 3) the streaming algorithm processes data incrementally, and can efficiently perform truth discovery both on large datasets and in data streams.
AB  - To enable reliable crowdsourcing applications, it is of great importance to develop algorithms that can automatically discover the truths from possibly noisy and conflicting claims provided by various information sources. In order to handle crowdsourcing applications involving big or streaming data, a desirable truth discovery algorithm should not only be effective, but also be scalable. However, with respect to quantitative crowdsourcing applications such as object counting and percentage annotation, existing truth discovery algorithms are not simultaneously effective and scalable. They either address truth discovery in categorical crowdsourcing or perform batch processing that does not scale. In this paper, we propose new parallel and streaming truth discovery algorithms for quantitative crowdsourcing applications. Through extensive experiments on real-world and synthetic datasets, we demonstrate that 1) both of them are quite effective, 2) the parallel algorithm can efficiently perform truth discovery on large datasets, and 3) the streaming algorithm processes data incrementally, and can efficiently perform truth discovery both on large datasets and in data streams.
KW  - Crowdsourcing
KW  - truth discovery
KW  - quantitative task
KW  - big data
KW  - parallel algorithm
KW  - streaming algorithm
U2  - 10.1109/TPDS.2016.2515092
DO  - 10.1109/TPDS.2016.2515092
M3  - Article
SN  - 1045-9219
VL  - 27
SP  - 2984
EP  - 2997
JO  - IEEE Transactions on Parallel and Distributed Systems
JF  - IEEE Transactions on Parallel and Distributed Systems
IS  - 10
ER  - 
