Crowdsourcing without a crowd

Reliable online species identification using Bayesian models to minimize crowd size

Advaith Siddharthan, Christopher Lambin, Anne-Marie Robinson, Nirwan Sharma, Richard Comont, Elaine O'Mahony, Chris Mellish, Rene Van Der Wal

Research output: Contribution to journal - Article

Abstract

We present an incremental Bayesian model which resolves key issues of crowd size and data quality for consensus labelling. We evaluate our method using data collected from a real world citizen science program, BEEWATCH, which invites members of the public in the UK to classify (label) photographs of bumblebees as one of 22 possible species. The biological recording domain poses two key and hitherto unaddressed challenges for consensus models of crowdsourcing: (a) the large number of potential species makes classification difficult and (b) this is compounded by limited crowd availability, stemming from both the inherent difficulty of the task and the lack of relevant skills among the general public. We demonstrate that consensus labels can be reliably found in such circumstances with very small crowd sizes of around 3–5 users (i.e. through group sourcing). Our incremental Bayesian model, which minimizes crowd size by re-evaluating the quality of the consensus label following each species identification solicited from the crowd, is competitive with a Bayesian approach that uses a larger but fixed crowd size and outperforms majority voting. These results have important ecological applicability: biological recording programs such as BEEWATCH can sustain themselves when resources such as taxonomic experts to confirm identifications by photo submitters are scarce (as is typically the case), and feedback can be provided to submitters in a timely fashion. More generally, our model provides benefits to any crowdsourced consensus labeling task where there is a cost (financial or otherwise) associated with soliciting a label.
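
The incremental idea described in the abstract (re-evaluate the consensus after each solicited identification and stop once it is good enough) can be illustrated with a minimal sketch. This is not the authors' model: the uniform prior, the single shared confusion table, the 0.95 stopping threshold, the crowd cap of 5, and all names (update_posterior, incremental_consensus, SPECIES, PRIOR, LIKELIHOOD) are assumptions made for illustration, and the toy example uses 3 made-up species rather than the 22 in BEEWATCH.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical toy setup: 3 species, every user is assumed to pick the
# correct species with probability 0.8 and each wrong one with 0.1.
SPECIES = ["A", "B", "C"]
PRIOR: Dict[str, float] = {s: 1.0 / len(SPECIES) for s in SPECIES}
LIKELIHOOD: Dict[str, Dict[str, float]] = {
    true: {lab: (0.8 if lab == true else 0.1) for lab in SPECIES}
    for true in SPECIES
}


def update_posterior(
    posterior: Dict[str, float],
    label: str,
    likelihood: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """One Bayes step: P(species | label) is proportional to
    P(label | species) * P(species)."""
    unnorm = {s: p * likelihood[s].get(label, 0.0) for s, p in posterior.items()}
    total = sum(unnorm.values())
    if total == 0.0:
        # The label is impossible under every species; keep the old belief.
        return dict(posterior)
    return {s: v / total for s, v in unnorm.items()}


def incremental_consensus(
    labels: Iterable[str],
    prior: Dict[str, float],
    likelihood: Dict[str, Dict[str, float]],
    threshold: float = 0.95,
    max_crowd: int = 5,
) -> Tuple[str, float, int]:
    """Consume labels one at a time and stop as soon as the most probable
    species reaches `threshold`, or after `max_crowd` labels.

    Returns (consensus species, its posterior probability, labels used).
    """
    posterior = dict(prior)
    used = 0
    for label in labels:
        if used >= max_crowd:
            break
        posterior = update_posterior(posterior, label, likelihood)
        used += 1
        best = max(posterior, key=posterior.get)
        if posterior[best] >= threshold:
            return best, posterior[best], used
    best = max(posterior, key=posterior.get)
    return best, posterior[best], used


# Two agreeing labels already push the posterior for "A" above 0.95,
# so the remaining labels are never solicited.
species, confidence, n_used = incremental_consensus(
    ["A", "A", "B", "A"], PRIOR, LIKELIHOOD
)
print(species, round(confidence, 4), n_used)
```

In this toy setting the loop stops after two agreeing labels, whereas a fixed-size scheme would solicit all of them. With real data, the confusion table would need to be estimated (for example from expert-verified records), and the threshold would trade accuracy against the number of identifications requested.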
Original language: English
Article number: 45
Pages (from-to): 1-20
Number of pages: 20
Journal: ACM Transactions on Intelligent Systems and Technology
Volume: 7
Issue number: 4
Early online date: 1 May 2016
DOI: 10.1145/2776896
Publication status: Published - 14 Jul 2016

Keywords

  • crowdsourcing
  • citizen science
  • consensus model
  • Bayesian reasoning
  • bumblebee identification
  • biological recording

Cite this

Crowdsourcing without a crowd: Reliable online species identification using Bayesian models to minimize crowd size. / Siddharthan, Advaith; Lambin, Christopher; Robinson, Anne-Marie; Sharma, Nirwan; Comont, Richard; O'Mahony, Elaine; Mellish, Chris; Van Der Wal, Rene.

In: ACM Transactions on Intelligent Systems and Technology, Vol. 7, No. 4, 45, 14.07.2016, p. 1-20.

@article{5d7ecf584e654c89b09ff40f6224dff3,
title = "Crowdsourcing without a crowd: Reliable online species identification using Bayesian models to minimize crowd size",
abstract = "We present an incremental Bayesian model which resolves key issues of crowd size and data quality for consensus labelling. We evaluate our method using data collected from a real world citizen science program, BEEWATCH, which invites members of the public in the UK to classify (label) photographs of bumblebees as one of 22 possible species. The biological recording domain poses two key and hitherto unaddressed challenges for consensus models of crowdsourcing: (a) the large number of potential species makes classification difficult and (b) this is compounded by limited crowd availability, stemming from both the inherent difficulty of the task and the lack of relevant skills among the general public. We demonstrate that consensus labels can be reliably found in such circumstances with very small crowd sizes of around 3–5 users (i.e. through group sourcing). Our incremental Bayesian model, which minimizes crowd size by re-evaluating the quality of the consensus label following each species identification solicited from the crowd, is competitive with a Bayesian approach that uses a larger but fixed crowd size and outperforms majority voting. These results have important ecological applicability: biological recording programs such as BEEWATCH can sustain themselves when resources such as taxonomic experts to confirm identifications by photo submitters are scarce (as is typically the case), and feedback can be provided to submitters in a timely fashion. More generally, our model provides benefits to any crowdsourced consensus labeling task where there is a cost (financial or otherwise) associated with soliciting a label.",
keywords = "crowdsourcing, citizen science, consensus model, Bayesian reasoning, bumblebee identification, biological recording",
author = "Advaith Siddharthan and Christopher Lambin and Anne-Marie Robinson and Nirwan Sharma and Richard Comont and Elaine O'Mahony and Chris Mellish and {Van Der Wal}, Rene",
note = "Acknowledgment: This research was supported by an award made by the RCUK Digital Economy program to the University of Aberdeen’s dot.rural Digital Economy Hub (ref. EP/G066051/1). Christopher Lambin was funded through a NERC research experience placements grant.",
year = "2016",
month = "7",
day = "14",
doi = "10.1145/2776896",
language = "English",
volume = "7",
pages = "1--20",
journal = "ACM Transactions on Intelligent Systems and Technology",
issn = "2157-6904",
publisher = "Association for Computing Machinery (ACM)",
number = "4",

}

TY - JOUR

T1 - Crowdsourcing without a crowd

T2 - Reliable online species identification using Bayesian models to minimize crowd size

AU - Siddharthan, Advaith

AU - Lambin, Christopher

AU - Robinson, Anne-Marie

AU - Sharma, Nirwan

AU - Comont, Richard

AU - O'Mahony, Elaine

AU - Mellish, Chris

AU - Van Der Wal, Rene

N1 - Acknowledgment: This research was supported by an award made by the RCUK Digital Economy program to the University of Aberdeen’s dot.rural Digital Economy Hub (ref. EP/G066051/1). Christopher Lambin was funded through a NERC research experience placements grant.

PY - 2016/7/14

Y1 - 2016/7/14

AB - We present an incremental Bayesian model which resolves key issues of crowd size and data quality for consensus labelling. We evaluate our method using data collected from a real world citizen science program, BEEWATCH, which invites members of the public in the UK to classify (label) photographs of bumblebees as one of 22 possible species. The biological recording domain poses two key and hitherto unaddressed challenges for consensus models of crowdsourcing: (a) the large number of potential species makes classification difficult and (b) this is compounded by limited crowd availability, stemming from both the inherent difficulty of the task and the lack of relevant skills among the general public. We demonstrate that consensus labels can be reliably found in such circumstances with very small crowd sizes of around 3–5 users (i.e. through group sourcing). Our incremental Bayesian model, which minimizes crowd size by re-evaluating the quality of the consensus label following each species identification solicited from the crowd, is competitive with a Bayesian approach that uses a larger but fixed crowd size and outperforms majority voting. These results have important ecological applicability: biological recording programs such as BEEWATCH can sustain themselves when resources such as taxonomic experts to confirm identifications by photo submitters are scarce (as is typically the case), and feedback can be provided to submitters in a timely fashion. More generally, our model provides benefits to any crowdsourced consensus labeling task where there is a cost (financial or otherwise) associated with soliciting a label.

KW - crowdsourcing

KW - citizen science

KW - consensus model

KW - Bayesian reasoning

KW - bumblebee identification

KW - biological recording

U2 - 10.1145/2776896

DO - 10.1145/2776896

M3 - Article

VL - 7

SP - 1

EP - 20

JO - ACM Transactions on Intelligent Systems and Technology

JF - ACM Transactions on Intelligent Systems and Technology

SN - 2157-6904

IS - 4

M1 - 45

ER -