Abstract
We present an incremental Bayesian model that resolves key issues of crowd size and data quality for consensus labeling. We evaluate our method using data collected from a real-world citizen science program, BEEWATCH, which invites members of the public in the UK to classify (label) photographs of bumblebees as one of 22 possible species. The biological recording domain poses two key and hitherto unaddressed challenges for consensus models of crowdsourcing: (a) the large number of potential species makes classification difficult and (b) this is compounded by limited crowd availability, stemming from both the inherent difficulty of the task and the lack of relevant skills among the general public. We demonstrate that consensus labels can be reliably found in such circumstances with very small crowd sizes of around 3–5 users (i.e., through group sourcing). Our incremental Bayesian model, which minimizes crowd size by re-evaluating the quality of the consensus label following each species identification solicited from the crowd, is competitive with a Bayesian approach that uses a larger but fixed crowd size and outperforms majority voting. These results have important ecological applicability: biological recording programs such as BEEWATCH can sustain themselves when resources such as taxonomic experts who can confirm the identifications made by photo submitters are scarce (as is typically the case), and feedback can be provided to submitters in a timely fashion. More generally, our model provides benefits to any crowdsourced consensus labeling task where there is a cost (financial or otherwise) associated with soliciting a label.
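The incremental idea described in the abstract, re-evaluating the consensus after each solicited label and stopping once it is good enough, can be illustrated with a minimal Python sketch. This is not the authors' model, whose details the abstract does not give. It assumes a known per-user confusion matrix `confusion[true][reported]` and a fixed posterior threshold as the stopping rule; the function name `incremental_consensus` and all parameters are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical type: confusion[s][l] = P(user reports label l | true species s)
Confusion = Sequence[Sequence[float]]


def incremental_consensus(
    prior: Sequence[float],
    labels: Sequence[Tuple[int, Confusion]],
    threshold: float = 0.9,
) -> Tuple[int, float, int]:
    """Update a posterior over species one label at a time.

    Stops as soon as the most probable species reaches `threshold`
    (an assumed stopping rule), or when the labels run out.
    Returns (consensus label, its posterior probability, labels used).
    """
    posterior: List[float] = list(prior)
    used = 0
    for label, confusion in labels:
        # Bayes' rule: P(s | label) is proportional to P(label | s) * P(s)
        posterior = [confusion[s][label] * p for s, p in enumerate(posterior)]
        total = sum(posterior)
        if total == 0.0:
            raise ValueError("label has zero probability under every species")
        posterior = [p / total for p in posterior]
        used += 1
        if max(posterior) >= threshold:
            break  # consensus is confident enough; solicit no further labels
    best = max(range(len(posterior)), key=posterior.__getitem__)
    return best, posterior[best], used


# Toy example with 3 species (the study itself uses 22).
# Every user is assumed to be right 80% of the time.
toy_confusion = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
uniform_prior = [1 / 3, 1 / 3, 1 / 3]
crowd = [(0, toy_confusion), (0, toy_confusion), (1, toy_confusion), (0, toy_confusion)]

species, confidence, n_used = incremental_consensus(uniform_prior, crowd)
print(species, round(confidence, 3), n_used)  # prints: 0 0.97 2
```

In this toy run two agreeing labels already push the posterior past the threshold, so the remaining two labels are never requested. That is the cost saving the abstract refers to, shown here under the simplifying assumptions stated above.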
Original language | English |
---|---|
Article number | 45 |
Pages (from-to) | 1-20 |
Number of pages | 20 |
Journal | ACM Transactions on Intelligent Systems and Technology |
Volume | 7 |
Issue number | 4 |
Early online date | 1 May 2016 |
DOIs | |
Publication status | Published - 14 Jul 2016 |
Keywords
- crowdsourcing
- citizen science
- consensus model
- Bayesian reasoning
- bumblebee identification
- biological recording