Foraging decisions as multi-armed bandit problems: Applying reinforcement learning algorithms to foraging data

Research output: Contribution to journal › Article › peer-review

5 Citations (Scopus)


Finding resources is crucial for animals to survive and reproduce, but our understanding of the decision-making that underlies whether to explore new resources or exploit old ones remains lacking. Theory predicts an ‘exploration-exploitation trade-off’ in which animals must balance their effort between staying to exploit a seemingly good resource and moving to explore the environment. To date, however, it has been challenging to generate flexible yet tractable statistical models that can capture this trade-off, and our understanding of foraging decisions is limited. Here, I suggest that foraging decisions can be seen as multi-armed bandit problems, and apply a deterministic algorithm (the Upper-Confidence-Bound, or ‘UCB’) and a Bayesian algorithm (Thompson Sampling, or ‘TS’) to simulated data to demonstrate how these algorithms generate testable a priori predictions. Next, I use UCB and TS to analyse empirical foraging data from larvae of the tephritid fruit fly Bactrocera tryoni to provide a qualitative and quantitative framework for characterising animal foraging behaviour. Qualitative analysis revealed that TS displays a shorter exploration period than UCB, although both converged to similar qualitative results. Quantitative analysis demonstrated that, overall, UCB is more accurate than TS in predicting the observed foraging patterns, even though both algorithms failed to quantitatively estimate the empirical foraging patterns in high-density groups (i.e., groups with 50 larvae and, more strikingly, groups with 100 larvae), likely due to the influence of intraspecific competition on animal behaviour. The framework proposed here demonstrates how reinforcement learning algorithms can be used to model animal foraging decisions.
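
To make the two algorithms named in the abstract concrete, the following is a minimal Python sketch of how a forager choosing among food patches could be simulated as a multi-armed bandit. It is not the article's implementation. It assumes Bernoulli (found food / found nothing) rewards, the common UCB1 form of the Upper-Confidence-Bound rule, and uniform Beta(1, 1) priors for Thompson Sampling. The function names and the patch qualities in `patch_quality` are hypothetical values chosen purely for illustration.

```python
import math
import random


def ucb1(reward_probs, n_steps, rng):
    """UCB1 on a Bernoulli bandit; returns the number of visits per patch."""
    n_arms = len(reward_probs)
    counts = [0] * n_arms      # visits to each patch so far
    totals = [0.0] * n_arms    # summed rewards obtained at each patch
    for t in range(1, n_steps + 1):
        if t <= n_arms:
            arm = t - 1        # visit every patch once before using the bound
        else:
            # Pick the patch with the highest mean reward plus exploration bonus.
            arm = max(
                range(n_arms),
                key=lambda a: totals[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts


def thompson(reward_probs, n_steps, rng):
    """Thompson Sampling with Beta(1, 1) priors; returns visits per patch."""
    n_arms = len(reward_probs)
    wins = [0] * n_arms        # rewarded visits per patch
    losses = [0] * n_arms      # unrewarded visits per patch
    for _ in range(n_steps):
        # Draw one plausible reward rate per patch from its Beta posterior.
        samples = [
            rng.betavariate(wins[a] + 1, losses[a] + 1) for a in range(n_arms)
        ]
        arm = max(range(n_arms), key=lambda a: samples[a])
        if rng.random() < reward_probs[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return [wins[a] + losses[a] for a in range(n_arms)]


# Hypothetical patch qualities (probability of a reward per visit).
patch_quality = [0.2, 0.5, 0.8]
ucb_visits = ucb1(patch_quality, 2000, random.Random(0))
ts_visits = thompson(patch_quality, 2000, random.Random(1))
print("UCB visits per patch:", ucb_visits)
print("TS visits per patch: ", ts_visits)
```

Under these assumptions, the visit counts per patch play the role of a priori predictions: they could be compared, patch by patch, with the distribution of animals observed across food patches. The reward model and the way predictions are scored against data would need to follow the article itself.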

Original language: English
Pages (from-to): 48-56
Number of pages: 9
Journal: Journal of Theoretical Biology
Early online date: 6 Feb 2019
Publication status: Published - 21 Apr 2019



