TY - JOUR
T1 - Diversity-Augmented Intrinsic Motivation for Deep Reinforcement Learning
AU - Dai, Tianhong
AU - Du, Yali
AU - Fang, Meng
AU - Bharath, Anil Anthony
PY - 2022/1/11
Y1 - 2022/1/11
N2 - In many real-world problems, reward signals received by agents are delayed or sparse, which makes it challenging to train a reinforcement learning (RL) agent. An intrinsic reward signal can help an agent to explore such environments in the quest for novel states. In this work, we propose a general end-to-end diversity-augmented intrinsic motivation for deep reinforcement learning, which encourages the agent to explore new states and automatically provides denser rewards. Specifically, we measure the diversity of adjacent states under a model of state sequences based on a determinantal point process (DPP); this is coupled with a straight-through gradient estimator to enable end-to-end differentiability. The proposed approach is comprehensively evaluated on MuJoCo and the Arcade Learning Environment (Atari and SuperMarioBros). The experiments show that an intrinsic reward based on the diversity measure derived from the DPP model accelerates the early stages of training in Atari games and SuperMarioBros. In MuJoCo, the approach improves on prior techniques for tasks using the standard reward setting, and achieves state-of-the-art performance on 12 out of 15 tasks with delayed rewards.
KW - Deep Reinforcement Learning
KW - Curiosity-driven exploration
KW - Determinantal point process
UR - http://dx.doi.org/10.1016/j.neucom.2021.10.040
DO - 10.1016/j.neucom.2021.10.040
M3 - Article
VL - 468
SP - 396
EP - 406
JO - Neurocomputing
JF - Neurocomputing
SN - 0925-2312
ER -