Abstract
Reinforcement learning (RL) is a technique for computing an optimal policy in stochastic settings whereby actions from an initial policy are simulated (or directly executed) and the value of a state is updated based on the immediate rewards obtained as the policy is executed. Existing efforts model opponents in competitive games as elements of a stochastic environment and use RL to learn policies against such opponents. In this setting, the rate of change of the state values monotonically decreases over time as learning converges. This modeling assumes that the opponent's strategy is static over time, but such an assumption is too strong when human opponents are possible. Consequently, in this paper we develop a meta-level RL mechanism that detects when an opponent changes strategy and allows the state values to “deconverge” in order to learn how to play against a different strategy. We validate this approach empirically for high-level strategy selection in the StarCraft: Brood War game.
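To make the idea concrete, the sketch below shows one plausible reading of such a meta-level mechanism: a tabular Q-learner whose learning rate normally decays as values converge, plus a monitor over recent TD errors that re-opens learning (lets values "deconverge") when the error level spikes, which suggests the opponent changed strategy. This is an illustrative assumption, not the paper's implementation; the class name, window size, spike threshold, and reset rule are all hypothetical.

```python
import random
from collections import defaultdict, deque


class MetaQLearner:
    """Q-learning with a meta-level change detector (illustrative sketch).

    When the recent average |TD error| rises well above its converged
    level, we assume the opponent switched strategy and restore the
    learning rate so state values can "deconverge" and re-adapt.
    All hyperparameters below are assumptions, not values from the paper.
    """

    def __init__(self, actions, alpha=0.5, alpha_min=0.05,
                 gamma=0.9, window=50, spike_ratio=3.0):
        self.q = defaultdict(float)          # Q[(state, action)]
        self.actions = actions
        self.alpha = alpha                   # current learning rate
        self.alpha0 = alpha                  # value restored after detection
        self.alpha_min = alpha_min
        self.gamma = gamma
        self.errors = deque(maxlen=window)   # recent |TD errors|
        self.baseline = None                 # |TD error| level at convergence
        self.spike_ratio = spike_ratio

    def act(self, state, epsilon=0.1):
        """Epsilon-greedy strategy selection."""
        if random.random() < epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, s, a, r, s_next):
        """Standard Q-learning update, feeding the meta-level monitor."""
        best_next = max(self.q[(s_next, b)] for b in self.actions)
        td_error = r + self.gamma * best_next - self.q[(s, a)]
        self.q[(s, a)] += self.alpha * td_error
        self._meta_update(abs(td_error))

    def _meta_update(self, abs_err):
        """Detect a strategy change from a spike in the TD-error level."""
        self.errors.append(abs_err)
        if len(self.errors) < self.errors.maxlen:
            return
        mean_err = sum(self.errors) / len(self.errors)
        if self.baseline is None or mean_err < self.baseline:
            self.baseline = mean_err         # track the converged error level
        elif mean_err > self.spike_ratio * max(self.baseline, 1e-6):
            self.alpha = self.alpha0         # deconverge: re-open learning
            self.baseline = None
            self.errors.clear()
        else:
            # normal operation: decay the learning rate as values converge
            self.alpha = max(self.alpha_min, self.alpha * 0.999)
```

Under these assumptions, a fixed-strategy opponent drives the learning rate down as usual, while a mid-game strategy switch produces sustained TD errors that trip the detector and restart adaptation.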
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the SBGames conference on Computing |
| Pages | 17-24 |
| Number of pages | 8 |
| Publication status | Published - 2013 |