Exploring Uncertainty in Distributional Reinforcement Learning

By Georgy Antonov, and Peter Dayan

Reinforcement Learning Journal, vol. 2, 2024, pp. 961–978.

Presented at the Reinforcement Learning Conference (RLC), Amherst, Massachusetts, August 9–12, 2024.


Abstract:

Epistemic uncertainty, which stems from what a learning algorithm does not know, is the natural signal for exploration. Capturing and exploiting epistemic uncertainty for efficient exploration is conceptually straightforward for model-based methods. However, doing so is computationally ruinous, prompting a search for model-free approaches. One of the most seminal and venerable of these is Bayesian Q-learning, which maintains and updates an approximation to the distribution of the long-run returns associated with state-action pairs. However, this approximation can be rather severe. Recent work on distributional reinforcement learning (DRL) provides many powerful methods for modelling return distributions, which offer the prospect of improving upon Bayesian Q-learning's parametric scheme but have not been fully investigated for their exploratory potential. Here, we examine the characteristics of a number of DRL algorithms in the context of exploration and propose a novel Bayesian analogue of the categorical temporal-difference algorithm. We show that this works well, converging appropriately to a close approximation to the true return distribution.
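
For readers unfamiliar with the categorical temporal-difference algorithm the abstract builds on, the sketch below illustrates a standard tabular categorical TD update (in the style of C51): a return distribution over a fixed support of atoms is updated by projecting the Bellman target back onto that support. This is only an illustrative sketch of the non-Bayesian baseline, not the paper's proposed Bayesian analogue; the support range, number of atoms, and learning rate are assumed values chosen for illustration.

```python
import numpy as np

# Assumed, illustrative settings: 51 atoms on a fixed support [0, 10].
N_ATOMS, V_MIN, V_MAX = 51, 0.0, 10.0
support = np.linspace(V_MIN, V_MAX, N_ATOMS)  # atom locations z_i
dz = support[1] - support[0]

def categorical_td_update(p_sa, r, p_next, gamma=0.99, alpha=0.1):
    """One tabular categorical TD step for a single (s, a) pair.

    p_sa   : current probability vector over `support` for (s, a)
    r      : observed reward
    p_next : probability vector over `support` for the bootstrap target
    """
    # Project the shifted/scaled target distribution r + gamma * Z' back
    # onto the fixed support (the standard categorical projection).
    target = np.zeros(N_ATOMS)
    tz = np.clip(r + gamma * support, V_MIN, V_MAX)
    b = (tz - V_MIN) / dz                       # fractional atom index
    lo, hi = np.floor(b).astype(int), np.ceil(b).astype(int)
    # Distribute each target atom's mass to its neighbouring support atoms.
    np.add.at(target, lo, p_next * (hi - b))
    np.add.at(target, hi, p_next * (b - lo))
    np.add.at(target, lo, p_next * (lo == hi))  # atoms that land exactly on the grid
    # Mixture (learning-rate) update towards the projected target.
    return (1 - alpha) * p_sa + alpha * target

# Example: start from a uniform return distribution and apply one update
# with a hypothetical reward of 1.0 and a uniform bootstrap distribution.
p = np.full(N_ATOMS, 1.0 / N_ATOMS)
p = categorical_td_update(p, r=1.0, p_next=np.full(N_ATOMS, 1.0 / N_ATOMS))
assert np.isclose(p.sum(), 1.0)
```

Because the update mixes the current probability vector with a projected target distribution, it preserves normalisation at every step; the paper's contribution concerns how a Bayesian treatment of such a representation can expose the epistemic uncertainty needed for exploration.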


Citation Information:

Georgy Antonov and Peter Dayan. "Exploring Uncertainty in Distributional Reinforcement Learning." Reinforcement Learning Journal, vol. 2, 2024, pp. 961–978.

BibTeX:

@article{antonov2024exploring,
    title={Exploring Uncertainty in Distributional Reinforcement Learning},
    author={Antonov, Georgy and Dayan, Peter},
    journal={Reinforcement Learning Journal},
    volume={2},
    pages={961--978},
    year={2024}
}