Reinforcement Learning Journal, vol. TBD, 2025, pp. TBD.
Presented at the Reinforcement Learning Conference (RLC), Edmonton, Alberta, Canada, August 5–9, 2025.
Zero-shot reinforcement learning is necessary for extracting optimal policies in the absence of concrete rewards, enabling fast adaptation to future problem settings. Forward-backward representations ($FB$) have emerged as a promising method for learning optimal policies without rewards via a factorization of the policy occupancy measure. However, up until now, $FB$ and many similar zero-shot reinforcement learning algorithms have been decoupled from the exploration problem, generally relying on separate exploration algorithms for data collection. We argue that $FB$ representations should fundamentally be used for exploration in order to learn more efficiently. With this goal in mind, we design exploration policies that arise naturally from the $FB$ representation and that minimize its posterior variance, hence minimizing its epistemic uncertainty. We empirically demonstrate that such principled exploration strategies considerably improve the sample efficiency of the $FB$ algorithm in comparison to other exploration methods.
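As background, a minimal sketch of the occupancy-measure factorization on which $FB$ representations are built, written in the notation standard in the forward-backward literature; the symbols $F$, $B$, $\rho$, and $z$ are used here for illustration only and are not necessarily the paper's exact definitions:

% Successor (occupancy) measure of the policy \pi_z, factorized by a forward
% map F and a backward map B with respect to a data distribution \rho:
\begin{align*}
  M^{\pi_z}(s_0, a_0, X) &\approx \int_X F(s_0, a_0, z)^\top B(s')\, \rho(\mathrm{d}s'), \\
  % At test time, a reward function r is mapped to a task embedding z ...
  z_r &= \mathbb{E}_{s \sim \rho}\big[ r(s)\, B(s) \big], \\
  % ... and the corresponding Q-function is recovered linearly from F,
  % giving the zero-shot policy \pi_{z_r}(s) = \arg\max_a F(s, a, z_r)^\top z_r.
  Q^{\pi_{z_r}}(s_0, a_0) &\approx F(s_0, a_0, z_r)^\top z_r .
\end{align*}

Under this reading, exploration that reduces the posterior variance of the $FB$ representation, as proposed in the paper, amounts to collecting data where this factorization is most uncertain.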
Núria Armengol Urpí, Marin Vlastelica, Georg Martius, and Stelian Coros. "Epistemically-guided forward-backward exploration." Reinforcement Learning Journal, vol. TBD, 2025, pp. TBD.
BibTeX:
@article{urpi2025epistemically,
  title={Epistemically-guided forward-backward exploration},
  author={Urp{\'{i}}, N{\'{u}}ria Armengol and Vlastelica, Marin and Martius, Georg and Coros, Stelian},
  journal={Reinforcement Learning Journal},
  year={2025}
}