Epistemically-guided forward-backward exploration

By Núria Armengol Urpí, Marin Vlastelica, Georg Martius, and Stelian Coros

Reinforcement Learning Journal, vol. TBD, 2025, pp. TBD.

Presented at the Reinforcement Learning Conference (RLC), Edmonton, Alberta, Canada, August 5–9, 2025.



Abstract:

Zero-shot reinforcement learning is necessary for extracting optimal policies in the absence of concrete rewards, enabling fast adaptation to future problem settings. Forward-backward representations ($FB$) have emerged as a promising method for learning optimal policies in the absence of rewards via a factorization of the policy occupancy measure. However, until now, $FB$ and many similar zero-shot reinforcement learning algorithms have been decoupled from the exploration problem, generally relying on other exploration algorithms for data collection. We argue that $FB$ representations should fundamentally be used for exploration in order to learn more efficiently. With this goal in mind, we design exploration policies that arise naturally from the $FB$ representation and minimize its posterior variance, hence minimizing its epistemic uncertainty. We empirically demonstrate that such principled exploration strategies considerably improve the sample complexity of the $FB$ algorithm in comparison to other exploration methods.
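
To make the idea concrete, below is a minimal sketch, assuming the exploration policy scores candidate actions by the disagreement of an ensemble of forward networks $F(s, a, z)$ in the factorization $M^{\pi_z}(s, a, ds') \approx F(s, a, z)^\top B(s')\rho(ds')$, a common proxy for epistemic uncertainty. The network sizes, the ensemble-variance criterion, and all names are illustrative assumptions, not the authors' implementation.

# Minimal sketch (not the authors' code): epistemic exploration with an
# ensemble of forward embeddings F(s, a, z). Exploration picks the candidate
# action on which the ensemble disagrees most, a stand-in for the posterior
# variance of the FB representation. Names and shapes are assumptions.
import torch
import torch.nn as nn

class ForwardNet(nn.Module):
    """One ensemble member of the forward embedding F(s, a, z)."""
    def __init__(self, state_dim: int, action_dim: int, z_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + z_dim, 256),
            nn.ReLU(),
            nn.Linear(256, z_dim),
        )

    def forward(self, s, a, z):
        return self.net(torch.cat([s, a, z], dim=-1))

def epistemic_action(ensemble, s, candidate_actions, z):
    """Return the candidate action maximizing ensemble variance of F(s, a, z).

    ensemble: list of ForwardNet sharing the same task embedding z
    s: (state_dim,) current state; candidate_actions: (num_actions, action_dim)
    z: (z_dim,) task embedding sampled for the exploration episode
    """
    num_actions = candidate_actions.shape[0]
    s_rep = s.unsqueeze(0).expand(num_actions, -1)
    z_rep = z.unsqueeze(0).expand(num_actions, -1)
    with torch.no_grad():
        # Stack predictions: (ensemble_size, num_actions, z_dim)
        preds = torch.stack([f(s_rep, candidate_actions, z_rep) for f in ensemble])
    # Disagreement across ensemble members, summed over embedding dimensions
    disagreement = preds.var(dim=0).sum(dim=-1)  # (num_actions,)
    return candidate_actions[disagreement.argmax()]

In the paper, the uncertainty-seeking objective is derived from the posterior variance of the $FB$ representation itself; the ensemble-disagreement score above is only one convenient way to approximate such a quantity in practice.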


Citation Information:

Núria Armengol Urpí, Marin Vlastelica, Georg Martius, and Stelian Coros. "Epistemically-guided forward-backward exploration." Reinforcement Learning Journal, vol. TBD, 2025, pp. TBD.

BibTeX:
@article{urpi2025epistemically,
    title={Epistemically-guided forward-backward exploration},
    author={Urp{\'{i}}, N{\'{u}}ria Armengol and Vlastelica, Marin and Martius, Georg and Coros, Stelian},
    journal={Reinforcement Learning Journal},
    year={2025}
}