Reinforcement Learning Journal, vol. TBD, 2025, pp. TBD.
Presented at the Reinforcement Learning Conference (RLC), Edmonton, Alberta, Canada, August 5–9, 2025.
Recent work has shown that, under certain assumptions, zero-shot reinforcement learning (RL) methods can generalise to *any* unseen task in an environment after an offline, reward-free pre-training phase. Access to Markov states is one such assumption, yet in many real-world applications the Markov state is only *partially observable*. Here, we explore how the performance of standard zero-shot RL methods degrades under partial observability, and show that, as in single-task RL, memory-based architectures are an effective remedy. We evaluate our *memory-based* zero-shot RL methods in domains where states, rewards, and changes in dynamics are partially observed, and show improved performance over memory-free baselines. Our code is open-sourced via the project page: https://enjeeneer.io/projects/bfms-with-memory/.
Scott Jeen, Tom Bewley, and Jonathan Cullen. "Zero-Shot Reinforcement Learning Under Partial Observability." Reinforcement Learning Journal, vol. TBD, 2025, pp. TBD.
BibTeX:
@article{jeen2025zero,
  title={Zero-Shot Reinforcement Learning Under Partial Observability},
  author={Jeen, Scott and Bewley, Tom and Cullen, Jonathan},
  journal={Reinforcement Learning Journal},
  year={2025}
}