Zero-Shot Reinforcement Learning Under Partial Observability

By Scott Jeen, Tom Bewley, and Jonathan Cullen

Reinforcement Learning Journal, vol. TBD, 2025, pp. TBD.

Presented at the Reinforcement Learning Conference (RLC), Edmonton, Alberta, Canada, August 5–9, 2025.


Abstract:

Recent work has shown that, under certain assumptions, zero-shot reinforcement learning (RL) methods can generalise to *any* unseen task in an environment after an offline, reward-free pre-training phase. Access to Markov states is one such assumption, yet, in many real-world applications, the Markov state is only *partially observable*. Here, we explore how the performance of standard zero-shot RL methods degrades under partial observability, and show that, as in single-task RL, memory-based architectures are an effective remedy. We evaluate our *memory-based* zero-shot RL methods in domains where states, rewards, and changes in dynamics are partially observed, and show improved performance over memory-free baselines. Our code is open-sourced via the project page: https://enjeeneer.io/projects/bfms-with-memory/.
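As an illustrative sketch only (not the authors' implementation, and with all names and sizes chosen for the example), one common way to add memory is to summarise the observation-action history with a recurrent network, and feed the resulting belief state to an otherwise memory-free zero-shot RL backbone in place of the Markov state:

    import torch
    import torch.nn as nn

    class RecurrentObservationEncoder(nn.Module):
        """Hypothetical sketch: compress an observation-action history into a
        fixed-size belief state with a GRU. This belief state can stand in for
        the Markov state expected by a memory-free zero-shot RL method."""

        def __init__(self, obs_dim: int, action_dim: int, belief_dim: int = 256):
            super().__init__()
            self.gru = nn.GRU(obs_dim + action_dim, belief_dim, batch_first=True)

        def forward(self, obs_seq: torch.Tensor, act_seq: torch.Tensor) -> torch.Tensor:
            # obs_seq: (batch, T, obs_dim); act_seq: (batch, T, action_dim)
            inputs = torch.cat([obs_seq, act_seq], dim=-1)
            _, hidden = self.gru(inputs)   # hidden: (1, batch, belief_dim)
            return hidden.squeeze(0)       # one belief vector per trajectory

    if __name__ == "__main__":
        encoder = RecurrentObservationEncoder(obs_dim=17, action_dim=6)
        obs = torch.randn(8, 32, 17)   # batch of 8 histories, 32 steps each
        act = torch.randn(8, 32, 6)
        belief = encoder(obs, act)
        print(belief.shape)            # torch.Size([8, 256])

For the details of how memory is actually integrated into the zero-shot RL methods studied in the paper, see the open-source code linked above.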


Citation Information:

Scott Jeen, Tom Bewley, and Jonathan Cullen. "Zero-Shot Reinforcement Learning Under Partial Observability." Reinforcement Learning Journal, vol. TBD, 2025, pp. TBD.

BibTeX:
@article{jeen2025zero,
    title={Zero-Shot Reinforcement Learning Under Partial Observability},
    author={Jeen, Scott and Bewley, Tom and Cullen, Jonathan},
    journal={Reinforcement Learning Journal},
    year={2025}
}