RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$

By Abhinav Bhatia, Samer B. Nashed, and Shlomo Zilberstein

Reinforcement Learning Journal, vol. TBD, 2025, pp. TBD.

Presented at the Reinforcement Learning Conference (RLC), Edmonton, Alberta, Canada, August 5–9, 2025.


Abstract:

Meta reinforcement learning (Meta-RL) methods such as RL$^2$ have emerged as promising approaches for learning data-efficient RL algorithms tailored to a given task distribution. However, they show poor asymptotic performance and struggle with out-of-distribution tasks because they rely on sequence models, such as recurrent neural networks or transformers, to process experiences rather than summarize them using general-purpose RL components such as value functions. In contrast, traditional RL algorithms are data-inefficient as they do not use domain knowledge, but they do converge to an optimal policy in the limit. We propose RL$^3$, a principled hybrid approach that incorporates action-values, learned per task via traditional RL, in the inputs to Meta-RL. We show that RL$^3$ earns greater cumulative reward in the long term than RL$^2$, drastically reduces meta-training time, and generalizes better to out-of-distribution tasks. Experiments are conducted on Meta-RL benchmarks and on custom discrete domains that exhibit a range of short-term, long-term, and complex dependencies.
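
To make the core idea concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) of what "RL inside RL$^2$" can look like: an ordinary tabular Q-learner is run inside each task, and its action-value estimates are appended to the usual RL$^2$ step input (observation, previous action, previous reward, done flag) before that input is fed to the meta-learner's sequence model. All names and hyperparameters below (InTaskQLearner, make_step_input, alpha, gamma) are illustrative assumptions, not taken from the paper.

    # Illustrative sketch only; names and hyperparameters are assumptions.
    import numpy as np

    class InTaskQLearner:
        """Tabular Q-learning run inside each task: the traditional-RL component."""
        def __init__(self, n_states, n_actions, alpha=0.3, gamma=0.99):
            self.q = np.zeros((n_states, n_actions))
            self.alpha, self.gamma = alpha, gamma

        def update(self, s, a, r, s_next, done):
            # Standard one-step Q-learning update on the current task's experience.
            target = r + (0.0 if done else self.gamma * self.q[s_next].max())
            self.q[s, a] += self.alpha * (target - self.q[s, a])

        def values(self, s):
            # Action-value estimates for the current state.
            return self.q[s].copy()

    def make_step_input(obs_onehot, prev_action_onehot, prev_reward, done_flag, q_values):
        """Concatenate the usual RL^2 features with the learned action-values."""
        return np.concatenate([
            obs_onehot,
            prev_action_onehot,
            [prev_reward, float(done_flag)],
            q_values,  # extra per-task value signal appended to the meta-learner's input
        ])

    # Example: build one input vector for the sequence model.
    n_states, n_actions = 5, 3
    learner = InTaskQLearner(n_states, n_actions)
    s, a, r, s_next = 0, 1, 1.0, 2
    learner.update(s, a, r, s_next, done=False)
    x = make_step_input(np.eye(n_states)[s_next], np.eye(n_actions)[a], r, False,
                        learner.values(s_next))
    print(x.shape)  # (13,) = 5 obs + 3 action + 2 scalars + 3 Q-values

In this sketch, the sequence model can draw on the Q-value estimates when they are informative and on the raw experience otherwise, which reflects the abstract's motivation of summarizing experience with general-purpose RL components such as value functions.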


Citation Information:

Abhinav Bhatia, Samer B. Nashed, and Shlomo Zilberstein. "RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$." Reinforcement Learning Journal, vol. TBD, 2025, pp. TBD.

BibTeX:
@article{bhatia2025boosting,
    title={{RL}$^3$: {B}oosting Meta Reinforcement Learning via {RL} inside {RL}$^2$},
    author={Bhatia, Abhinav and Nashed, Samer B. and Zilberstein, Shlomo},
    journal={Reinforcement Learning Journal},
    year={2025}
}