Reinforcement Learning Journal, vol. 1, 2024, pp. 283–301.
Presented at the Reinforcement Learning Conference (RLC), Amherst, Massachusetts, August 9–12, 2024.
Model-based and model-free reinforcement learning (RL) each possess relative strengths that prevent either approach from strictly outperforming the other. Model-based RL often offers greater data efficiency, as it can use models to evaluate many possible behaviors before choosing one to enact. However, because models cannot perfectly represent complex environments, agents that rely too heavily on models may suffer from poor asymptotic performance. Model-free RL, on the other hand, avoids this problem at the expense of data efficiency. In this work, we seek a unified approach to RL that combines the strengths of both approaches. To this end, we introduce the concept of _equivalent policy sets_ (EPS), which quantify the limitations of models for the purposes of decision-making, _i.e._, action selection. Based on this concept, we propose _Unified RL_, a novel RL algorithm that uses models to constrain model-free RL to the set of policies that are not provably suboptimal, according to model-based bounds on policy performance. We demonstrate across a range of benchmarks that Unified RL effectively combines the relative strengths of both model-based and model-free RL, in that it achieves data efficiency comparable to that of model-based RL, while achieving asymptotic performance similar or superior to that of model-free RL. Additionally, we show that Unified RL often outperforms a number of existing state-of-the-art model-based and model-free RL algorithms, and _can learn effective policies in situations where either model-based or model-free RL alone fails_.
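As a rough illustration of the phrase "not provably suboptimal, according to model-based bounds on policy performance," the sketch below filters a finite set of candidate policies using lower and upper bounds on their returns. It is a toy reading of the idea and not the paper's algorithm: the finite candidate set, the policy names, the bound values, and the function `equivalent_policy_set` are all assumptions made for illustration.

```python
from typing import Dict, Set, Tuple


def equivalent_policy_set(bounds: Dict[str, Tuple[float, float]]) -> Set[str]:
    """Return the candidate policies that are not provably suboptimal.

    `bounds` maps a (hypothetical) policy name to a (lower, upper) pair of
    bounds on its expected return. A policy is treated as provably
    suboptimal when its upper bound falls below the best lower bound of
    any candidate, since some other policy is then guaranteed to do better.
    """
    if not bounds:
        return set()
    best_lower = max(lower for lower, _ in bounds.values())
    return {name for name, (_, upper) in bounds.items() if upper >= best_lower}


# Made-up bounds for three hypothetical policies.
bounds = {
    "pi_a": (1.0, 3.0),
    "pi_b": (2.5, 4.0),
    "pi_c": (0.0, 2.0),
}
eps = equivalent_policy_set(bounds)
```

Here `pi_c` is excluded because its upper bound (2.0) is below the lower bound of `pi_b` (2.5). Under this reading, a model-free learner would then be restricted to choosing among the remaining policies, which the model's bounds cannot distinguish.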
Benjamin Freed, Thomas Wei, Roberto Calandra, Jeff Schneider, and Howie Choset. "Unifying Model-Based and Model-Free Reinforcement Learning with Equivalent Policy Sets." Reinforcement Learning Journal, vol. 1, 2024, pp. 283–301.
BibTeX:
@article{freed2024unifying,
title={Unifying Model-Based and Model-Free Reinforcement Learning with Equivalent Policy Sets},
author={Freed, Benjamin and Wei, Thomas and Calandra, Roberto and Schneider, Jeff and Choset, Howie},
journal={Reinforcement Learning Journal},
volume={1},
pages={283--301},
year={2024}
}