Non-adaptive Online Finetuning for Offline Reinforcement Learning

By Audrey Huang, Mohammad Ghavamzadeh, Nan Jiang, and Marek Petrik

Reinforcement Learning Journal, vol. 1, 2024, pp. 182–197.

Presented at the Reinforcement Learning Conference (RLC), Amherst, Massachusetts, August 9–12, 2024.


Abstract:

Offline reinforcement learning (RL) has emerged as an important framework for applying RL to real-life applications. However, the complete lack of online interactions causes technical difficulties. The online finetuning setting, which incorporates a limited form of online interactions often available in practice, has been developed to address these challenges. Unfortunately, existing theoretical frameworks for online finetuning either require a high online sample complexity or the deployment of fully adaptive algorithms (i.e., unlimited policy changes), which restricts their application to real-world settings where online interactions and policy updates are expensive and limited. In this paper, we develop a new theoretical framework for online finetuning. Instead of competing with the optimal policy (which inherits the high sample complexity and adaptivity requirements of online RL), we aim to learn a policy that improves as much as possible over an existing reference policy using a pre-specified number of online samples and a non-adaptive data-collection strategy. Our formulation reveals surprising nuances and suggests novel principles that distinguish finetuning from purely online and offline RL.
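
To make the non-adaptive protocol concrete, below is a minimal, hypothetical Python sketch (not the paper's algorithm): it uses a three-armed bandit as a stand-in for an MDP, commits to a fixed exploration distribution and online budget before any data is collected, and then performs a single post-collection update that targets improvement over a reference arm rather than unconstrained optimality. The exploration distribution and the 1/sqrt(n) pessimism bonus are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 3-armed bandit as a stand-in for an MDP (illustrative only).
true_means = np.array([0.3, 0.5, 0.7])

def pull(arm):
    """Sample a noisy reward for the given arm."""
    return true_means[arm] + rng.normal(scale=0.5)

# Offline data: heavy coverage of arm 0 (the reference policy's arm),
# scarce coverage of arms 1 and 2.
offline = [(a, pull(a)) for a in [0] * 50 + [1] * 3 + [2] * 3]
reference_arm = 0

# Non-adaptive finetuning: commit in advance to an exploration
# distribution and a fixed online budget; no policy updates mid-collection.
explore_dist = np.array([0.0, 0.5, 0.5])  # spend the budget on uncovered arms
n_online = 40
online = [(int(a), pull(int(a)))
          for a in rng.choice(3, size=n_online, p=explore_dist)]

# A single post-collection update: pessimistic (lower-confidence) value
# estimates, then pick the arm with the best estimated improvement over
# the reference policy rather than chasing the unconstrained optimum.
data = offline + online
counts = np.bincount([a for a, _ in data], minlength=3)
means = np.array([np.mean([r for a, r in data if a == k]) for k in range(3)])
pessimistic = means - 1.0 / np.sqrt(counts)
improvement = pessimistic - pessimistic[reference_arm]
chosen = int(np.argmax(improvement))
print(f"chosen arm: {chosen}, estimated improvement: {improvement[chosen]:.3f}")
```

The contrast with adaptive online RL is that explore_dist is frozen before the online phase begins, so the entire interaction can be deployed as a single batch with no intermediate policy changes.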


Citation Information:

Audrey Huang, Mohammad Ghavamzadeh, Nan Jiang, and Marek Petrik. "Non-adaptive Online Finetuning for Offline Reinforcement Learning." Reinforcement Learning Journal, vol. 1, 2024, pp. 182–197.

BibTeX:

@article{huang2024adaptive,
    title={Non-adaptive Online Finetuning for Offline Reinforcement Learning},
    author={Huang, Audrey and Ghavamzadeh, Mohammad and Jiang, Nan and Petrik, Marek},
    journal={Reinforcement Learning Journal},
    volume={1},
    pages={182--197},
    year={2024}
}