Boosting Soft Q-Learning by Bounding

By Jacob Adamczyk, Volodymyr Makarenko, Stas Tiomkin, and Rahul V Kulkarni

Reinforcement Learning Journal, vol. 5, 2024, pp. 2373–2399.

Presented at the Reinforcement Learning Conference (RLC), Amherst, Massachusetts, August 9–12, 2024.



Abstract:

An agent’s ability to leverage past experience is critical for efficiently solving new tasks. Prior work has focused on using value function estimates to obtain zero-shot approximations for solutions to a new task. In soft $Q$-learning, we show how any value function estimate can also be used to derive double-sided bounds on the optimal value function. The derived bounds lead to new approaches for boosting training performance, which we validate experimentally. Notably, we find that the proposed framework suggests an alternative method for updating the $Q$-function, leading to boosted performance.
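
To make the idea concrete, below is a minimal tabular sketch of a soft $Q$-learning update whose Bellman target is clipped to double-sided bounds. The bound arrays `lb` and `ub`, the inverse temperature `beta`, and all other names are illustrative assumptions, not the paper's derived expressions or reference implementation.

```python
import numpy as np

def soft_q_update(q, s, a, r, s_next, lb, ub, beta=5.0, gamma=0.99, lr=0.1):
    """One tabular soft Q-learning step with the target clipped to
    hypothetical double-sided bounds [lb, ub] on the optimal soft value.

    q        : array of shape (num_states, num_actions), current Q estimates
    lb, ub   : arrays of the same shape, assumed lower/upper bounds
    beta     : inverse temperature of the soft (entropy-regularized) objective
    """
    # Soft (log-sum-exp) state value of the successor state.
    v_next = (1.0 / beta) * np.log(np.sum(np.exp(beta * q[s_next])))
    # Standard soft Bellman target.
    target = r + gamma * v_next
    # Clip the target with the externally supplied bounds (the assumed mechanism).
    target = np.clip(target, lb[s, a], ub[s, a])
    # Move the estimate toward the clipped target.
    q[s, a] += lr * (target - q[s, a])
    return q
```

The sketch only illustrates the general mechanism of constraining value updates with known bounds; the bounds proposed in the paper are derived from prior value function estimates rather than supplied arbitrarily.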


Citation Information:

Jacob Adamczyk, Volodymyr Makarenko, Stas Tiomkin, and Rahul V Kulkarni. "Boosting Soft Q-Learning by Bounding." Reinforcement Learning Journal, vol. 5, 2024, pp. 2373–2399.

BibTeX:

@article{adamczyk2024boosting,
    title={Boosting Soft Q-Learning by Bounding},
    author={Adamczyk, Jacob and Makarenko, Volodymyr and Tiomkin, Stas and Kulkarni, Rahul V},
    journal={Reinforcement Learning Journal},
    volume={5},
    pages={2373--2399},
    year={2024}
}