Reinforcement Learning Journal, vol. TBD, 2025, pp. TBD.
Presented at the Reinforcement Learning Conference (RLC), Edmonton, Alberta, Canada, August 5–9, 2025.
The goal of this paper is to present a finite-time analysis of minimax Q-learning and its smooth variant for two-player zero-sum Markov games, where the smooth variant is derived using the Boltzmann operator. To the best of the authors' knowledge, this is the first work in the literature to provide such results. To facilitate the analysis, we introduce lower and upper comparison systems and employ switching system models. The proposed approach not only offers a simpler and more intuitive framework for analyzing convergence but also provides deeper insights into the behavior of minimax Q-learning and its smooth variant. These novel perspectives have the potential to reveal new relationships and foster synergy between ideas in control theory and reinforcement learning.
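For readers unfamiliar with the algorithm being analyzed, the following is a minimal Python sketch of one minimax Q-learning update and its Boltzmann-smoothed variant. It is illustrative only and not taken from the paper: the pure-strategy max-min below is a simplification (minimax Q-learning in its standard form solves the stage matrix game over mixed strategies via a linear program), and all function names and parameters are hypothetical.

```python
import numpy as np
from scipy.special import logsumexp

def boltzmann(values, tau):
    """Boltzmann (log-sum-exp) operator over a vector of values.
    Approaches the hard max as tau -> 0; used for the smooth variant."""
    return tau * logsumexp(values / tau)

def minimax_value(Q_s, tau=None):
    """Value of the stage game Q_s[a, o] for the maximizing player.

    Simplified pure-strategy sketch: max over own actions of the min over
    opponent actions. If tau is given, the outer max is replaced by the
    Boltzmann operator, yielding the smooth variant.
    """
    worst_case = Q_s.min(axis=1)          # min over opponent actions o
    if tau is None:
        return worst_case.max()           # standard (hard) max
    return boltzmann(worst_case, tau)     # smooth Boltzmann max

def minimax_q_update(Q, s, a, o, r, s_next, alpha=0.1, gamma=0.99, tau=None):
    """One asynchronous minimax Q-learning update on the visited entry."""
    target = r + gamma * minimax_value(Q[s_next], tau)
    Q[s, a, o] += alpha * (target - Q[s, a, o])
    return Q
```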
Narim Jeong and Donghwan Lee. "Finite-Time Analysis of Minimax Q-Learning." Reinforcement Learning Journal, vol. TBD, 2025, pp. TBD.
BibTeX:
@article{jeong2025finite,
  title={Finite-Time Analysis of Minimax {Q-Learning}},
  author={Jeong, Narim and Lee, Donghwan},
  journal={Reinforcement Learning Journal},
  year={2025}
}