The Confusing Instance Principle for Online Linear Quadratic Control

By Waris Radji and Odalric-Ambrym Maillard

Reinforcement Learning Journal, vol. TBD, 2025, pp. TBD.

Presented at the Reinforcement Learning Conference (RLC), Edmonton, Alberta, Canada, August 5–9, 2025.


Abstract:

We revisit the problem of controlling linear systems with quadratic cost under unknown dynamics using model-based reinforcement learning. Traditional methods like Optimism in the Face of Uncertainty and Thompson Sampling, rooted in multi-armed bandits (MABs), face practical limitations. In contrast, we propose an alternative based on the *Confusing Instance* (CI) principle, which underpins regret lower bounds in MABs and discrete Markov Decision Processes (MDPs) and is central to the *Minimum Empirical Divergence* (MED) family of algorithms, known for their asymptotic optimality in various settings. By leveraging the structure of linear quadratic regulator (LQR) policies along with sensitivity and stability analysis, we develop `MED-LQ`. This novel control strategy extends the principles of CI and MED beyond small-scale settings. Our benchmarks on a comprehensive control suite demonstrate that `MED-LQ` achieves competitive performance in various scenarios while highlighting its potential for broader applications in large-scale MDPs.
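
For intuition on the MED principle the abstract refers to: in its original MAB setting, an MED-style rule plays each arm with probability proportional to exp(-N_a · KL(μ̂_a, μ̂_*)), so arms whose empirical model would require a large "confusing instance" deviation to be optimal are sampled exponentially less often. The sketch below is a minimal illustrative toy for Bernoulli bandits, not the paper's `MED-LQ` algorithm; the function names and the randomized-draw loop are our own assumptions.

```python
import numpy as np

def kl_bernoulli(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped for stability."""
    p = np.clip(p, eps, 1 - eps)
    q = np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def med_probabilities(means, counts):
    """MED-style sampling distribution: weight each arm by
    exp(-N_a * KL(mu_hat_a, mu_hat_best)); the empirically best
    arm gets weight 1, harder-to-confuse arms get exponentially less."""
    best = np.max(means)
    weights = np.exp(-counts * kl_bernoulli(means, best))
    return weights / weights.sum()

# Toy usage: a 3-armed Bernoulli bandit (hypothetical means).
rng = np.random.default_rng(0)
true_means = np.array([0.3, 0.5, 0.7])
counts = np.ones(3)                                   # one forced pull per arm
sums = rng.binomial(1, true_means).astype(float)
for _ in range(1000):
    p = med_probabilities(sums / counts, counts)      # randomized MED-style draw
    a = rng.choice(3, p=p)
    sums[a] += rng.binomial(1, true_means[a])
    counts[a] += 1
print(counts)  # pulls concentrate on the best arm over time
```

The paper's contribution, per the abstract, is extending this kind of CI/MED reasoning from such small discrete settings to continuous LQR control via sensitivity and stability analysis.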


Citation Information:

Waris Radji and Odalric-Ambrym Maillard. "The Confusing Instance Principle for Online Linear Quadratic Control." Reinforcement Learning Journal, vol. TBD, 2025, pp. TBD.

BibTeX:
@article{radji2025confusing,
    title={The Confusing Instance Principle for Online Linear Quadratic Control},
    author={Radji, Waris and Maillard, Odalric-Ambrym},
    journal={Reinforcement Learning Journal},
    year={2025}
}