Reinforcement Learning Journal, vol. 5, 2024, pp. 2400–2439.
Presented at the Reinforcement Learning Conference (RLC), Amherst Massachusetts, August 9–12, 2024.
We consider a multi-armed bandit problem specified by a set of one-dimensional exponential family distributions endowed with a multimodal structure. The multimodal structure naturally extends the unimodal structure and appears to be underlying in quite interesting ways popular structures such as linear or Lipschitz bandits. We introduce IMED-MB, an algorithm that optimally exploits the multimodal structure, by adapting to this setting the popular Indexed Minimum Empirical Divergence (IMED) algorithm. We provide instance-dependent regret analysis of this strategy. Numerical experiments show that \IMEDMB performs well in practice when assuming unimodal, polynomial or Lipschitz mean function.
Hassan SABER and Odalric-Ambrym Maillard. "Bandits with Multimodal Structure." Reinforcement Learning Journal, vol. 5, 2024, pp. 2400–2439.
BibTeX:@article{saber2024bandits,
title={Bandits with Multimodal Structure},
author={SABER, Hassan and Maillard, Odalric-Ambrym},
journal={Reinforcement Learning Journal},
volume={5},
pages={2400--2439},
year={2024}
}