Bandits with Multimodal Structure

By Hassan SABER, and Odalric-Ambrym Maillard

Reinforcement Learning Journal, vol. 5, 2024, pp. 2400–2439.

Presented at the Reinforcement Learning Conference (RLC), Amherst Massachusetts, August 9–12, 2024.


Download:

Abstract:

We consider a multi-armed bandit problem specified by a set of one-dimensional exponential family distributions endowed with a multimodal structure. The multimodal structure naturally extends the unimodal structure and appears to be underlying in quite interesting ways popular structures such as linear or Lipschitz bandits. We introduce IMED-MB, an algorithm that optimally exploits the multimodal structure, by adapting to this setting the popular Indexed Minimum Empirical Divergence (IMED) algorithm. We provide instance-dependent regret analysis of this strategy. Numerical experiments show that \IMEDMB performs well in practice when assuming unimodal, polynomial or Lipschitz mean function.


Citation Information:

Hassan SABER and Odalric-Ambrym Maillard. "Bandits with Multimodal Structure." Reinforcement Learning Journal, vol. 5, 2024, pp. 2400–2439.

BibTeX:

@article{saber2024bandits,
    title={Bandits with Multimodal Structure},
    author={SABER, Hassan and Maillard, Odalric-Ambrym},
    journal={Reinforcement Learning Journal},
    volume={5},
    pages={2400--2439},
    year={2024}
}