Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms

By Javad Azizi, Thang Duong, Yasin Abbasi-Yadkori, András György, Claire Vernade, and Mohammad Ghavamzadeh

Reinforcement Learning Journal, vol. 5, 2024, pp. 2461–2491.

Presented at the Reinforcement Learning Conference (RLC), Amherst Massachusetts, August 9–12, 2024.

Download:

Abstract:

We study a sequential decision problem where the learner faces a sequence of $K$-armed bandit tasks. The task boundaries might be known (the bandit meta-learning setting), or unknown (the non-stationary bandit setting). For a given integer $M\le K$, the learner aims to compete with the best subset of arms of size $M$. We design an algorithm based on a reduction to bandit submodular maximization and show that for $T$ time steps comprised of $N$ tasks, in the regime of large $N$ and small number of optimal arms $M$, its regret in both settings is smaller than the simple baseline of $\tilde{O}(\sqrt{KNT})$ that can be obtained by using standard algorithms designed for non-stationary bandit problems. For the bandit meta-learning problem with fixed task length $\tau$, we show that the regret of the algorithm is bounded as $\tilde{O}(NM\sqrt{M \tau}+ (M^4 K N^2)^{1/3} \tau)$. Under some additional assumptions on the identifiability of the optimal arms in each task, we show a bandit meta-learning algorithm with an improved $\tilde{O}(N\sqrt{M \tau}+ (NMK^{1/2})^{1/2}\tau^{3/4} )$ regret, where the order of the leading term (the first term) is optimal up to logarithmic factors, and the algorithm does not need the knowledge of $M, N$, and $T$ in advance.

Citation Information:

Javad Azizi, Thang Duong, Yasin Abbasi-Yadkori, András György, Claire Vernade, and Mohammad Ghavamzadeh. "Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms." Reinforcement Learning Journal, vol. 5, 2024, pp. 2461–2491.

BibTeX:

@article{azizi2024stationary,
    title={Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms},
    author={Azizi, Javad and Duong, Thang and Abbasi-Yadkori, Yasin and Gy{\"{o}}rgy, Andr{\'{a}}s and Vernade, Claire and Ghavamzadeh, Mohammad},
    journal={Reinforcement Learning Journal},
    volume={5},
    pages={2461--2491},
    year={2024}
}

Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms

By Javad Azizi, Thang Duong, Yasin Abbasi-Yadkori, András György, Claire Vernade, and Mohammad Ghavamzadeh

Download: Paper

Abstract:

Citation Information:

Download: