Reinforcement Learning Journal, vol. 1, 2024, pp. 29–63.
Presented at the Reinforcement Learning Conference (RLC), Amherst, Massachusetts, August 9–12, 2024.
We consider an online decision-making problem with a reward function defined over graph-structured data. We formalize the problem as an instance of the graph action bandit. We then propose GNN-TS, a Graph Neural Network (GNN) powered Thompson Sampling (TS) algorithm, which employs a GNN approximator to estimate the mean reward function and graph neural tangent features for uncertainty estimation. We prove that, under certain boundedness assumptions on the reward function, GNN-TS achieves a state-of-the-art regret bound that is (1) sub-linear of order $\tilde{\mathcal{O}}((\tilde{d} T)^{1/2})$ in the number of interaction rounds $T$ and a notion of effective dimension $\tilde{d}$, and (2) independent of the number of graph nodes. Empirical results validate that GNN-TS exhibits competitive performance and scales well on graph action bandit problems.
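To make the algorithmic idea in the abstract concrete, below is a minimal illustrative sketch in PyTorch of one action-selection round of a GNN-powered Thompson Sampling scheme of this kind: the mean reward estimate comes from a GNN, and a posterior-variance proxy comes from the network's neural tangent features. This is not the authors' implementation; the names `gnn`, `graphs`, `U_inv`, and `nu` are all hypothetical.

```python
import torch

def gnn_ts_select(gnn, graphs, U_inv, nu):
    """One action-selection round of a GNN Thompson Sampling scheme (sketch).

    gnn    : a PyTorch GNN mapping a graph action to a scalar reward estimate
    graphs : iterable of candidate graph actions for this round
    U_inv  : inverse of the regularized design matrix built from the
             neural tangent features of previously selected actions
    nu     : exploration scale multiplying the posterior standard deviation
    """
    scores = []
    for g in graphs:
        mu = gnn(g)  # mean reward estimate f(g; theta)
        # Neural tangent feature: gradient of the output w.r.t. the parameters.
        grads = torch.autograd.grad(mu, tuple(gnn.parameters()))
        phi = torch.cat([gr.reshape(-1) for gr in grads])
        sigma2 = phi @ U_inv @ phi  # posterior variance proxy
        # Thompson sample: perturb the mean with posterior-scaled Gaussian noise.
        scores.append(torch.normal(mu.detach(), nu * torch.sqrt(sigma2)))
    # Play the action with the largest sampled score.
    return int(torch.stack(scores).argmax())
```

After the reward of the chosen action is observed, one would typically update the design matrix with the selected action's tangent feature (e.g., a rank-one update `U += torch.outer(phi, phi)`, with `U_inv` recomputed or maintained via Sherman–Morrison) and refit the GNN on the accumulated data before the next round.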
Shuang Wu and Arash A Amini. "Graph Neural Thompson Sampling." Reinforcement Learning Journal, vol. 1, 2024, pp. 29–63.
BibTeX:
@article{wu2024graph,
title={Graph Neural {Thompson} Sampling},
author={Wu, Shuang and Amini, Arash A.},
journal={Reinforcement Learning Journal},
volume={1},
pages={29--63},
year={2024}
}