Graph Neural Thompson Sampling

By Shuang Wu and Arash A. Amini

Reinforcement Learning Journal, vol. 1, 2024, pp. 29–63.

Presented at the Reinforcement Learning Conference (RLC), Amherst, Massachusetts, August 9–12, 2024.


Abstract:

We consider an online decision-making problem with a reward function defined over graph-structured data. We formulate the problem as an instance of a graph action bandit. We then propose GNN-TS, a Graph Neural Network (GNN) powered Thompson Sampling (TS) algorithm, which employs a GNN approximator for estimating the mean reward function and the graph neural tangent features for uncertainty estimation. We prove that, under certain boundedness assumptions on the reward function, GNN-TS achieves a state-of-the-art regret bound which is (1) sub-linear, of order $\tilde{\mathcal{O}}((\tilde{d} T)^{1/2})$, in the number of interaction rounds $T$ and a notion of effective dimension $\tilde{d}$, and (2) independent of the number of graph nodes. Empirical results validate that our proposed GNN-TS exhibits competitive performance and scales well on graph action bandit problems.
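To make the abstract concrete, here is a minimal sketch of one GNN-TS decision round: a small GNN scores each graph action, the gradient of that score with respect to the network parameters serves as the graph neural tangent feature, and a Gaussian perturbation scaled by the tangent-feature uncertainty drives exploration. The model architecture, the helper names (SimpleGNN, tangent_features, ts_select), and the exploration scale nu are illustrative assumptions, not the paper's implementation.

import torch

class SimpleGNN(torch.nn.Module):
    """Tiny graph scorer: one mean-aggregation layer, then a linear readout.
    Stand-in for the paper's GNN reward approximator (assumption)."""
    def __init__(self, in_dim, hidden=16):
        super().__init__()
        self.lin1 = torch.nn.Linear(in_dim, hidden)
        self.lin2 = torch.nn.Linear(hidden, 1)

    def forward(self, A, X):
        # A: (n, n) row-normalized adjacency; X: (n, in_dim) node features.
        H = torch.relu(self.lin1(A @ X))            # one message-passing step
        return self.lin2(H.mean(dim=0)).squeeze()   # scalar reward estimate

def tangent_features(model, A, X):
    # Gradient of the scalar output w.r.t. all parameters: the graph
    # neural tangent feature of this graph action.
    out = model(A, X)
    grads = torch.autograd.grad(out, list(model.parameters()))
    return torch.cat([g.reshape(-1) for g in grads])

def ts_select(model, actions, U_inv, nu=1.0):
    # Thompson sampling step: draw a score per action from a Gaussian
    # centered at the GNN estimate, with variance given by the tangent
    # features and the inverse design matrix, then take the argmax.
    scores = []
    for A, X in actions:
        g = tangent_features(model, A, X)
        mean = model(A, X).item()
        var = max(float(nu**2 * (g @ U_inv @ g)), 0.0)
        scores.append(mean + var**0.5 * torch.randn(()).item())
    return max(range(len(actions)), key=scores.__getitem__)

# Usage: three random 5-node graph actions; U is the regularized design
# matrix over tangent features, updated rank-one after each round.
torch.manual_seed(0)
model = SimpleGNN(in_dim=4)
p = sum(par.numel() for par in model.parameters())
U = torch.eye(p)                      # lambda * I with lambda = 1
actions = []
for _ in range(3):
    A = torch.rand(5, 5)
    actions.append((A / A.sum(dim=1, keepdim=True), torch.randn(5, 4)))
idx = ts_select(model, actions, torch.linalg.inv(U))
g = tangent_features(model, *actions[idx])
U += torch.outer(g, g)                # then observe reward and retrain the GNN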


Citation Information:

Shuang Wu and Arash A. Amini. "Graph Neural Thompson Sampling." Reinforcement Learning Journal, vol. 1, 2024, pp. 29–63.

BibTeX:

@article{wu2024graph,
    title={Graph Neural {Thompson} Sampling},
    author={Wu, Shuang and Amini, Arash A.},
    journal={Reinforcement Learning Journal},
    volume={1},
    pages={29--63},
    year={2024}
}