Reinforcement Learning Journal, vol. 6, 2025, pp. 2720–2736.
Presented at the Reinforcement Learning Conference (RLC), Edmonton, Alberta, Canada, August 5–9, 2025.
Abstract: We propose KL-regularized policy optimization for adaptive submodular maximization, which is a framework for decision making under uncertainty with submodular rewards. Policy optimization of adaptive submodular functions justifies a surprisingly simple and efficient policy gradient update, where the optimized action only affects its immediate reward but not the future ones. It also allows us to learn adaptive submodular policies with large action spaces, such as those represented by large language models (LLMs). We prove that our policies monotonically improve as the regularization diminishes and converge to the optimal greedy policy. Our experiments show major gains in statistical efficiency, in both synthetic problems and LLMs.
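To illustrate the kind of update the abstract describes, here is a minimal sketch, assuming a toy maximum-coverage objective and a softmax policy over a small action set: each sampled action is credited only with its immediate marginal gain (not future returns), and the update is regularized by a KL term toward a fixed reference policy. The problem setup, names, and hyperparameters below are illustrative assumptions, not the paper's implementation.

import numpy as np

# Hypothetical toy problem: each action covers a random subset of items; the
# submodular reward of an action is the number of newly covered items.
rng = np.random.default_rng(0)
n_actions, n_items = 8, 32
coverage = rng.random((n_actions, n_items)) < 0.3

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

theta = np.zeros(n_actions)             # policy logits
ref_pi = softmax(np.zeros(n_actions))   # fixed reference policy (uniform)
tau, lr, horizon = 0.1, 0.5, 5          # KL weight, step size, episode length

for episode in range(200):
    covered = np.zeros(n_items, dtype=bool)
    for _ in range(horizon):
        pi = softmax(theta)
        a = rng.choice(n_actions, p=pi)
        r = np.sum(coverage[a] & ~covered)  # immediate marginal gain only
        covered |= coverage[a]
        # REINFORCE-style gradient of E[r] - tau * KL(pi || ref_pi) w.r.t. softmax logits.
        grad_logp = -pi.copy()
        grad_logp[a] += 1.0
        log_ratio = np.log(pi) - np.log(ref_pi)
        kl_grad = pi * (log_ratio - np.sum(pi * log_ratio))
        theta += lr * (r * grad_logp - tau * kl_grad)

Because the credited reward is only the immediate marginal gain, each gradient step is as cheap as a bandit update; shrinking tau weakens the pull toward the reference policy, in the spirit of the monotone-improvement result stated in the abstract.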
Branislav Kveton, Anup Rao, Viet Dac Lai, Nikos Vlassis, and David Arbour. "Adaptive Submodular Policy Optimization." Reinforcement Learning Journal, vol. 6, 2025, pp. 2720–2736.
BibTeX:
@article{kveton2025adaptive,
  title   = {Adaptive Submodular Policy Optimization},
  author  = {Kveton, Branislav and Rao, Anup and Lai, Viet Dac and Vlassis, Nikos and Arbour, David},
  journal = {Reinforcement Learning Journal},
  volume  = {6},
  pages   = {2720--2736},
  year    = {2025}
}