Reinforcement Learning Journal, vol. TBD, 2025, pp. TBD.
Presented at the Reinforcement Learning Conference (RLC), Edmonton, Alberta, Canada, August 5–9, 2025.
We propose KL-regularized policy optimization for adaptive submodular maximization. Adaptive submodularity is a framework for decision making under uncertainty with submodular rewards. The benefit of policy optimization is that we can learn controllers over large action spaces that leverage state-of-the-art large language model (LLM) priors. The benefit of submodularity is more efficient policy gradient updates, because the gradient associated with an action affects only its immediate gain. When the reward model is correctly specified, we prove that our policies improve monotonically as the regularization diminishes and converge to the optimal greedy policy. Our experiments show major gains in statistical efficiency, on both synthetic problems and problems with LLMs.
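To make the objective concrete, below is a minimal sketch of one possible KL-regularized policy-gradient step where each sampled action is scored only by its immediate marginal gain. The variable names (beta, ref_logits, marginal_gain), the toy coverage reward, and the specific update rule are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the paper's code) of a KL-regularized policy-gradient
# update with submodular (marginal-gain) rewards.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

n_actions = 10
beta = 0.1                                            # assumed KL-regularization strength
logits = torch.zeros(n_actions, requires_grad=True)   # learned policy parameters
ref_logits = torch.randn(n_actions)                   # frozen reference prior (e.g., an LLM)

def marginal_gain(action, chosen):
    """Toy submodular immediate gain: covering a new item is worth 1, repeats are worth 0."""
    return 0.0 if action in chosen else 1.0

optimizer = torch.optim.Adam([logits], lr=0.1)
chosen = set()

for step in range(5):
    pi = F.softmax(logits, dim=0)
    pi_ref = F.softmax(ref_logits, dim=0)

    # Sample an action and credit it with its *immediate* gain only, so the
    # gradient for this action does not depend on downstream rewards.
    action = torch.multinomial(pi, 1).item()
    gain = marginal_gain(action, chosen)
    chosen.add(action)

    pg_loss = -gain * torch.log(pi[action])
    kl = torch.sum(pi * (torch.log(pi) - torch.log(pi_ref)))  # KL(pi || pi_ref)
    loss = pg_loss + beta * kl

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

As beta shrinks, the KL term's pull toward the reference prior weakens and the update increasingly favors actions with the largest immediate gains, which is the greedy behavior the abstract's convergence claim refers to.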
Branislav Kveton, Anup Rao, Viet Dac Lai, Nikos Vlassis, and David Arbour. "Adaptive Submodular Policy Optimization." Reinforcement Learning Journal, vol. TBD, 2025, pp. TBD.
BibTeX:
@article{kveton2025adaptive,
  title   = {Adaptive Submodular Policy Optimization},
  author  = {Kveton, Branislav and Rao, Anup and Lai, Viet Dac and Vlassis, Nikos and Arbour, David},
  journal = {Reinforcement Learning Journal},
  year    = {2025}
}