Reinforcement Learning Journal, vol. TBD, 2025, pp. TBD.
Presented at the Reinforcement Learning Conference (RLC), Edmonton, Alberta, Canada, August 5–9, 2025.
In the context of Markov Decision Processes (MDPs) with linear Bellman completeness, a generalization of linear MDPs, we reconsider the learning capabilities of a *greedy* algorithm. The motivation is that, when exploration is costly or dangerous, an exploration-free approach may be preferable to optimistic or randomized solutions. We show that, under a condition of sufficient diversity in the feature distribution, Least-Squares Value Iteration (LSVI) can achieve sublinear regret. Specifically, the expected cumulative regret is at most $O(H^3\sqrt{dK/\lambda_0})$, where $K$ is the number of episodes, $H$ is the task horizon, $d$ is the dimension of the feature map, and $\lambda_0$ is a measure of feature diversity. We empirically validate our theoretical findings on synthetic linear MDPs. Our analysis is a first step towards exploration-free reinforcement learning in MDPs with large state spaces.
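The following is a minimal sketch, in Python, of the greedy (exploration-free) LSVI scheme described above, run on a small synthetic MDP: each episode, the Q-functions are refit by backward ridge regression on all data collected so far, and actions are chosen greedily, with no optimism bonus or randomization. The feature map, MDP sizes, and all constants below are hypothetical illustrations, not the authors' experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic MDP: S states, A actions, d-dimensional features, horizon H.
S, A, d, H, K = 5, 3, 4, 10, 200
phi = rng.normal(size=(S, A, d))            # feature map phi(s, a) in R^d
theta_r = rng.uniform(size=d)
reward = np.clip(phi @ theta_r, 0.0, 1.0)   # linear rewards r(s, a), clipped to [0, 1]
P = rng.dirichlet(np.ones(S), size=(S, A))  # transition probabilities P(s' | s, a)

# Optimal values via exact backward dynamic programming (possible on this toy MDP),
# used only to measure regret.
V_star = np.zeros(S)
for h in range(H):
    V_star = (reward + P @ V_star).max(axis=1)

lam = 1.0                        # ridge regularization
data = [[] for _ in range(H)]    # per-step datasets of (s, a, r, s') transitions
cum_regret = 0.0

for k in range(K):
    # Backward least-squares value iteration on all data collected so far.
    w = np.zeros((H, d))         # Q_h(s, a) is approximated by <phi(s, a), w[h]>
    V_next = np.zeros(S)         # V_{h+1}, starting from V_H = 0
    for h in reversed(range(H)):
        Lambda = lam * np.eye(d)
        b = np.zeros(d)
        for (s, a, r, s2) in data[h]:
            x = phi[s, a]
            Lambda += np.outer(x, x)
            b += x * (r + V_next[s2])
        w[h] = np.linalg.solve(Lambda, b)
        V_next = (phi @ w[h]).max(axis=1)   # greedy value estimate, no bonus term

    # Run one episode acting greedily with respect to the fitted Q-functions.
    s = s0 = int(rng.integers(S))
    ep_return = 0.0
    for h in range(H):
        a = int(np.argmax(phi[s] @ w[h]))   # purely greedy action selection
        r = reward[s, a]
        s2 = int(rng.choice(S, p=P[s, a]))
        data[h].append((s, a, r, s2))
        ep_return += r
        s = s2

    cum_regret += V_star[s0] - ep_return    # noisy per-episode regret sample

print(f"cumulative regret over {K} episodes: {cum_regret:.2f}")
```

Under sufficient feature diversity, the cumulative regret printed at the end should grow sublinearly in K; this toy script only illustrates the algorithmic idea and is not a reproduction of the paper's experiments.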
Luca Civitavecchia and Matteo Papini. "Exploration-Free Reinforcement Learning with Linear Function Approximation." Reinforcement Learning Journal, vol. TBD, 2025, pp. TBD.
BibTeX:
@article{luca2025exploration,
  title   = {Exploration-Free Reinforcement Learning with Linear Function Approximation},
  author  = {Civitavecchia, Luca and Papini, Matteo},
  journal = {Reinforcement Learning Journal},
  year    = {2025}
}