Reinforcement Learning Journal, vol. 2, 2024, pp. 840–863.
Presented at the Reinforcement Learning Conference (RLC), Amherst, Massachusetts, August 9–12, 2024.
Learning to make temporal predictions is a key component of reinforcement learning algorithms. The dominant paradigm for learning predictions from an online stream of data is Temporal Difference (TD) learning. In this work we introduce a new TD algorithm---SwiftTD---that learns more accurate predictions than existing algorithms. SwiftTD combines True Online TD($\lambda$) with per-feature step-size parameters, step-size optimization, a bound on the update to the eligibility vector, and step-size decay. Per-feature step-size parameters and step-size optimization improve credit assignment by increasing the step-size parameters of important signals and reducing them for irrelevant signals. The bound on the update to the eligibility vector prevents overcorrections. Step-size decay reduces step-size parameters if they are too large. We benchmark SwiftTD on the Atari Prediction Benchmark and show that even with linear function approximation it can learn accurate predictions. We further show that SwiftTD performs well across a wide range of its hyperparameters. Finally, we show that SwiftTD can be used in the last layer of neural networks to improve their performance.
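For context, the sketch below shows standard True Online TD($\lambda$) with linear function approximation, the base update that SwiftTD builds on. The per-feature step sizes, step-size optimization, eligibility-update bound, and step-size decay described in the abstract are contributions of the paper and are not reproduced here; the `stream` interface and parameter defaults are illustrative assumptions.

```python
import numpy as np

def true_online_td_lambda(stream, n_features, alpha=0.01, gamma=0.99, lam=0.9):
    """Standard True Online TD(lambda) with linear function approximation.

    `stream` yields (x, reward, x_next) tuples of feature vectors and rewards.
    SwiftTD (Javed et al., 2024) extends this update with per-feature step-size
    parameters, step-size optimization, a bound on the eligibility-vector
    update, and step-size decay; those mechanisms are detailed in the paper.
    """
    w = np.zeros(n_features)   # weight vector of the linear value function
    e = np.zeros(n_features)   # dutch-style eligibility trace
    v_old = 0.0
    for x, reward, x_next in stream:
        v = w @ x
        v_next = w @ x_next
        delta = reward + gamma * v_next - v
        # Dutch-trace update used by True Online TD(lambda)
        e = gamma * lam * e + (1.0 - alpha * gamma * lam * (e @ x)) * x
        # Weight update with the "true online" correction term
        w += alpha * (delta + v - v_old) * e - alpha * (v - v_old) * x
        v_old = v_next
    return w
```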
Khurram Javed, Arsalan Sharifnassab, and Richard S. Sutton. "SwiftTD: A Fast and Robust Algorithm for Temporal Difference Learning." Reinforcement Learning Journal, vol. 2, 2024, pp. 840–863.
BibTeX:
@article{javed2024swifttd,
  title={{SwiftTD}: {A} Fast and Robust Algorithm for Temporal Difference Learning},
  author={Javed, Khurram and Sharifnassab, Arsalan and Sutton, Richard S.},
  journal={Reinforcement Learning Journal},
  volume={2},
  pages={840--863},
  year={2024}
}