SwiftTD: A Fast and Robust Algorithm for Temporal Difference Learning

By Khurram Javed, Arsalan Sharifnassab, and Richard S. Sutton

Reinforcement Learning Journal, vol. 1, no. 1, 2024, pp. TBD.

Presented at the Reinforcement Learning Conference (RLC), Amherst, Massachusetts, August 9–12, 2024.


Abstract:

Learning to make temporal predictions is a key component of reinforcement learning algorithms. The dominant paradigm for learning predictions from an online stream of data is Temporal Difference (TD) learning. In this work we introduce a new TD algorithm---SwiftTD---that learns more accurate predictions than existing algorithms. SwiftTD combines True Online TD($\lambda$) with per-feature step-size parameters, step-size optimization, a bound on the rate of learning, and step-size decay. Per-feature step-size parameters and step-size optimization improve credit assignment by increasing step-size parameters of important signals and reducing them for irrelevant signals. The bound on the rate of learning prevents overcorrections. Step-size decay reduces step-size parameters if they are too large. We benchmark SwiftTD on the Atari Prediction Benchmark and show that even with linear function approximation it can learn accurate predictions. We further show that SwiftTD can be combined with neural networks to improve their performance. Finally, we show that all three ideas---step-size optimization, the bound on the rate of learning, and step-size decay---contribute to the strong performance of SwiftTD.
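
To make two of the ingredients named in the abstract concrete, the following is a minimal sketch, not the SwiftTD algorithm from the paper. It assumes plain linear TD($\lambda$) with accumulating traces instead of True Online TD($\lambda$), omits step-size optimization entirely, and uses hypothetical names and defaults (`td_lambda_step`, `eta`, `decay`). It only illustrates how per-feature step sizes, a bound on the rate of learning, and step-size decay might fit together in one update.

```python
def td_lambda_step(w, z, alpha, x, x_next, reward,
                   gamma=0.9, lam=0.8, eta=0.5, decay=0.9):
    """One update of linear TD(lambda) with per-feature step sizes.

    Illustrative assumptions (not taken from the paper):
      eta   -- upper bound on the effective rate of learning
      decay -- factor applied to step sizes that exceeded the bound
    Returns the new (w, z, alpha) and the TD error.
    """
    # Effective rate of learning for this sample: sum_i alpha_i * x_i^2.
    rate = sum(a * xi * xi for a, xi in zip(alpha, x))

    # Bound: if the rate exceeds eta, scale this update down to eta.
    scale = eta / rate if rate > eta else 1.0
    step = [a * scale for a in alpha]

    # Decay: lower the stored step sizes of the active features so that
    # future updates are less likely to exceed the bound.
    if rate > eta:
        alpha = [a * decay if xi != 0.0 else a for a, xi in zip(alpha, x)]

    # TD error under the current linear value estimate.
    v = sum(wi * xi for wi, xi in zip(w, x))
    v_next = sum(wi * xi for wi, xi in zip(w, x_next))
    delta = reward + gamma * v_next - v

    # Accumulating eligibility traces, then a per-feature weight update.
    z = [gamma * lam * zi + xi for zi, xi in zip(z, x)]
    w = [wi + si * delta * zi for wi, si, zi in zip(w, step, z)]
    return w, z, alpha, delta
```

With `lam=0`, the scaled update changes the prediction for `x` by at most `eta` times the TD error, which is the sense in which such a bound prevents overcorrection in this sketch.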


Citation Information:

Khurram Javed, Arsalan Sharifnassab, and Richard S. Sutton. "SwiftTD: A Fast and Robust Algorithm for Temporal Difference Learning." Reinforcement Learning Journal, vol. 1, no. 1, 2024, pp. TBD.

BibTeX:

Note: Manually check this automatically generated text (particularly capitalization in the title and first-last splits of names).

@article{javed2024swifttd,
    title={{SwiftTD}: {A} Fast and Robust Algorithm for Temporal Difference Learning},
    author={Javed, Khurram and Sharifnassab, Arsalan and Sutton, Richard S.},
    journal={Reinforcement Learning Journal},
    volume={1},
    number={1},
    year={2024}
}