Reinforcement Learning Journal, vol. 5, 2024, pp. 2071–2095.
Presented at the Reinforcement Learning Conference (RLC), Amherst, Massachusetts, August 9–12, 2024.
Long-horizon tasks, which have a large discount factor, pose a challenge for most conventional reinforcement learning (RL) algorithms. Algorithms such as Value Iteration and Temporal Difference (TD) learning have a slow convergence rate and become inefficient in these tasks. When the transition distributions are given, PID VI was recently introduced to accelerate the convergence of Value Iteration using ideas from control theory. Inspired by this, we introduce PID TD Learning and PID Q-Learning algorithms for the RL setting, in which only samples from the environment are available. We give a theoretical analysis of the convergence of PID TD Learning and its acceleration compared to conventional TD Learning. We also introduce a method for adapting PID gains in the presence of noise and empirically verify its effectiveness.
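To make the idea concrete, below is a minimal sketch of what a PID-style TD(0) update for tabular policy evaluation could look like. It mirrors the proportional/integral/derivative structure of PID VI, with the TD error playing the role of the error signal. The function name, the gain parameters (kappa_p, kappa_i, kappa_d, beta), and the exact update form are illustrative assumptions, not the paper's verbatim algorithm.

```python
import numpy as np

def pid_td_update(V, V_prev, z, s, r, s_next, gamma, alpha,
                  kappa_p=1.0, kappa_i=0.0, kappa_d=0.0, beta=0.95):
    """One hypothetical PID TD step for a sampled transition (s, r, s_next).

    V      : current value estimates (1-D array over states)
    V_prev : value estimates before the previous update (for the derivative term)
    z      : running (discounted) sum of past TD errors (integral term)
    """
    # Proportional term: the usual TD error.
    delta = r + gamma * V[s_next] - V[s]

    # Integral term: exponentially weighted accumulation of TD errors at s.
    z_new = z.copy()
    z_new[s] = beta * z[s] + delta

    # Derivative term: recent change in the value estimate at s.
    deriv = V[s] - V_prev[s]

    # PID-weighted update of the value estimate.
    V_new = V.copy()
    V_new[s] = V[s] + alpha * (kappa_p * delta + kappa_i * z_new[s] + kappa_d * deriv)
    return V_new, z_new
```

With kappa_p = 1 and kappa_i = kappa_d = 0, this reduces to standard TD(0); nonzero integral and derivative gains are what the PID view adds, and the paper's gain-adaptation method is meant to tune them online in the presence of sampling noise.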
Mark Bedaywi, Amin Rakhsha, and Amir-massoud Farahmand. "PID Accelerated Temporal Difference Algorithms." Reinforcement Learning Journal, vol. 5, 2024, pp. 2071–2095.
BibTeX:

@article{bedaywi2024accelerated,
  title={{PID} Accelerated Temporal Difference Algorithms},
  author={Bedaywi, Mark and Rakhsha, Amin and Farahmand, Amir-massoud},
  journal={Reinforcement Learning Journal},
  volume={5},
  pages={2071--2095},
  year={2024}
}