PID Accelerated Temporal Difference Algorithms

By Mark Bedaywi, Amin Rakhsha, and Amir-massoud Farahmand

Reinforcement Learning Journal, vol. 5, 2024, pp. 2071–2095.

Presented at the Reinforcement Learning Conference (RLC), Amherst, Massachusetts, August 9–12, 2024.


Abstract:

Long-horizon tasks, which have a large discount factor, pose a challenge for most conventional reinforcement learning (RL) algorithms. Algorithms such as Value Iteration and Temporal Difference (TD) learning have a slow convergence rate and become inefficient in these tasks. When the transition distributions are given, PID VI was recently introduced to accelerate the convergence of Value Iteration using ideas from control theory. Inspired by this, we introduce PID TD Learning and PID Q-Learning algorithms for the RL setting, in which only samples from the environment are available. We give a theoretical analysis of the convergence of PID TD Learning and its acceleration compared to conventional TD Learning. We also introduce a method for adapting PID gains in the presence of noise and empirically verify its effectiveness.
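To give a rough sense of the idea, the sketch below shows one possible PID-style tabular TD(0) update: the ordinary TD error acts as the proportional term, a decaying running sum of TD errors as the integral term, and the recent change in the value estimate as the derivative term. This is an illustrative sketch only, not the paper's exact algorithm; the gain names (kappa_p, kappa_i, kappa_d), the decay factor beta, and the tabular setting are assumptions made here for illustration.

```python
import numpy as np

def pid_td_update(V, z, V_prev, s, r, s_next, alpha, gamma,
                  kappa_p, kappa_i, kappa_d, beta):
    """One hypothetical PID-style tabular TD(0) update at state s.

    V      : current value estimates, one entry per state
    z      : running (integral) accumulation of TD errors per state
    V_prev : value estimates from the previous update (derivative term)
    """
    # Proportional term: the usual one-step TD error.
    delta = r + gamma * V[s_next] - V[s]
    # Integral term: decaying accumulation of past TD errors at s.
    z_new = beta * z[s] + delta
    # Derivative term: how much the estimate at s changed recently.
    deriv = V[s] - V_prev[s]

    V_next = V.copy()
    V_next[s] = V[s] + alpha * (kappa_p * delta + kappa_i * z_new + kappa_d * deriv)

    z_next = z.copy()
    z_next[s] = z_new
    return V_next, z_next

# Minimal usage on a toy 3-state problem (all quantities hypothetical).
V = np.zeros(3)
z = np.zeros(3)
V_prev = np.zeros(3)
V_new, z = pid_td_update(V, z, V_prev, s=0, r=1.0, s_next=1,
                         alpha=0.1, gamma=0.99,
                         kappa_p=1.0, kappa_i=0.05, kappa_d=0.1, beta=0.9)
```

With kappa_p = 1 and kappa_i = kappa_d = 0 this reduces to standard TD(0); the extra gains are what the paper's gain-adaptation method would tune, and its analysis characterizes when such choices accelerate convergence for large discount factors.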


Citation Information:

Mark Bedaywi, Amin Rakhsha, and Amir-massoud Farahmand. "PID Accelerated Temporal Difference Algorithms." Reinforcement Learning Journal, vol. 5, 2024, pp. 2071–2095.

BibTeX:

@article{bedaywi2024accelerated,
    title={{PID} Accelerated Temporal Difference Algorithms},
    author={Bedaywi, Mark and Rakhsha, Amin and Farahmand, Amir-massoud},
    journal={Reinforcement Learning Journal},
    volume={5},
    pages={2071--2095},
    year={2024}
}