Rectifying Regression in Reinforcement Learning

By Alex Ayoub, David Szepesvari, Alireza Bakhtiari, Csaba Szepesvari, and Dale Schuurmans

Reinforcement Learning Journal, vol. TBD, 2025, pp. TBD.

Presented at the Reinforcement Learning Conference (RLC), Edmonton, Alberta, Canada, August 5–9, 2025.


Download: Paper unavailable until the authors provide the signed publication agreement.

Abstract:

This paper investigates the impact of the loss function in value-based methods for reinforcement learning through an analysis of the underlying prediction objectives. We theoretically show that mean absolute error is a better prediction objective than the traditional mean squared error for controlling the learned policy's suboptimality gap. Furthermore, we show that different loss functions align with different regression objectives: the binary and categorical cross-entropy losses with the mean absolute error, and the squared loss with the mean squared error. We then provide empirical evidence that algorithms minimizing these cross-entropy losses can outperform those based on the squared loss in linear reinforcement learning.
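
Illustrative example (Python): As a rough sketch of the two loss families the abstract refers to, the snippet below contrasts a squared loss with a binary cross-entropy loss applied to a value target normalized to [0, 1]. The function names, the normalization, and the toy usage are assumptions made for illustration only; they are not the paper's implementation.

import numpy as np

def squared_loss(prediction, target):
    # Squared loss; its population minimizer is the conditional mean,
    # so it corresponds to a mean-squared-error prediction objective.
    return 0.5 * (prediction - target) ** 2

def binary_cross_entropy_loss(logit, target):
    # Binary cross-entropy with a soft target in [0, 1]; the prediction
    # is parameterized through a sigmoid, p = 1 / (1 + exp(-logit)).
    p = 1.0 / (1.0 + np.exp(-logit))
    eps = 1e-12  # numerical safety inside the logarithms
    return -(target * np.log(p + eps) + (1.0 - target) * np.log(1.0 - p + eps))

# Toy usage: a normalized value target of 0.7 and two candidate predictions.
target = 0.7
for pred in (0.5, 0.7):
    logit = np.log(pred / (1.0 - pred))  # the logit whose sigmoid equals pred
    print(pred, squared_loss(pred, target), binary_cross_entropy_loss(logit, target))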


Citation Information:

Alex Ayoub, David Szepesvari, Alireza Bakhtiari, Csaba Szepesvari, and Dale Schuurmans. "Rectifying Regression in Reinforcement Learning." Reinforcement Learning Journal, vol. TBD, 2025, pp. TBD.

BibTeX:
@article{ayoub2025rectifying,
    title={Rectifying Regression in Reinforcement Learning},
    author={Ayoub, Alex and Szepesvari, David and Bakhtiari, Alireza and Szepesvari, Csaba and Schuurmans, Dale},
    journal={Reinforcement Learning Journal},
    year={2025}
}