Reinforcement Learning with Adaptive Temporal Discounting

By Sahaj Singh Maini and Zoran Tiganj

Reinforcement Learning Journal, vol. TBD, 2025, pp. TBD.

Presented at the Reinforcement Learning Conference (RLC), Edmonton, Alberta, Canada, August 5–9, 2025.


Abstract:

Conventional reinforcement learning (RL) methods often fix a single discount factor for future rewards, limiting their ability to handle diverse temporal requirements. We propose a framework that utilizes an interpretation of the value function as a Laplace transform. By training an agent across a spectrum of discount factors and applying an inverse transform, we recover a log-compressed representation of expected future reward. This representation enables post hoc adjustments to the discount function (e.g., exponential, hyperbolic, or finite horizon) without retraining. Furthermore, by precomputing a library of policies, the agent can dynamically select the policy that maximizes a newly specified discount objective at runtime, effectively constructing a hybrid policy to handle varying temporal objectives. The properties of this log-compressed timeline are consistent with human temporal perception as described by the Weber-Fechner law, theoretically enhancing efficiency in scale-free environments by maintaining uniform relative precision across timescales. We demonstrate this framework in a grid-world navigation task where the agent adapts to different time horizons.
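
The following is a minimal, illustrative sketch (not the authors' code) of the idea summarized above: treat exponentially discounted values learned at many discount factors as samples of the Laplace transform of the expected future-reward timeline, numerically invert that transform to obtain a log-compressed timeline, and then re-weight the timeline with a new discount function without retraining. The decay-rate grid, the Post-style inversion step, and the toy reward sequence are all assumptions made here for illustration; the paper's actual inversion and training procedure may differ.

```python
"""
Sketch, assuming: values V(s) = E[sum_t exp(-s t) r_t] are available for a
grid of decay rates s (equivalently, discount factors gamma = exp(-s)),
and the inverse Laplace transform is approximated with a Post-style
k-th-derivative formula. In the full framework V(s) would be learned by TD;
here it is computed directly from a known toy reward sequence.
"""
import math
import numpy as np

# Toy expected-reward timeline, used only to check the sketch end to end.
T = 200
expected_reward = np.zeros(T)
expected_reward[30] = 1.0            # a reward expected ~30 steps ahead

# Log-spaced future-time nodes and the matching decay rates s = k / tau.
k = 4                                 # order of the Post-style inversion
taus = np.geomspace(2.0, 80.0, 40)    # log-compressed timeline nodes
s_grid = k / taus

# "Learned" discounted values: V(s) = sum_t exp(-s t) * E[r_t].
t = np.arange(T)
V = np.array([np.sum(np.exp(-s * t) * expected_reward) for s in s_grid])

# Post approximation of the inverse Laplace transform:
# r~(tau) ≈ ((-1)^k / k!) * s^(k+1) * d^k V / d s^k, evaluated at s = k / tau.
def kth_derivative(y, x, order):
    for _ in range(order):
        y = np.gradient(y, x)
    return y

dkV = kth_derivative(V, s_grid, k)
timeline = ((-1) ** k / math.factorial(k)) * s_grid ** (k + 1) * dkV

# Post hoc re-weighting with a *new* discount function (here hyperbolic),
# applied to the recovered log-compressed timeline; no retraining needed.
hyperbolic = 1.0 / (1.0 + 0.1 * taus)
new_value = np.sum(hyperbolic * timeline * np.gradient(taus))
print("value under hyperbolic discounting ≈", float(new_value))
```

In the same spirit, a library of policies trained under different discount factors could each be evaluated this way under a newly specified discount objective, and the agent would select whichever policy scores highest at runtime.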


Citation Information:

Sahaj Singh Maini and Zoran Tiganj. "Reinforcement Learning with Adaptive Temporal Discounting." Reinforcement Learning Journal, vol. TBD, 2025, pp. TBD.

BibTeX:
@article{maini2025reinforcement,
    title={Reinforcement Learning with Adaptive Temporal Discounting},
    author={Maini, Sahaj Singh and Tiganj, Zoran},
    journal={Reinforcement Learning Journal},
    year={2025}
}