Reinforcement Learning Journal, vol. TBD, 2025, pp. TBD.
Presented at the Reinforcement Learning Conference (RLC), Edmonton, Alberta, Canada, August 5–9, 2025.
Offline reinforcement learning has gained popularity for its potential to solve industry challenges. However, real-world environments are often highly stochastic and partially observable, leading long-horizon planners to overfit to the offline data in model-based settings. Input-driven Markov Decision Processes (IDMDPs) offer a way to handle some of this uncertainty by letting designers separate what the agent controls (states) from what it does not (inputs) in the environment. These stochastic external inputs are often difficult to model. Under the assumption that the input model will be imperfect, we investigate the bias-variance tradeoff under shallow planning in IDMDPs. Paving the way toward input-driven planning horizons, we also investigate the similarity of optimal planning horizons across inputs given the structure of the input space.
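To make the shallow-planning idea concrete, the following is a minimal sketch (not the paper's method) of value iteration on a toy tabular input-driven MDP, where the exogenous input evolves under its own assumed Markov model and the planning horizon is controlled by the discount factor. All sizes, dynamics, and variable names below are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, n_inputs = 5, 2, 3

# P[z, a, s, s']: state transition given the current exogenous input z (assumed toy dynamics).
P = rng.dirichlet(np.ones(n_states), size=(n_inputs, n_actions, n_states))
# q[z, z']: Markov model of the input process; in practice this is learned from
# offline data and will be imperfect, which motivates shallower planning.
q = rng.dirichlet(np.ones(n_inputs), size=n_inputs)
# R[z, a, s]: reward for taking action a in state s under input z.
R = rng.uniform(size=(n_inputs, n_actions, n_states))

def value_iteration(gamma, tol=1e-8):
    """Solve the augmented MDP over (input, state) pairs with discount gamma."""
    V = np.zeros((n_inputs, n_states))
    while True:
        # Expected next value: average over next inputs, then over next states.
        EV = np.einsum("zasn,zn->zas", P, np.einsum("zy,yn->zn", q, V))
        Q = R + gamma * EV
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

# A smaller discount corresponds to a shorter effective planning horizon
# (shallow planning); here we only compare the resulting greedy policies.
V_deep, pi_deep = value_iteration(gamma=0.99)
V_shallow, pi_shallow = value_iteration(gamma=0.8)
print("policies agree on", 100 * np.mean(pi_deep == pi_shallow), "% of (input, state) pairs")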
Randy Lefebvre and Audrey Durand. "Optimal discounting for offline input-driven MDP." Reinforcement Learning Journal, vol. TBD, 2025, pp. TBD.
BibTeX:
@article{lefebvre2025optimal,
    title   = {Optimal discounting for offline input-driven {MDP}},
    author  = {Lefebvre, Randy and Durand, Audrey},
    journal = {Reinforcement Learning Journal},
    year    = {2025}
}