Offline Reinforcement Learning from Datasets with Structured Non-Stationarity

By Johannes Ackermann, Takayuki Osa, and Masashi Sugiyama

Reinforcement Learning Journal, vol. 5, 2024, pp. 2140–2161.

Presented at the Reinforcement Learning Conference (RLC), Amherst, Massachusetts, August 9–12, 2024.


Abstract:

Current Reinforcement Learning (RL) is often limited by the large amount of data needed to learn a successful policy. Offline RL aims to solve this issue by using transitions collected by a different behavior policy. We address a novel Offline RL problem setting in which, while collecting the dataset, the transition and reward functions gradually change between episodes but stay constant within each episode. We propose a method based on Contrastive Predictive Coding that identifies this non-stationarity in the offline dataset, accounts for it when training a policy, and predicts it during evaluation. We analyze our proposed method and show that it performs well in simple continuous control tasks and challenging, high-dimensional locomotion tasks. We show that our method often achieves the oracle performance and performs better than baselines.
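
To make the core idea concrete, below is a minimal sketch (not the authors' implementation) of an InfoNCE-style contrastive objective in the spirit of Contrastive Predictive Coding: transitions from the same episode are treated as positives and transitions from other episodes in the batch as negatives, so the learned latent captures the episode-level factor that changes between episodes. All names, network sizes, and tensor shapes are illustrative assumptions.

```python
# Minimal sketch, assuming transitions are flattened (s, a, r, s') vectors.
# This is an illustration of a CPC/InfoNCE-style latent identifier, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TransitionEncoder(nn.Module):
    """Encodes a single transition into a latent meant to summarize its episode."""

    def __init__(self, transition_dim: int, latent_dim: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(transition_dim, 128),
            nn.ReLU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, transitions: torch.Tensor) -> torch.Tensor:
        # transitions: (batch, transition_dim) -> latents: (batch, latent_dim)
        return self.net(transitions)


def info_nce(anchor: torch.Tensor, positive: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE loss: row i of `anchor` and row i of `positive` come from the same
    episode; all other rows in the batch serve as negatives."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = anchor @ positive.T / temperature  # (batch, batch) similarity matrix
    labels = torch.arange(anchor.shape[0], device=anchor.device)  # positives on the diagonal
    return F.cross_entropy(logits, labels)
```

In use, one would sample pairs of transitions from the same episode in the offline dataset, encode both with the encoder, and minimize the InfoNCE loss; the resulting latent can then condition the policy during training and be predicted forward across episodes at evaluation time, as described in the abstract.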


Citation Information:

Johannes Ackermann, Takayuki Osa, and Masashi Sugiyama. "Offline Reinforcement Learning from Datasets with Structured Non-Stationarity." Reinforcement Learning Journal, vol. 5, 2024, pp. 2140–2161.

BibTeX:

@article{ackermann2024offline,
    title={Offline Reinforcement Learning from Datasets with Structured Non-Stationarity},
    author={Ackermann, Johannes and Osa, Takayuki and Sugiyama, Masashi},
    journal={Reinforcement Learning Journal},
    volume={5},
    pages={2140--2161},
    year={2024}
}