Reinforcement Learning Journal, vol. 2, 2024, pp. 593–605.
Presented at the Reinforcement Learning Conference (RLC), Amherst, Massachusetts, August 9–12, 2024.
We study the problem of imitation learning via inverse reinforcement learning, where the agent attempts to learn an expert's policy from a dataset of state-action tuples. We derive a new Robust model-based Offline Imitation Learning method (ROIL) that mitigates covariate shift by avoiding direct estimation of the expert's occupancy frequency. In offline settings, there is frequently insufficient data to reliably estimate the expert's occupancy frequency, which leads to models that do not generalize well. Our proposed approach, ROIL, is guaranteed to recover the expert's occupancy frequency and is efficiently solvable as a linear program (LP). We demonstrate ROIL's ability to achieve minimal regret in large environments under covariate shift, such as when the demonstrated states are not drawn from the expert's own state visitation frequency.
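For background on the "solvable as an LP" claim, the sketch below illustrates the classical occupancy-measure view that the abstract invokes, not the paper's ROIL formulation: valid occupancy frequencies form a polytope defined by the Bellman flow constraints, so matching an expert's (assumed known) feature expectations can be posed as a linear program. The function name, feature matrix Phi, and expert feature expectations mu_E are hypothetical placeholders.

```python
# Minimal sketch (assumed generic formulation, not the authors' ROIL method):
# imitation as an LP over occupancy frequencies u(s, a). The Bellman flow
# constraints keep u consistent with the dynamics; the objective matches the
# expert's feature expectations in the L1 sense via auxiliary slack variables.
import numpy as np
from scipy.optimize import linprog

def occupancy_matching_lp(P, p0, Phi, mu_E, gamma=0.95):
    """P: (S, A, S) transition tensor, p0: (S,) initial distribution,
    Phi: (S*A, K) state-action features, mu_E: (K,) expert feature expectations."""
    S, A, _ = P.shape
    n = S * A                      # occupancy variables u(s, a)
    K = Phi.shape[1]               # feature dimension / number of slack variables

    # Bellman flow constraints, one per state s:
    #   sum_a u(s, a) - gamma * sum_{s', a'} P(s | s', a') u(s', a') = (1 - gamma) * p0(s)
    A_eq = np.zeros((S, n + K))
    for s in range(S):
        for a in range(A):
            A_eq[s, s * A + a] += 1.0
        for sp in range(S):
            for ap in range(A):
                A_eq[s, sp * A + ap] -= gamma * P[sp, ap, s]
    b_eq = (1.0 - gamma) * p0

    # |Phi^T u - mu_E| <= t componentwise; minimize the total slack sum(t).
    A_ub = np.zeros((2 * K, n + K))
    b_ub = np.zeros(2 * K)
    A_ub[:K, :n] = Phi.T
    A_ub[:K, n:] = -np.eye(K)
    b_ub[:K] = mu_E
    A_ub[K:, :n] = -Phi.T
    A_ub[K:, n:] = -np.eye(K)
    b_ub[K:] = -mu_E

    c = np.concatenate([np.zeros(n), np.ones(K)])   # objective: minimize total slack
    bounds = [(0, None)] * (n + K)                  # u >= 0 and t >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n].reshape(S, A)                  # recovered occupancy frequencies
```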
Gersi Doko, Guang Yang, Daniel S. Brown, and Marek Petrik. "ROIL: Robust Offline Imitation Learning without Trajectories." Reinforcement Learning Journal, vol. 2, 2024, pp. 593–605.
BibTeX:
@article{doko2024roil,
    title={{ROIL}: {R}obust Offline Imitation Learning without Trajectories},
    author={Doko, Gersi and Yang, Guang and Brown, Daniel S. and Petrik, Marek},
    journal={Reinforcement Learning Journal},
    volume={2},
    pages={593--605},
    year={2024}
}