Reinforcement Learning Journal, vol. 4, 2024, pp. 1911–1923.
Presented at the Reinforcement Learning Conference (RLC), Amherst, Massachusetts, August 9–12, 2024.
Imitation Learning from Observation (ILfO) is a setting in which a learner tries to imitate the behavior of an expert, using only observational data and without the direct guidance of demonstrated actions. In this paper, we re-examine optimal transport for IL, in which a reward is generated based on the Wasserstein distance between the state trajectories of the learner and expert. We show that existing methods can be simplified to generate a reward function without requiring learned models or adversarial learning. Unlike many other state-of-the-art methods, our approach can be integrated with any RL algorithm and is amenable to ILfO. We demonstrate the effectiveness of this simple approach on a variety of continuous control tasks and find that it surpasses the state of the art in the ILfO setting, achieving expert-level performance across a range of evaluation domains even when observing only a single expert trajectory without actions.
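To make the core idea concrete, the following is a minimal sketch of an optimal-transport reward between state trajectories: pairwise state distances form a cost matrix, an entropy-regularized (Sinkhorn) transport plan is computed between the two trajectories under uniform marginals, and each learner state receives the negative of its transported cost mass as a reward. This is an illustrative reconstruction, not the paper's implementation; the function names, the Euclidean ground cost, and the regularization constant are assumptions.

```python
import numpy as np

def sinkhorn_plan(cost, reg=0.1, n_iters=200):
    """Entropy-regularized OT plan between two uniform distributions
    (illustrative; hyperparameters are assumptions, not from the paper)."""
    n, m = cost.shape
    a, b = np.ones(n) / n, np.ones(m) / m     # uniform marginals
    K = np.exp(-cost / reg)                   # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):                  # Sinkhorn fixed-point updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]        # transport plan (n x m)

def ot_rewards(learner_states, expert_states):
    """One reward per learner timestep: negative transported cost mass."""
    # Euclidean ground cost between every learner/expert state pair.
    cost = np.linalg.norm(
        learner_states[:, None, :] - expert_states[None, :, :], axis=-1)
    plan = sinkhorn_plan(cost)
    return -(plan * cost).sum(axis=1)

# Usage: a trajectory matching the expert earns higher (near-zero) rewards
# than a shifted copy of it.
T, d = 20, 3
expert = np.random.RandomState(0).randn(T, d)
r_match = ot_rewards(expert, expert)
r_off = ot_rewards(expert + 1.0, expert)
```

Because the reward depends only on states, never on actions, it can be handed to any off-the-shelf RL algorithm, which is what makes this style of approach natural for the ILfO setting described above.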
Wei-Di Chang, Scott Fujimoto, David Meger, and Gregory Dudek. "Imitation Learning from Observation through Optimal Transport." Reinforcement Learning Journal, vol. 4, 2024, pp. 1911–1923.
BibTeX:
@article{chang2024imitation,
title={Imitation Learning from Observation through Optimal Transport},
author={Chang, Wei-Di and Fujimoto, Scott and Meger, David and Dudek, Gregory},
journal={Reinforcement Learning Journal},
volume={4},
pages={1911--1923},
year={2024}
}