Imitation Learning from Observation through Optimal Transport

By Wei-Di Chang, Scott Fujimoto, David Meger, and Gregory Dudek

Reinforcement Learning Journal, vol. 4, 2024, pp. 1911–1923.

Presented at the Reinforcement Learning Conference (RLC), Amherst, Massachusetts, August 9–12, 2024.


Abstract:

Imitation Learning from Observation (ILfO) is a setting in which a learner tries to imitate the behavior of an expert, using only observational data and without the direct guidance of demonstrated actions. In this paper, we re-examine optimal transport for IL, in which a reward is generated based on the Wasserstein distance between the state trajectories of the learner and expert. We show that existing methods can be simplified to generate a reward function without requiring learned models or adversarial learning. Unlike many other state-of-the-art methods, our approach can be integrated with any RL algorithm and is amenable to ILfO. We demonstrate the effectiveness of this simple approach on a variety of continuous control tasks and find that it surpasses the state of the art in the ILfO setting, achieving expert-level performance across a range of evaluation domains, even when observing only a single expert trajectory without actions.
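
The reward construction described in the abstract can be sketched in a few lines of NumPy. The code below is an illustrative example, not the authors' implementation: it computes an entropic-regularized optimal transport plan between learner and expert state trajectories with Sinkhorn iterations and turns the transported cost into per-timestep pseudo-rewards. The function names, the Euclidean ground cost, and the regularization value are assumptions made for this sketch.

# Minimal sketch (not the paper's code) of an optimal-transport reward:
# given expert and learner state trajectories, compute an entropic-regularized
# transport plan via Sinkhorn iterations and derive per-step pseudo-rewards.
import numpy as np

def sinkhorn_plan(cost, reg=0.05, n_iters=200):
    """Entropic-regularized OT plan between two uniform empirical measures."""
    n, m = cost.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-cost / reg)              # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):             # alternate marginal projections
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]   # transport plan, shape (n, m)

def ot_rewards(learner_states, expert_states):
    """Per-timestep reward = negative cost transported to the expert trajectory."""
    # pairwise Euclidean costs between learner and expert states (an assumed choice)
    diff = learner_states[:, None, :] - expert_states[None, :, :]
    cost = np.linalg.norm(diff, axis=-1)
    plan = sinkhorn_plan(cost)
    # reward for learner step i: minus the cost mass that step sends to the expert,
    # rescaled so the rewards sum to (minus) the total transport cost per step
    return -(plan * cost).sum(axis=1) * learner_states.shape[0]

# toy usage: two length-50 trajectories of 11-dimensional states
rng = np.random.default_rng(0)
expert = rng.normal(size=(50, 11))
learner = expert + 0.1 * rng.normal(size=(50, 11))
print(ot_rewards(learner, expert).shape)  # (50,)

In a training loop, such pseudo-rewards would simply replace the environment reward fed to an off-the-shelf RL algorithm, which is consistent with the abstract's claim that the approach can be combined with any RL method.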


Citation Information:

Wei-Di Chang, Scott Fujimoto, David Meger, and Gregory Dudek. "Imitation Learning from Observation through Optimal Transport." Reinforcement Learning Journal, vol. 4, 2024, pp. 1911–1923.

BibTeX:

@article{chang2024imitation,
    title={Imitation Learning from Observation through Optimal Transport},
    author={Chang, Wei-Di and Fujimoto, Scott and Meger, David and Dudek, Gregory},
    journal={Reinforcement Learning Journal},
    volume={4},
    pages={1911--1923},
    year={2024}
}