Imitation Learning from Observation through Optimal Transport

By Wei-Di Chang, Scott Fujimoto, David Meger, and Gregory Dudek

Reinforcement Learning Journal, vol. 4, 2024, pp. 1911–1923.

Presented at the Reinforcement Learning Conference (RLC), Amherst, Massachusetts, August 9–12, 2024.



Imitation Learning from Observation (ILfO) is a setting in which a learner tries to imitate the behavior of an expert, using only observational data and without the direct guidance of demonstrated actions. In this paper, we re-examine optimal transport for imitation learning (IL), in which a reward is generated based on the Wasserstein distance between the state trajectories of the learner and expert. We show that existing methods can be simplified to generate a reward function without requiring learned models or adversarial learning. Unlike many other state-of-the-art methods, our approach can be integrated with any RL algorithm and is amenable to ILfO. We demonstrate the effectiveness of this simple approach on a variety of continuous control tasks and find that it surpasses the state of the art in the ILfO setting, achieving expert-level performance across a range of evaluation domains even when observing only a single expert trajectory without actions.
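To make the idea of a transport-based reward concrete, here is a minimal sketch of how per-step rewards might be derived from a coupling between learner and expert state trajectories. It is an illustration under stated assumptions, not the paper's formulation: the function name `ot_rewards`, the Euclidean cost, the uniform weighting of time steps, and the use of Sinkhorn iterations (an entropy-regularized approximation to optimal transport) with parameters `epsilon` and `n_iters` are all choices made here for the example.

```python
import numpy as np


def ot_rewards(learner_states, expert_states, epsilon=0.05, n_iters=200):
    """Per-step rewards from an approximate optimal transport plan.

    Hypothetical helper for illustration only; the cost function,
    weighting, and solver are assumptions, not taken from the paper.
    """
    x = np.asarray(learner_states, dtype=float)  # shape (T, d)
    y = np.asarray(expert_states, dtype=float)   # shape (T_expert, d)

    # Pairwise Euclidean cost between learner and expert states.
    cost = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)

    # Uniform mass on every time step of each trajectory.
    a = np.full(len(x), 1.0 / len(x))
    b = np.full(len(y), 1.0 / len(y))

    # Sinkhorn iterations; the cost is rescaled so epsilon is scale-free.
    scale = cost.max() if cost.max() > 0 else 1.0
    kernel = np.exp(-cost / (epsilon * scale))
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)
    plan = u[:, None] * kernel * v[None, :]  # approximate coupling

    # Each learner step is penalized by the cost of the mass it transports.
    return -(plan * cost).sum(axis=1)
```

Because the rewards depend only on states, no expert actions are needed, and the resulting per-step values could in principle be passed to any RL algorithm in place of an environment reward. A learner trajectory that stays close to the expert's receives rewards near zero, while one that drifts away receives more negative rewards.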

Citation Information:

Wei-Di Chang, Scott Fujimoto, David Meger, and Gregory Dudek. "Imitation Learning from Observation through Optimal Transport." Reinforcement Learning Journal, vol. 4, 2024, pp. 1911–1923.


    title={Imitation Learning from Observation through Optimal Transport},
    author={Chang, Wei-Di and Fujimoto, Scott and Meger, David and Dudek, Gregory},
    journal={Reinforcement Learning Journal},