Semi-Supervised One Shot Imitation Learning

By Philipp Wu, Kourosh Hakhamaneshi, Yuqing Du, Igor Mordatch, Aravind Rajeswaran, and Pieter Abbeel

Reinforcement Learning Journal, vol. 5, 2024, pp. 2284–2297.

Presented at the Reinforcement Learning Conference (RLC), Amherst, Massachusetts, August 9–12, 2024.


Abstract:

One-shot Imitation Learning (OSIL) aims to imbue AI agents with the ability to learn a new task from a single demonstration. To supervise the learning, OSIL requires a prohibitively large number of paired expert demonstrations: trajectories corresponding to different variations of the same semantic task. To overcome this limitation, we introduce the semi-supervised OSIL problem setting, where the learning agent is presented with a large dataset of tasks with only one demonstration each (unpaired dataset), along with a small dataset of tasks with multiple demonstrations (paired dataset). This presents a more realistic and practical embodiment of few-shot learning and requires the agent to effectively leverage weak supervision. Subsequently, we develop an algorithm applicable to this semi-supervised OSIL setting. Our approach first learns an embedding space where different tasks cluster uniquely. We utilize this embedding space and the clustering it supports to self-generate pairings between trajectories in the large unpaired dataset. Through empirical results, we demonstrate that OSIL models trained on such self-generated pairings (labels) are competitive with OSIL models trained with ground-truth labels, presenting a major advancement in the label-efficiency of OSIL.
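
The pairing step described above can be illustrated with a minimal sketch. The snippet below is not the authors' released code; the encoder, the cosine-similarity measure, and the confidence threshold are all assumptions for illustration. It pairs each trajectory in the unpaired dataset with its nearest neighbor in a learned task-embedding space, producing self-generated labels of the kind the abstract describes.

    import numpy as np

    def self_generate_pairs(embeddings, threshold=0.9):
        # embeddings: (N, D) array of per-trajectory task embeddings
        # produced by a learned encoder (hypothetical; the paper's
        # encoder and pairing rule may differ).
        z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
        sims = z @ z.T                    # pairwise cosine similarities
        np.fill_diagonal(sims, -np.inf)   # forbid pairing a trajectory with itself
        nearest = sims.argmax(axis=1)     # each trajectory's closest neighbor
        # Keep only confident pairings; these serve as self-generated labels.
        return [(i, int(j)) for i, j in enumerate(nearest)
                if sims[i, j] >= threshold]

Each retained pair (i, j) could then be used like a ground-truth pairing in supervised OSIL training: trajectory j serves as the one-shot demonstration that conditions the policy, and trajectory i as the imitation target.
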


Citation Information:

Philipp Wu, Kourosh Hakhamaneshi, Yuqing Du, Igor Mordatch, Aravind Rajeswaran, and Pieter Abbeel. "Semi-Supervised One Shot Imitation Learning." Reinforcement Learning Journal, vol. 5, 2024, pp. 2284–2297.

BibTeX:

@article{wu2024semi,
    title={Semi-Supervised One Shot Imitation Learning},
    author={Wu, Philipp and Hakhamaneshi, Kourosh and Du, Yuqing and Mordatch, Igor and Rajeswaran, Aravind and Abbeel, Pieter},
    journal={Reinforcement Learning Journal},
    volume={5},
    pages={2284--2297},
    year={2024}
}