Reinforcement Learning Journal, vol. 5, 2024, pp. 2284–2297.
Presented at the Reinforcement Learning Conference (RLC), Amherst, Massachusetts, August 9–12, 2024.
One-shot Imitation Learning (OSIL) aims to imbue AI agents with the ability to learn a new task from a single demonstration. To supervise the learning, OSIL requires a prohibitively large number of paired expert demonstrations, i.e., trajectories corresponding to different variations of the same semantic task. To overcome this limitation, we introduce the semi-supervised OSIL problem setting, in which the learning agent is presented with a large dataset of tasks with only one demonstration each (the unpaired dataset), along with a small dataset of tasks with multiple demonstrations (the paired dataset). This presents a more realistic and practical embodiment of few-shot learning and requires the agent to effectively leverage weak supervision. We then develop an algorithm for this semi-supervised OSIL setting. Our approach first learns an embedding space in which different tasks cluster uniquely. We use this embedding space and the clustering it supports to self-generate pairings between trajectories in the large unpaired dataset. Empirically, we demonstrate that OSIL models trained on such self-generated pairings (pseudo-labels) are competitive with OSIL models trained with ground-truth labels, representing a major advance in the label-efficiency of OSIL.
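As an illustration of the pairing step described above, the following minimal Python sketch pairs each trajectory in the unpaired dataset with its nearest neighbor in a learned embedding space. This is not the authors' implementation; the function name, the cosine-similarity metric, and the one-nearest-neighbor rule are all assumptions made for illustration.

# A minimal sketch (not the paper's implementation) of self-generating
# pairings: trajectories that are nearest neighbors in a learned embedding
# space are treated as pseudo-pairs of the same semantic task.
import numpy as np

def self_generate_pairings(embeddings: np.ndarray) -> list[tuple[int, int]]:
    """Pair each trajectory with its nearest neighbor in embedding space.

    embeddings: (N, D) array, one learned embedding per unpaired trajectory.
    Returns a list of (i, j) index pairs usable as pseudo-labels for OSIL.
    """
    # Cosine similarity between all pairs of trajectory embeddings.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # exclude pairing a trajectory with itself
    nearest = sim.argmax(axis=1)    # nearest neighbor for each trajectory
    return [(i, int(j)) for i, j in enumerate(nearest)]

# Usage: embeddings would come from an encoder trained so that demonstrations
# of the same task cluster together (e.g., trained on the small paired dataset).
pairs = self_generate_pairings(np.random.randn(100, 32))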
Philipp Wu, Kourosh Hakhamaneshi, Yuqing Du, Igor Mordatch, Aravind Rajeswaran, and Pieter Abbeel. "Semi-Supervised One Shot Imitation Learning." Reinforcement Learning Journal, vol. 5, 2024, pp. 2284–2297.
BibTeX:
@article{wu2024semi,
  title={Semi-Supervised One Shot Imitation Learning},
  author={Wu, Philipp and Hakhamaneshi, Kourosh and Du, Yuqing and Mordatch, Igor and Rajeswaran, Aravind and Abbeel, Pieter},
  journal={Reinforcement Learning Journal},
  volume={5},
  pages={2284--2297},
  year={2024}
}