Reinforcement Learning Journal, vol. 4, 2024, pp. 1873–1886.
Presented at the Reinforcement Learning Conference (RLC), Amherst, Massachusetts, August 9–12, 2024.
Previous work in interactive reinforcement learning incorporates human behavior directly into agent policy learning, but this requires estimating the distribution of human behavior over many samples to prevent bias. Our work shows that model-based systems can avoid this problem by using small amounts of human data to guide world-model learning rather than agent-policy learning. This approach learns faster and produces useful policies more reliably than the prior state of the art. We evaluate it with expert human demonstrations in two environments: PinPad5, a fully observable environment that emphasizes task composition, and MemoryMaze, a partially observable environment that emphasizes exploration and memory. With only nine minutes of expert human demonstration data, our approach achieves an order-of-magnitude improvement in learning speed and reliability.
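To make the mechanism concrete, the following is a minimal sketch of the idea described in the abstract, assuming a Dreamer-style model-based training loop; it is not the authors' implementation, and every name here (ReplayBuffer, env.rollout, world_model.update, world_model.imagine, policy.update) is an illustrative assumption.

# Minimal sketch (assumed interfaces, not the authors' code) of human
# demonstrations guiding world-model learning rather than policy learning.
import random

class ReplayBuffer:
    """Holds trajectories from both the human demonstrator and the agent."""
    def __init__(self):
        self.trajectories = []

    def add(self, trajectory):
        self.trajectories.append(trajectory)

    def sample(self, batch_size):
        return random.sample(self.trajectories,
                             min(batch_size, len(self.trajectories)))

def train_with_demos(world_model, policy, env, human_demos, steps):
    buffer = ReplayBuffer()
    for demo in human_demos:   # seed the buffer with a small amount of
        buffer.add(demo)       # agent-centric human demonstration data
    for _ in range(steps):
        buffer.add(env.rollout(policy))        # collect agent experience
        world_model.update(buffer.sample(16))  # demos shape the world model
        policy.update(world_model.imagine(policy))  # the policy trains only on
                                                    # imagined rollouts, never
                                                    # on human actions directly

The design point is that the human data only constrains what the model learns about the environment, so no estimate of the human's behavioral distribution is needed for policy learning.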
James Staley, Elaine Short, Shivam Goel, and Yash Shukla. "Agent-Centric Human Demonstrations Train World Models." Reinforcement Learning Journal, vol. 4, 2024, pp. 1873–1886.
BibTeX:
@article{staley2024agent,
  title={Agent-Centric Human Demonstrations Train World Models},
  author={Staley, James and Short, Elaine and Goel, Shivam and Shukla, Yash},
  journal={Reinforcement Learning Journal},
  volume={4},
  pages={1873--1886},
  year={2024}
}