Reinforcement Learning Journal, vol. TBD, 2025, pp. TBD.
Presented at the Reinforcement Learning Conference (RLC), Edmonton, Alberta, Canada, August 5–9, 2025.
As AI systems become increasingly autonomous, reliably aligning their decision-making with human preferences is essential. Inverse reinforcement learning (IRL) offers a promising approach: it infers preferences from demonstrations, and these preferences can then be used to produce an apprentice policy that performs well on the demonstrated task. However, in domains like autonomous driving or robotics, where errors can have serious consequences, we need not just good average performance but reliable policies with formal guarantees. Obtaining enough human demonstrations to support such guarantees can be costly. *Active* IRL addresses this challenge by strategically selecting the most informative scenarios for human demonstration. We introduce PAC-EIG, an information-theoretic acquisition function that directly targets probably-approximately-correct (PAC) guarantees for the learned policy, providing the first such theoretical guarantee for active IRL with imperfect expert demonstrations. Our method maximizes information gain about the immediate regret of the apprentice, efficiently identifying which states require further demonstration to ensure reliable apprentice behaviour. We also present an alternative acquisition function for settings where learning the reward itself is the primary objective. We prove convergence bounds, illustrate failure modes of prior heuristic methods, and demonstrate our approach experimentally.
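To make the acquisition idea concrete, here is a minimal numpy sketch of a PAC-EIG-style score for a single state. It assumes a discrete action space, a Boltzmann model of the (imperfect) expert, and Monte Carlo samples of state-action values under the reward posterior; the function name `pac_eig_score` and all parameters are illustrative stand-ins, not the paper's actual implementation. It estimates the mutual information between a hypothetical expert action and the binary event "the apprentice's immediate regret here exceeds the PAC tolerance".

```python
import numpy as np

def pac_eig_score(q_samples, apprentice_action, beta=5.0, eps=0.1):
    """Sketch of a PAC-EIG-style acquisition score for one state.

    q_samples:         (M, A) array of Q-values at this state under M
                       posterior reward samples (assumed available, e.g.
                       from MCMC over reward parameters).
    apprentice_action: index of the action the current apprentice takes.
    beta:              Boltzmann rationality of the imperfect expert model.
    eps:               regret threshold defining the PAC "bad event".

    Returns a Monte Carlo estimate of I(B; Y), the information a
    hypothetical expert action Y carries about the binary event
    B = {immediate regret of the apprentice > eps}.
    """
    M, A = q_samples.shape

    # Expert action likelihoods under each reward sample (Boltzmann model).
    logits = beta * q_samples
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p_y_given_theta = np.exp(logits)
    p_y_given_theta /= p_y_given_theta.sum(axis=1, keepdims=True)  # (M, A)

    # Immediate regret of the apprentice under each posterior sample.
    regret = q_samples.max(axis=1) - q_samples[:, apprentice_action]
    b = (regret > eps).astype(float)  # (M,) indicator of the bad event

    def entropy_bernoulli(p):
        p = np.clip(p, 1e-12, 1.0 - 1e-12)
        return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

    # Marginal entropy of the bad event under the current posterior.
    h_b = entropy_bernoulli(b.mean())

    # Posterior over samples after observing Y = a (uniform prior on samples):
    # w[i, a] = p(theta_i | y = a) by Bayes' rule.
    p_y = p_y_given_theta.mean(axis=0)       # (A,) predictive action dist.
    w = p_y_given_theta / (M * p_y)          # (M, A), columns sum to 1
    p_b_given_y = w.T @ b                    # (A,) P(B = 1 | y = a)

    # Expected conditional entropy, then the information gain.
    h_b_given_y = (p_y * entropy_bernoulli(p_b_given_y)).sum()
    return h_b - h_b_given_y
```

In an active loop, one would score every candidate state this way and query the expert at the state with the highest score, so that each demonstration maximally reduces uncertainty about where the apprentice still incurs high regret.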
Ondrej Bajgar, Dewi Sid William Gould, Jonathon Liu, Alessandro Abate, Konstantinos Gatsis, and Michael A. Osborne. "PAC Apprenticeship Learning with Bayesian Active Inverse Reinforcement Learning." Reinforcement Learning Journal, vol. TBD, 2025, pp. TBD.
BibTeX:
@article{bajgar2025apprenticeship,
  title={{PAC} Apprenticeship Learning with {Bayesian} Active Inverse Reinforcement Learning},
  author={Bajgar, Ondrej and Gould, Dewi Sid William and Liu, Jonathon and Abate, Alessandro and Gatsis, Konstantinos and Osborne, Michael A},
  journal={Reinforcement Learning Journal},
  year={2025}
}