Reinforcement Learning Journal, vol. 1, 2024, pp. 342–365.
Presented at the Reinforcement Learning Conference (RLC), Amherst, Massachusetts, August 9–12, 2024.
Robust reinforcement learning agents using high-dimensional observations must be able to identify relevant state features amidst many exogenous distractors. A representation that captures controllability identifies these state elements by determining what affects agent control. While methods such as inverse dynamics and mutual information capture controllability for a limited number of timesteps, capturing long-horizon elements remains a challenging problem. Myopic controllability can capture the moment right before an agent crashes into a wall, but not the control-relevance of the wall while the agent is still some distance away. To address this, we introduce action-bisimulation encoding, a method inspired by the bisimulation invariance pseudometric that extends single-step controllability with a recursive invariance constraint. By doing this, action-bisimulation learns a multi-step controllability metric that smoothly discounts distant state features that are relevant for control. We demonstrate that action-bisimulation pretraining on reward-free, uniformly random data improves sample efficiency in several environments, including a photorealistic 3D simulation domain, Habitat. Additionally, we provide theoretical analysis and qualitative results demonstrating the information captured by action-bisimulation.
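As an illustrative reading of the recursive invariance constraint described in the abstract (not the paper's exact objective; the discount c, the single-step controllability distance d_ctrl, and the Wasserstein coupling W_1 are introduced here purely for exposition), one plausible form of the multi-step controllability metric mirrors the standard bisimulation recursion, with the usual reward term replaced by a single-step controllability term:

\[
d(s_i, s_j) \;=\; (1 - c)\, d_{\mathrm{ctrl}}(s_i, s_j) \;+\; c\, \mathbb{E}_{a \sim \mathcal{U}(A)} \Big[ W_1(d)\big(P(\cdot \mid s_i, a),\, P(\cdot \mid s_j, a)\big) \Big]
\]

Under this sketch, d_ctrl would come from a one-step signal such as inverse dynamics, and the discount c in [0, 1) governs how quickly control-relevant features farther in the future are attenuated, matching the smooth discounting of distant state features described above.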
Max Rudolph, Caleb Chuck, Kevin Black, Misha Lvovsky, Scott Niekum, and Amy Zhang. "Learning Action-based Representations Using Invariance." Reinforcement Learning Journal, vol. 1, 2024, pp. 342–365.
BibTeX:
@article{rudolph2024learning,
  title={Learning Action-based Representations Using Invariance},
  author={Rudolph, Max and Chuck, Caleb and Black, Kevin and Lvovsky, Misha and Niekum, Scott and Zhang, Amy},
  journal={Reinforcement Learning Journal},
  volume={1},
  pages={342--365},
  year={2024}
}