Reinforcement Learning Journal, vol. 2, 2024, pp. 884–925.

Presented at the Reinforcement Learning Conference (RLC), Amherst, Massachusetts, August 9–12, 2024.

Abstract:

In real-world control settings, the observation space is often unnecessarily high-dimensional and subject to time-correlated noise. However, the *controllable* dynamics of the system are often far simpler than the dynamics of the raw observations. It is therefore desirable to learn an encoder that maps the observation space to a simpler space of control-relevant variables. In this work, we consider the Ex-BMDP model, first proposed by Efroni et al. (2022), which formalizes control problems where observations can be factorized into an action-dependent latent state, which evolves deterministically, and action-independent, time-correlated noise. Lamb et al. (2022) propose the "AC-State" method for learning an encoder that extracts a complete action-dependent latent state representation from the observations in such problems. AC-State is a *multistep-inverse* method, in that it uses the encodings of the first and last states in a path to predict the *first* action in the path. However, we identify cases where AC-State fails to learn a correct latent representation of the agent-controllable factor of the state. We therefore propose a new algorithm, ACDF, which combines multistep-inverse prediction with a latent forward model. ACDF is guaranteed to correctly infer an action-dependent latent state encoder for a large class of Ex-BMDP models. We demonstrate the effectiveness of ACDF on tabular Ex-BMDPs through numerical simulations, as well as on high-dimensional environments using neural-network-based encoders. Code is available at https://github.com/midi-lab/acdf.
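To make the two ingredients named in the abstract concrete, here is a minimal toy sketch, not the authors' code (the repository above is the reference implementation). It assumes a hypothetical tabular Ex-BMDP in which the observation is simply the pair (latent state, noise): the latent state moves deterministically on a ring according to the action, while the noise is a sticky random walk that ignores the action. `multistep_inverse_pairs` builds the (first observation, last observation, path length, first action) tuples that a multistep-inverse objective would train on; including the path length `k` as an input is an assumption of this sketch. `forward_consistent` is an illustrative check of whether an encoder admits a deterministic latent forward model; it is not the ACDF loss.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical toy Ex-BMDP (an assumption for illustration, not from the paper):
# the controllable latent state s lives on a ring of N_STATES positions and
# moves deterministically with the action; the exogenous noise e ignores actions.
N_STATES = 5
N_NOISE = 4
ACTIONS = (-1, +1)  # move left / right on the ring

Obs = Tuple[int, int]  # observation x = (s, e)


def latent_step(s: int, a: int) -> int:
    """Deterministic, action-dependent latent transition."""
    return (s + a) % N_STATES


def noise_step(e: int, rng: random.Random) -> int:
    """Action-independent, time-correlated noise: usually persists, sometimes resamples."""
    if rng.random() < 0.8:
        return e
    return rng.randrange(N_NOISE)


def rollout(T: int, seed: int = 0) -> Tuple[List[Obs], List[int]]:
    """Collect T transitions under a uniformly random policy."""
    rng = random.Random(seed)
    s, e = 0, 0
    obs, acts = [(s, e)], []
    for _ in range(T):
        a = rng.choice(ACTIONS)
        s, e = latent_step(s, a), noise_step(e, rng)
        acts.append(a)
        obs.append((s, e))
    return obs, acts


def multistep_inverse_pairs(obs: List[Obs], acts: List[int], max_k: int):
    """Tuples (x_t, x_{t+k}, k, a_t): inputs are the first and last observations
    of a k-step path; the label is the *first* action of that path."""
    pairs = []
    for t in range(len(acts)):
        for k in range(1, max_k + 1):
            if t + k < len(obs):
                pairs.append((obs[t], obs[t + k], k, acts[t]))
    return pairs


def forward_consistent(encoder: Callable, obs: List[Obs], acts: List[int],
                       forward: Callable) -> bool:
    """True if encoder(x_{t+1}) == forward(encoder(x_t), a_t) on every transition."""
    return all(encoder(obs[t + 1]) == forward(encoder(obs[t]), acts[t])
               for t in range(len(acts)))


obs, acts = rollout(200, seed=1)
pairs = multistep_inverse_pairs(obs, acts, max_k=3)

# Keeping only the controllable factor admits a deterministic forward model.
print(forward_consistent(lambda x: x[0], obs, acts, latent_step))
# Keeping the noise does not, because the noise cannot be predicted from the action.
print(forward_consistent(lambda x: x, obs, acts,
                         lambda z, a: (latent_step(z[0], a), z[1])))
```

Note that a constant encoder also passes the forward-consistency check trivially, which is one intuition (under this toy setup) for why a forward model on its own is not enough and is paired with an inverse-prediction signal.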

Citation Information:

Alexander Levine, Peter Stone, and Amy Zhang. "Multistep Inverse Is Not All You Need." Reinforcement Learning Journal, vol. 2, 2024, pp. 884–925.

BibTeX:

@article{levine2024multistep,
  title={Multistep Inverse Is Not All You Need},
  author={Levine, Alexander and Stone, Peter and Zhang, Amy},
  journal={Reinforcement Learning Journal},
  volume={2},
  pages={884--925},
  year={2024}
}