Informed POMDP: Leveraging Additional Information in Model-Based RL

By Gaspard Lambrechts, Adrien Bolland, and Damien Ernst

Reinforcement Learning Journal, vol. 2, 2024, pp. 763–784.

Presented at the Reinforcement Learning Conference (RLC), Amherst, Massachusetts, August 9–12, 2024.


Abstract:

In this work, we generalize the problem of learning through interaction in a POMDP by accounting for any additional information available at training time. First, we introduce the informed POMDP, a new learning paradigm offering a clear distinction between the information available at training time and the observation available at execution time. Next, we propose an objective that leverages this information to learn a sufficient statistic of the history for optimal control. We then adapt this informed objective to learn a world model able to sample latent trajectories. Finally, we empirically show an improvement in learning speed in several environments when using this informed world model in the Dreamer algorithm. These results, together with the simplicity of the proposed adaptation, advocate for the systematic consideration of any additional information available when learning in a POMDP using model-based RL.
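
To make the informed objective concrete, the following is a minimal, hypothetical sketch in PyTorch; it is not the authors' implementation, and all module names, dimensions, and loss terms are illustrative assumptions. Following the abstract, the recurrent statistic of the history is computed from observations and actions only (available at execution time), while the decoder is trained to reconstruct the additional information available only at training time.

Illustrative sketch (Python):

import torch
import torch.nn as nn

class InformedWorldModel(nn.Module):
    # Hypothetical informed world model: the encoder conditions only on
    # execution-time observables; the decoder targets the training-time
    # information. Names and sizes are assumptions, not the paper's code.
    def __init__(self, obs_dim, act_dim, info_dim, latent_dim=64):
        super().__init__()
        # Statistic of the history: h_t = f(h_{t-1}, a_{t-1}, o_t).
        self.rnn = nn.GRU(obs_dim + act_dim, latent_dim, batch_first=True)
        # Decoder head predicting the additional information i_t from h_t.
        self.info_head = nn.Linear(latent_dim, info_dim)
        # Reward head, as in a standard world model.
        self.reward_head = nn.Linear(latent_dim, 1)

    def forward(self, obs, act):
        # obs: (B, T, obs_dim); act: (B, T, act_dim), a_{t-1} aligned to o_t.
        h, _ = self.rnn(torch.cat([obs, act], dim=-1))
        return h

def informed_loss(model, obs, act, info, reward):
    # Informed objective: reconstruct the training-time information i_t
    # (rather than the observation o_t) and predict the reward from h_t.
    h = model(obs, act)
    info_loss = nn.functional.mse_loss(model.info_head(h), info)
    reward_loss = nn.functional.mse_loss(model.reward_head(h).squeeze(-1), reward)
    return info_loss + reward_loss

Because the encoder never consumes the additional information, the learned statistic of the history remains computable at execution time, where only observations and actions are available.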


Citation Information:

Gaspard Lambrechts, Adrien Bolland, and Damien Ernst. "Informed POMDP: Leveraging Additional Information in Model-Based RL." Reinforcement Learning Journal, vol. 2, 2024, pp. 763–784.

BibTeX:

@article{lambrechts2024informed,
    title={Informed {POMDP}: {L}everaging Additional Information in Model-Based {RL}},
    author={Lambrechts, Gaspard and Bolland, Adrien and Ernst, Damien},
    journal={Reinforcement Learning Journal},
    volume={2},
    pages={763--784},
    year={2024}
}