Reinforcement Learning Journal, vol. TBD, 2025, pp. TBD.
Presented at the Reinforcement Learning Conference (RLC), Edmonton, Alberta, Canada, August 5–9, 2025.
We seek to design reinforcement learning agents that build plannable models of the world that are abstract in both state and time. We propose a new algorithm for constructing a skill graph; nodes in the skill graph represent abstract states and edges represent skill policies. Previous works that learn a skill graph use random sampling from the state space and nearest-neighbor search: operations that are infeasible in environments with high-dimensional observations (for example, images). Furthermore, previous algorithms attempt to increase the success probability of all edges (by repeatedly executing the corresponding skills) so that the resulting graph is robust and reliable everywhere. However, exhaustive coverage is infeasible in large environments, and agents should prioritize practicing skills that are more likely to result in higher reward. We show that our agent can solve challenging image-based exploration problems more rapidly than vanilla model-free RL and state-of-the-art novelty-based exploration; we then show that the resulting abstract model solves a family of tasks not provided during the agent's exploration phase.
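To make the skill-graph idea from the abstract concrete, below is a minimal, hypothetical Python sketch, not the paper's implementation: nodes are abstract states, each edge carries a skill policy with an empirically estimated success probability, and a simple practice rule prioritizes unreliable edges leading toward high-value nodes instead of rehearsing every edge uniformly. All names (SkillGraph, SkillEdge, next_skill_to_practice) and the value-weighted scoring rule are illustrative assumptions.

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SkillEdge:
    policy_id: str      # identifier of the skill policy for this edge
    successes: int = 0  # executions that reached the target abstract state
    attempts: int = 0   # total executions of this skill

    @property
    def success_prob(self) -> float:
        # Empirical estimate of the edge's reliability; 0 until first tried.
        return self.successes / self.attempts if self.attempts else 0.0


class SkillGraph:
    def __init__(self):
        # Adjacency map: source abstract state -> {target state -> SkillEdge}.
        self.edges: dict[str, dict[str, SkillEdge]] = defaultdict(dict)

    def add_skill(self, src: str, dst: str, policy_id: str) -> None:
        self.edges[src][dst] = SkillEdge(policy_id)

    def record_execution(self, src: str, dst: str, success: bool) -> None:
        edge = self.edges[src][dst]
        edge.attempts += 1
        edge.successes += int(success)

    def next_skill_to_practice(self, value: dict[str, float]) -> tuple[str, str]:
        # Rather than practicing all edges uniformly, pick the least
        # reliable edge leading toward the highest-value node (a stand-in
        # for "skills more likely to result in higher reward").
        def score(item):
            _, dst, edge = item
            return value.get(dst, 0.0) * (1.0 - edge.success_prob)

        candidates = [(s, d, e) for s, nbrs in self.edges.items()
                      for d, e in nbrs.items()]
        src, dst, _ = max(candidates, key=score)
        return src, dst


# Illustrative usage: add two skills, log one execution, and ask which
# skill to practice next given hypothetical node values.
g = SkillGraph()
g.add_skill("start", "door", policy_id="reach_door")
g.add_skill("door", "key", policy_id="grab_key")
g.record_execution("start", "door", success=True)
print(g.next_skill_to_practice(value={"door": 0.5, "key": 1.0}))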
Akhil Bagaria, Anita De Mello Koch, Rafael Rodriguez-Sanchez, Sam Lobel, and George Konidaris. "Intrinsically Motivated Discovery of Temporally Abstract Graph-based Models of the World." Reinforcement Learning Journal, vol. TBD, 2025, pp. TBD.
BibTeX:
@article{bagaria2025intrinsically,
title={Intrinsically Motivated Discovery of Temporally Abstract Graph-based Models of the World},
author={Bagaria, Akhil and De Mello Koch, Anita and Rodriguez-Sanchez, Rafael and Lobel, Sam and Konidaris, George},
journal={Reinforcement Learning Journal},
year={2025}
}