Co-Learning Empirical Games & World Models

By Max Olan Smith and Michael P. Wellman

Reinforcement Learning Journal, vol. 1, 2024, pp. 1–15.

Presented at the Reinforcement Learning Conference (RLC), Amherst, Massachusetts, August 9–12, 2024.


Abstract:

Game-based decision-making involves reasoning over both world dynamics and strategic interactions among the agents. Typically, models capturing these respective aspects are learned and used separately. We investigate the potential gain from co-learning these elements: a world model for dynamics and an empirical game for strategic interactions. Empirical games drive world models toward a broader consideration of possible game dynamics induced by a diversity of strategy profiles. Conversely, world models guide empirical games to efficiently discover new strategies through planning. We demonstrate these benefits first independently, then in combination as a new algorithm, Dyna-PSRO, that co-learns an empirical game and a world model. When compared to PSRO, a baseline empirical-game-building algorithm, Dyna-PSRO is found to compute lower-regret solutions on partially observable general-sum games. In our experiments, Dyna-PSRO also requires substantially fewer experiences than PSRO, a key algorithmic advantage for settings where collecting player-game interaction data is a cost-limiting factor.
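
As a rough illustration of the co-learning loop the abstract describes, below is a minimal, self-contained Python sketch. It is not the authors' implementation: the paper trains reinforcement-learning best responses in partially observable games, while this toy substitutes a noisy matrix game, a running-mean payoff estimator as the "world model," and a max-min rule as the empirical-game solver. All names (ToyGame, WorldModel, dyna_psro_sketch) are hypothetical.

```python
"""Hedged sketch of a Dyna-PSRO-style co-learning loop on a toy game.

NOT the paper's implementation; an illustration of the idea only.
"""
import itertools
import random

NUM_ACTIONS = 5

class ToyGame:
    """Two-player toy matrix game with noisy payoff observations."""
    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.payoff = [[rng.uniform(-1, 1) for _ in range(NUM_ACTIONS)]
                       for _ in range(NUM_ACTIONS)]

    def play(self, a0, a1, rng):
        # One noisy real-game interaction (the costly resource).
        u = self.payoff[a0][a1] + rng.gauss(0, 0.1)
        return u, -u

class WorldModel:
    """Running mean of observed payoffs, standing in for a learned model."""
    def __init__(self):
        self.sums, self.counts = {}, {}

    def update(self, a0, a1, u):
        self.sums[(a0, a1)] = self.sums.get((a0, a1), 0.0) + u
        self.counts[(a0, a1)] = self.counts.get((a0, a1), 0) + 1

    def predict(self, a0, a1):
        n = self.counts.get((a0, a1), 0)
        return self.sums[(a0, a1)] / n if n else 0.0

def dyna_psro_sketch(iterations=4, rollouts=8, seed=1):
    rng = random.Random(seed)
    game, model = ToyGame(), WorldModel()
    strategies = [0]      # shared strategy set (symmetric toy setting)
    empirical = {}        # profile -> estimated payoff to player 0

    for _ in range(iterations):
        # 1. Fill missing empirical-game cells with real-game rollouts;
        #    the diversity of profiles broadens the world model's data.
        for prof in itertools.product(strategies, repeat=2):
            if prof not in empirical:
                us = []
                for _ in range(rollouts):
                    u0, _ = game.play(*prof, rng)
                    us.append(u0)
                    model.update(*prof, u0)
                empirical[prof] = sum(us) / len(us)

        # 2. "Solve" the empirical game (here: player 0's max-min strategy).
        target = max(strategies,
                     key=lambda s: min(empirical[(s, t)] for t in strategies))

        # 3. Best-response search mixing a little real experience with
        #    world-model planning estimates (the Dyna-style step).
        def value(a):
            u0, _ = game.play(a, target, rng)
            model.update(a, target, u0)   # real step also trains the model
            return 0.5 * u0 + 0.5 * model.predict(a, target)
        br = max(range(NUM_ACTIONS), key=value)
        if br not in strategies:
            strategies.append(br)

    return strategies, empirical

if __name__ == "__main__":
    strategies, empirical = dyna_psro_sketch()
    print("strategy set:", strategies)
```

The two benefits claimed in the abstract map onto steps 1 and 3 of this loop: empirical-game profiles decide which real-game data the world model trains on, and the world model's predictions replace most real-game rollouts during best-response search.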


Citation Information:

Max Olan Smith and Michael P. Wellman. "Co-Learning Empirical Games & World Models." Reinforcement Learning Journal, vol. 1, 2024, pp. 1–15.

BibTeX:

@article{smith2024learning,
    title={Co-Learning Empirical Games \& World Models},
    author={Smith, Max Olan and Wellman, Michael P.},
    journal={Reinforcement Learning Journal},
    volume={1},
    pages={1--15},
    year={2024}
}