Shield Decomposition for Safe Reinforcement Learning in General Partially Observable Multi-Agent Environments

By Daniel Melcer, Christopher Amato, and Stavros Tripakis

Reinforcement Learning Journal, vol. 4, 2024, pp. 1965–1994.

Presented at the Reinforcement Learning Conference (RLC), Amherst, Massachusetts, August 9–12, 2024.

Abstract:

As Reinforcement Learning is increasingly used in safety-critical systems, it is important to restrict RL agents to only take safe actions. Shielding is a promising approach to this task; however, in multi-agent domains, shielding has previously been restricted to environments where all agents observe the same information. Most real-world tasks do not satisfy this strong assumption. We discuss the theoretical foundations of multi-agent shielding in environments with general partial observability and develop a novel shielding method which is effective in such domains. Through a series of experiments, we show that agents that use our shielding method are able to safely and successfully solve a variety of RL tasks, including tasks in which prior methods cannot be applied.
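
To give a feel for the shielding concept the abstract refers to, here is a minimal, generic sketch: a shield sits between a learned policy and the environment and overrides any proposed action that is not considered safe. This is only an illustration of the general idea, not the paper's decomposed, partially observable multi-agent shields; the class and variable names (SimpleShield, safe_actions) are hypothetical.

```python
import random


class SimpleShield:
    """Generic action shield: replaces unsafe proposed actions with safe ones.

    This is a conceptual sketch only; it assumes a lookup table mapping each
    observation to its set of allowed actions, which real shields compute from
    a safety specification.
    """

    def __init__(self, safe_actions):
        # safe_actions: dict mapping an observation to a set of allowed actions.
        self.safe_actions = safe_actions

    def filter(self, observation, proposed_action):
        allowed = self.safe_actions.get(observation, set())
        if proposed_action in allowed:
            return proposed_action
        # Fall back to an arbitrary safe action when the policy's choice is unsafe.
        return random.choice(sorted(allowed)) if allowed else proposed_action


# Example usage with hypothetical observations and actions:
shield = SimpleShield({"near_cliff": {"stay", "back_away"}})
action = shield.filter("near_cliff", proposed_action="step_forward")
print(action)  # one of the safe actions, e.g. "back_away"
```

In the multi-agent, partially observable setting studied in the paper, no single agent sees the full state, so a centralized table like the one above is not directly available; the paper's contribution is a method for decomposing the shield so that each agent can enforce safety from its own observations.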


Citation Information:

Daniel Melcer, Christopher Amato, and Stavros Tripakis. "Shield Decomposition for Safe Reinforcement Learning in General Partially Observable Multi-Agent Environments." Reinforcement Learning Journal, vol. 4, 2024, pp. 1965–1994.

BibTeX:

@article{melcer2024shield,
    title={Shield Decomposition for Safe Reinforcement Learning in General Partially Observable Multi-Agent Environments},
    author={Melcer, Daniel and Amato, Christopher and Tripakis, Stavros},
    journal={Reinforcement Learning Journal},
    volume={4},
    pages={1965--1994},
    year={2024}
}