Optimizing Rewards while meeting $\omega$-regular Constraints

By Christopher Zeitler, Kristina Miller, Sayan Mitra, John Schierman, and Mahesh Viswanathan

Reinforcement Learning Journal, vol. 5, 2024, pp. 2492–2514.

Presented at the Reinforcement Learning Conference (RLC), Amherst, Massachusetts, August 9–12, 2024.



Abstract:

This paper addresses the problem of synthesizing policies for Markov Decision Processes (MDPs) with hard $\omega$-regular constraints, which include and are more general than safety, reachability, liveness, and fairness. The objective is to derive a policy that not only makes the MDP adhere to the given $\omega$-regular constraint $T$ with certainty but also maximizes the expected reward. We first show that there are no optimal policies for the general constrained MDP (CMDP) problem with $\omega$-regular constraints, in contrast with the simpler problem of CMDPs with safety requirements. Next, we show that, despite this, the optimal reward can be approximated to within any desired level of accuracy in polynomial time. This approximation ensures both the fulfillment of the $\omega$-regular constraint with probability $1$ and the attainment of an $\epsilon$-optimal reward for any given $\epsilon>0$. The proof identifies specific classes of policies capable of achieving these objectives and may be of independent interest. Furthermore, we introduce an approach to tackling the CMDP problem by transforming it into a classical MDP reward optimization problem, thereby enabling the application of conventional reinforcement learning. We show that proximal policy optimization is an effective approach to identifying near-optimal policies that satisfy $\omega$-regular constraints. This result is demonstrated across multiple environments and constraint types.
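
To picture the reduction sketched above, here is a minimal toy example, not the authors' construction: a hand-made four-state MDP is composed with a hand-written deterministic monitor for the constraint "always avoid the trap state and eventually reach the goal state", and a bonus reward is added on the product so that a standard reward-maximizing solver also tracks constraint satisfaction. Value iteration stands in for the PPO training used in the paper purely to keep the sketch self-contained; every state, constant, and function name below is an illustrative assumption.

import numpy as np

# A tiny 4-state MDP with 2 actions; state 2 is the goal, state 3 is an absorbing trap.
# P[a][s, s'] is a transition probability, R[s, a] an immediate reward (all numbers illustrative).
n_s, n_a = 4, 2
P = np.zeros((n_a, n_s, n_s))
P[0] = [[0.8, 0.2, 0.0, 0.0],
        [0.0, 0.1, 0.8, 0.1],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0]]
P[1] = [[0.1, 0.7, 0.0, 0.2],
        [0.0, 0.9, 0.1, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0]]
R = np.array([[0.0, 1.0],
              [0.5, 0.0],
              [1.0, 1.0],
              [0.0, 0.0]])

# Deterministic monitor for "always avoid state 3 and eventually reach state 2".
# Monitor states: 0 = goal not yet seen, 1 = goal seen (accepting), 2 = violated (trap visited).
def monitor_step(q, s):
    if q == 2 or s == 3:
        return 2
    if q == 1 or s == 2:
        return 1
    return 0

# Product MDP: states are (MDP state, monitor state) pairs; the accepting monitor
# state earns an extra bonus, folding the constraint into an ordinary reward objective.
BONUS = 10.0  # illustrative weight, not a value from the paper
def build_product():
    N = n_s * 3
    Pp = np.zeros((n_a, N, N))
    Rp = np.zeros((N, n_a))
    for s in range(n_s):
        for q in range(3):
            i = s * 3 + q
            for a in range(n_a):
                Rp[i, a] = R[s, a] + (BONUS if q == 1 else 0.0)
                for s2 in range(n_s):
                    q2 = monitor_step(q, s2)
                    Pp[a, i, s2 * 3 + q2] += P[a, s, s2]
    return Pp, Rp

# Value iteration on the product; in the paper's pipeline a learner such as PPO
# would optimize this same product-MDP reward instead.
def value_iteration(Pp, Rp, gamma=0.95, iters=500):
    V = np.zeros(Pp.shape[1])
    for _ in range(iters):
        Q = Rp.T + gamma * (Pp @ V)   # Q[a, i] over product states
        V = Q.max(axis=0)
    return Q.argmax(axis=0)            # greedy policy over (state, monitor) pairs

Pp, Rp = build_product()
policy = value_iteration(Pp, Rp)
print("policy over (state, monitor) pairs:", policy)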


Citation Information:

Christopher Zeitler, Kristina Miller, Sayan Mitra, John Schierman, and Mahesh Viswanathan. "Optimizing Rewards while meeting $\omega$-regular Constraints." Reinforcement Learning Journal, vol. 5, 2024, pp. 2492–2514.

BibTeX:

@article{zeitler2024optimizing,
    title={Optimizing Rewards while meeting $\omega$-regular Constraints},
    author={Zeitler, Christopher and Miller, Kristina and Mitra, Sayan and Schierman, John and Viswanathan, Mahesh},
    journal={Reinforcement Learning Journal},
    volume={5},
    pages={2492--2514},
    year={2024}
}