On Welfare-Centric Fair Reinforcement Learning

By Cyrus Cousins, Kavosh Asadi, Elita Lobo, and Michael Littman

Reinforcement Learning Journal, vol. 3, 2024, pp. 1124–1137.

Presented at the Reinforcement Learning Conference (RLC), Amherst, Massachusetts, August 9–12, 2024.



Abstract:

We propose a welfare-centric fair reinforcement-learning setting, in which an agent enjoys vector-valued reward from a set of beneficiaries. Given a welfare function W(·), the task is to select a policy π̂ that approximately optimizes the welfare of their value functions from start state s₀, i.e., π̂ ≈ argmax_π W(V^π_1(s₀), V^π_2(s₀), …, V^π_g(s₀)). We find that welfare-optimal policies are stochastic and start-state dependent. Whether individual actions are mistakes depends on the policy, so mistake bounds, regret analysis, and PAC-MDP learning do not readily generalize to our setting. We develop the adversarial-fair KWIK (Kwik-Af) learning model, wherein at each timestep, an agent either takes an exploration action or outputs an exploitation policy, such that the number of exploration actions is bounded and each exploitation policy is ε-welfare optimal. Finally, we reduce PAC-MDP to Kwik-Af, introduce the Equitable Explicit Explore Exploit (E4) learner, and show that it Kwik-Af learns.
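To make the objective concrete, below is a minimal sketch of the welfare-centric selection criterion described in the abstract: each candidate policy is evaluated to obtain a vector of per-beneficiary values at the start state, and the policy maximizing W of that vector is chosen. The tabular policy evaluation, the egalitarian (min) choice of W, and the function names (vector_value, welfare, best_of) are illustrative assumptions for this page, not the paper's Kwik-Af model or E4 learner.

```python
import numpy as np

# Assumed setup: a tabular MDP with transitions P (S, A, S') and per-beneficiary
# rewards R (g, S, A); policies are stochastic, given as (S, A) row-stochastic arrays.

def vector_value(P, R, pi, gamma=0.9):
    """Vector-valued policy evaluation: returns V of shape (g, S), one value
    function per beneficiary, by solving the Bellman equations under pi."""
    S = P.shape[0]
    g = R.shape[0]
    P_pi = np.einsum('sa,sat->st', pi, P)   # state-to-state transitions under pi
    R_pi = np.einsum('sa,gsa->gs', pi, R)   # expected per-beneficiary reward under pi
    I = np.eye(S)
    # Solve (I - gamma * P_pi) V_i = R_pi[i] separately for each beneficiary i.
    return np.stack([np.linalg.solve(I - gamma * P_pi, R_pi[i]) for i in range(g)])

def welfare(v):
    """One concrete choice of W: egalitarian (minimum) welfare of the value vector.
    The paper treats general welfare functions W(·)."""
    return v.min()

def best_of(policies, P, R, s0=0, gamma=0.9):
    """Select pi_hat ≈ argmax_pi W(V^pi_1(s0), ..., V^pi_g(s0)) over a finite
    candidate set of (possibly stochastic) policies."""
    return max(policies, key=lambda pi: welfare(vector_value(P, R, pi, gamma)[:, s0]))
```

As the abstract notes, the welfare-optimal policy may be stochastic and depend on s₀, so the candidate set in this sketch would need to include stochastic policies rather than only deterministic ones.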


Citation Information:

Cyrus Cousins, Kavosh Asadi, Elita Lobo, and Michael Littman. "On Welfare-Centric Fair Reinforcement Learning." Reinforcement Learning Journal, vol. 3, 2024, pp. 1124–1137.

BibTeX:

@article{cousins2024welfare,
    title={On Welfare-Centric Fair Reinforcement Learning},
    author={Cousins, Cyrus and Asadi, Kavosh and Lobo, Elita and Littman, Michael},
    journal={Reinforcement Learning Journal},
    volume={3},
    pages={1124--1137},
    year={2024}
}