Reinforcement Learning Journal, vol. TBD, 2025, pp. TBD.
Presented at the Reinforcement Learning Conference (RLC), Edmonton, Alberta, Canada, August 5–9, 2025.
We introduce the constrained best mixed arm identification (CBMAI) problem under unknown reward and costs wherein there are K arms, each of which is associated with a reward and multiple cost attributes. These are random, and come from distributions with unknown means. The best mixed arm is a probability distribution over a subset of the K arms that maximizes the expected reward while satisfying the expected cost constraints. We are specifically interested in a pure exploration problem under a fixed sampling budget with the goal of identifying the support of the best mixed arm. We propose a novel, parameter-free algorithm, called the Score Function-based Successive Reject (SFSR) algorithm, that combines the classical successive reject framework with a novel rejection criteria using a score function based on linear programming theory. We establish a performance guarantee for our algorithm by providing a theoretical upper bound on the probability of mis-identification of the support of the best mixed arm and show that it decays exponentially in the budget N and some constants that characterize the hardness of the problem instance. We also develop an information-theoretic lower bound on the error probability that shows that these constants appropriately characterize the problem difficulty. We validate this empirically on a number of problem instances.
Dengwang Tang, Rahul Jain, Ashutosh Nayyar, and Pierluigi Nuzzo. "Pure Exploration for Constrained Best Mixed Arm Identification with a Fixed Budget." Reinforcement Learning Journal, vol. TBD, 2025, pp. TBD.
BibTeX:@article{tang2025pure,
title={Pure Exploration for Constrained Best Mixed Arm Identification with a Fixed Budget},
author={Tang, Dengwang and Jain, Rahul and Nayyar, Ashutosh and Nuzzo, Pierluigi},
journal={Reinforcement Learning Journal},
year={2025}
}