Cost Aware Best Arm Identification

By Kellen Kanarios, Qining Zhang, and Lei Ying

Reinforcement Learning Journal, vol. 4, 2024, pp. 1533–1545.

Presented at the Reinforcement Learning Conference (RLC), Amherst Massachusetts, August 9–12, 2024.


Download:

Abstract:

In this paper, we study a best arm identification problem with dual objects. In addition to the classic reward, each arm is associated with a cost distribution and the goal is to identify the largest reward arm using the minimum expected cost. We call it Cost Aware Best Arm Identification (CABAI), which captures the separation of testing and implementation phases in product development pipelines and models the objective shift between phases, i.e., cost for testing and reward for implementation. We first derive an theoretic lower bound for CABAI and propose an algorithm called $\mathsf{CTAS}$ to match it asymptotically. To reduce the computation of $\mathsf{CTAS}$, we further propose a low-complexity algorithm called CO, based on a square-root rule, which proves optimal in simplified two-armed models and generalizes surprisingly well in numerical experiments. Our results show (i) ignoring the heterogeneous action cost results in sub-optimality in practice, and (ii) low-complexity algorithms deliver near-optimal performance over a wide range of problems.


Citation Information:

Kellen Kanarios, Qining Zhang, and Lei Ying. "Cost Aware Best Arm Identification." Reinforcement Learning Journal, vol. 4, 2024, pp. 1533–1545.

BibTeX:

@article{kanarios2024cost,
    title={Cost Aware Best Arm Identification},
    author={Kanarios, Kellen and Zhang, Qining and Ying, Lei},
    journal={Reinforcement Learning Journal},
    volume={4},
    pages={1533--1545},
    year={2024}
}