JoinGym: An Efficient Join Order Selection Environment

By Junxiong Wang, Kaiwen Wang, Yueying Li, Nathan Kallus, Immanuel Trummer, and Wen Sun

Reinforcement Learning Journal, vol. 1, 2024, pp. 64–91.

Presented at the Reinforcement Learning Conference (RLC), Amherst Massachusetts, August 9–12, 2024.


Download:

Abstract:

Join order selection (JOS), the ordering of join operations to minimize query execution cost, is a core NP-hard combinatorial optimization problem in database query optimization. We present \textsc{JoinGym}, a lightweight and easy-to-use reinforcement learning (RL) environment that captures both left-deep and bushy variants of the JOS problem. Compared to prior works that execute queries online, \textsc{JoinGym} has much higher throughput and efficiently simulates the cost of joins offline by looking up the intermediate table's cardinality from a pre-computed dataset. We provide such a cardinality dataset for $3300$ queries based on real IMDb workloads, which is the largest suite its kind and may be of independent interest. We extensively benchmark several RL algorithms and find that the best policies are competitive with or better than Postgres, a strong non-learning baseline. However, the learned policies can still catastrophically fail on a small fraction of queries which motivates future research using \textsc{JoinGym} to improve generalization and safety in long-tailed, partially observed, combinatorial optimization problems.


Citation Information:

Junxiong Wang, Kaiwen Wang, Yueying Li, Nathan Kallus, Immanuel Trummer, and Wen Sun. "JoinGym: An Efficient Join Order Selection Environment." Reinforcement Learning Journal, vol. 1, 2024, pp. 64–91.

BibTeX:

@article{wang2024joingym,
    title={{JoinGym}: {A}n Efficient Join Order Selection Environment},
    author={Wang, Junxiong and Wang, Kaiwen and Li, Yueying and Kallus, Nathan and Trummer, Immanuel and Sun, Wen},
    journal={Reinforcement Learning Journal},
    volume={1},
    pages={64--91},
    year={2024}
}