RLJ 2024: Volumes 1–5
You can download this entire issue as one large (320 MB) PDF here: link. DOI: https://doi.org/10.5281/zenodo.13899776. Below are links to the individual papers.
- Co-Learning Empirical Games & World Models, by Max Olan Smith and Michael P. Wellman.
- Improving Thompson Sampling via Information Relaxation for Budgeted Multi-armed Bandits, by Woojin Jeong and Seungki Min.
- Graph Neural Thompson Sampling, by Shuang Wu and Arash A. Amini.
- JoinGym: An Efficient Join Order Selection Environment, by Junxiong Wang, Kaiwen Wang, Yueying Li, Nathan Kallus, Immanuel Trummer, and Wen Sun.
- An Open-Loop Baseline for Reinforcement Learning Locomotion Tasks, by Antonin Raffin, Olivier Sigaud, Jens Kober, Alin Albu-Schaeffer, João Silvério, and Freek Stulp.
- Online Planning in POMDPs with State-Requests, by Raphaël Avalos, Eugenio Bargiacchi, Ann Nowe, Diederik Roijers, and Frans A Oliehoek.
- A Recipe for Unbounded Data Augmentation in Visual Reinforcement Learning, by Abdulaziz Almuzairee, Nicklas Hansen, and Henrik I Christensen.
- BetaZero: Belief-State Planning for Long-Horizon POMDPs using Learned Approximations, by Robert J. Moss, Anthony Corso, Jef Caers, and Mykel Kochenderfer.
- Non-adaptive Online Finetuning for Offline Reinforcement Learning, by Audrey Huang, Mohammad Ghavamzadeh, Nan Jiang, and Marek Petrik.
- Guided Data Augmentation for Offline Reinforcement Learning and Imitation Learning, by Nicholas E. Corrado, Yuxiao Qu, John U. Balis, Adam Labiosa, and Josiah P. Hanna.
- Towards Principled, Practical Policy Gradient for Bandits and Tabular MDPs, by Michael Lu, Matin Aghaei, Anant Raj, and Sharan Vaswani.
- Unifying Model-Based and Model-Free Reinforcement Learning with Equivalent Policy Sets, by Benjamin Freed, Thomas Wei, Roberto Calandra, Jeff Schneider, and Howie Choset.
- The Role of Inherent Bellman Error in Offline Reinforcement Learning with Linear Function Approximation, by Noah Golowich and Ankur Moitra.
- Learning Action-based Representations Using Invariance, by Max Rudolph, Caleb Chuck, Kevin Black, Misha Lvovsky, Scott Niekum, and Amy Zhang.
- Cyclicity-Regularized Coordination Graphs, by Oliver Järnefelt, Mahdi Kallel, and Carlo D'Eramo.
- Assigning Credit with Partial Reward Decoupling in Multi-Agent Proximal Policy Optimization, by Aditya Kapoor, Benjamin Freed, Jeff Schneider, and Howie Choset.
- OCAtari: Object-Centric Atari 2600 Reinforcement Learning Environments, by Quentin Delfosse, Jannis Blüml, Bjarne Gregori, Sebastian Sztwiertnia, and Kristian Kersting.
- SplAgger: Split Aggregation for Meta-Reinforcement Learning, by Jacob Beck, Matthew Thomas Jackson, Risto Vuorio, Zheng Xiong, and Shimon Whiteson.
- A Tighter Convergence Proof of Reverse Experience Replay, by Nan Jiang, Jinzhao Li, and Yexiang Xue.
- Learning to Optimize for Reinforcement Learning, by Qingfeng Lan, A. Rupam Mahmood, Shuicheng Yan, and Zhongwen Xu.
- Multi-view Disentanglement for Reinforcement Learning with Multiple Cameras, by Mhairi Dunion and Stefano V Albrecht.
- Planning to Go Out-of-Distribution in Offline-to-Online Reinforcement Learning, by Trevor McInroe, Adam Jelley, Stefano V Albrecht, and Amos Storkey.
- Surprise-Adaptive Intrinsic Motivation for Unsupervised Reinforcement Learning, by Adriana Hugessen, Roger Creus Castanyer, Faisal Mohamed, and Glen Berseth.
- Mitigating the Curse of Horizon in Monte-Carlo Returns, by Alex Ayoub, David Szepesvari, Francesco Zanini, Bryan Chan, Dhawal Gupta, Bruno Castro da Silva, and Dale Schuurmans.
- A Simple Mixture Policy Parameterization for Improving Sample Efficiency of CVaR Optimization, by Yudong Luo, Yangchen Pan, Han Wang, Philip Torr, and Pascal Poupart.
- ROIL: Robust Offline Imitation Learning without Trajectories, by Gersi Doko, Guang Yang, Daniel S. Brown, and Marek Petrik.
- Harnessing Discrete Representations for Continual Reinforcement Learning, by Edan Jacob Meyer, Adam White, and Marlos C. Machado.
- Three Dogmas of Reinforcement Learning, by David Abel, Mark K Ho, and Anna Harutyunyan.
- Policy Gradient with Active Importance Sampling, by Matteo Papini, Giorgio Manganini, Alberto Maria Metelli, and Marcello Restelli.
- The Limits of Pure Exploration in POMDPs: When the Observation Entropy is Enough, by Riccardo Zamboni, Duilio Cirino, Marcello Restelli, and Mirco Mutti.
- Physics-Informed Model and Hybrid Planning for Efficient Dyna-Style Reinforcement Learning, by Zakariae El Asri, Olivier Sigaud, and Nicolas Thome.
- Trust-based Consensus in Multi-Agent Reinforcement Learning Systems, by Ho Long Fung, Victor-Alexandru Darvariu, Stephen Hailes, and Mirco Musolesi.
- Bidirectional-Reachable Hierarchical Reinforcement Learning with Mutually Responsive Policies, by Yu Luo, Fuchun Sun, Tianying Ji, and Xianyuan Zhan.
- Informed POMDP: Leveraging Additional Information in Model-Based RL, by Gaspard Lambrechts, Adrien Bolland, and Damien Ernst.
- An Optimal Tightness Bound for the Simulation Lemma, by Sam Lobel and Ronald Parr.
- Best Response Shaping, by Milad Aghajohari, Tim Cooijmans, Juan Agustin Duque, Shunichi Akatsuka, and Aaron Courville.
- A Provably Efficient Option-Based Algorithm for both High-Level and Low-Level Learning, by Gianluca Drappo, Alberto Maria Metelli, and Marcello Restelli.
- SwiftTD: A Fast and Robust Algorithm for Temporal Difference Learning, by Khurram Javed, Arsalan Sharifnassab, and Richard S. Sutton.
- The Cliff of Overcommitment with Policy Gradient Step Sizes, by Scott M. Jordan, Samuel Neumann, James E. Kostas, Adam White, and Philip S. Thomas.
- Multistep Inverse Is Not All You Need, by Alexander Levine, Peter Stone, and Amy Zhang.
- Contextualized Hybrid Ensemble Q-learning: Learning Fast with Control Priors, by Emma Cramer, Bernd Frauenknecht, Ramil Sabirov, and Sebastian Trimpe.
- Sequential Decision-Making for Inline Text Autocomplete, by Rohan Chitnis, Shentao Yang, and Alborz Geramifard.
- Exploring Uncertainty in Distributional Reinforcement Learning, by Georgy Antonov and Peter Dayan.
- Robotic Manipulation Datasets for Offline Compositional Reinforcement Learning, by Marcel Hussing, Jorge Mendez-Mendez, Anisha Singrodia, Cassandra Kent, and Eric Eaton.
- Dissecting Deep RL with High Update Ratios: Combatting Value Divergence, by Marcel Hussing, Claas A Voelcker, Igor Gilitschenski, Amir-massoud Farahmand, and Eric Eaton.
- Demystifying the Recency Heuristic in Temporal-Difference Learning, by Brett Daley, Marlos C. Machado, and Martha White.
- On the consistency of hyper-parameter selection in value-based deep reinforcement learning, by Johan Samir Obando Ceron, João Guilherme Madeira Araújo, Aaron Courville, and Pablo Samuel Castro.
- Value Internalization: Learning and Generalizing from Social Reward, by Frieda Rong and Max Kleiman-Weiner.
- Mixture of Experts in a Mixture of RL settings, by Timon Willi, Johan Samir Obando Ceron, Jakob Nicolaus Foerster, Gintare Karolina Dziugaite, and Pablo Samuel Castro.
- Aquatic Navigation: A Challenging Benchmark for Deep Reinforcement Learning, by Davide Corsi, Davide Camponogara, and Alessandro Farinelli.
- On Welfare-Centric Fair Reinforcement Learning, by Cyrus Cousins, Kavosh Asadi, Elita Lobo, and Michael Littman.
- Inverse Reinforcement Learning with Multiple Planning Horizons, by Jiayu Yao, Weiwei Pan, Finale Doshi-Velez, and Barbara E Engelhardt.
- Constant Stepsize Q-learning: Distributional Convergence, Bias and Extrapolation, by Yixuan Zhang, and Qiaomin Xie.
- More Efficient Randomized Exploration for Reinforcement Learning via Approximate Sampling, by Haque Ishfaq, Yixin Tan, Yu Yang, Qingfeng Lan, Jianfeng Lu, A. Rupam Mahmood, Doina Precup, and Pan Xu.
- Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis, by Qining Zhang, Honghao Wei, and Lei Ying.
- A Natural Extension To Online Algorithms For Hybrid RL With Limited Coverage, by Kevin Tan and Ziping Xu.
- Tiered Reward: Designing Rewards for Specification and Fast Learning of Desired Behavior, by Zhiyuan Zhou, Shreyas Sundara Raman, Henry Sowerby, and Michael Littman.
- Enabling Intelligent Interactions between an Agent and an LLM: A Reinforcement Learning Approach, by Bin Hu, Chenyang Zhao, Pu Zhang, Zihao Zhou, Yuanhang Yang, Zenglin Xu, and Bin Liu.
- An Idiosyncrasy of Time-discretization in Reinforcement Learning, by Kris De Asis and Richard S. Sutton.
- Dreaming of Many Worlds: Learning Contextual World Models aids Zero-Shot Generalization, by Sai Prasanna, Karim Farid, Raghu Rajan, and André Biedenkapp.
- Policy Gradient Algorithms with Monte Carlo Tree Learning for Non-Markov Decision Processes, by Tetsuro Morimura, Kazuhiro Ota, Kenshi Abe, and Peinan Zhang.
- Offline Diversity Maximization under Imitation Constraints, by Marin Vlastelica, Jin Cheng, Georg Martius, and Pavel Kolev.
- Zero-shot cross-modal transfer of Reinforcement Learning policies through a Global Workspace, by Léopold Maytié, Benjamin Devillers, Alexandre Arnold, and Rufin VanRullen.
- Stabilizing Extreme Q-learning by Maclaurin Expansion, by Motoki Omura, Takayuki Osa, Yusuke Mukuta, and Tatsuya Harada.
- Combining Automated Optimisation of Hyperparameters and Reward Shape, by Julian Dierkes, Emma Cramer, Holger Hoos, and Sebastian Trimpe.
- Sample Complexity of Offline Distributionally Robust Linear Markov Decision Processes, by He Wang, Laixi Shi, and Yuejie Chi.
- PASTA: Pretrained Action-State Transformer Agents, by Raphael Boige, Yannis Flet-Berliac, Lars C.P.M. Quaedvlieg, Arthur Flajolet, Guillaume Richard, and Thomas Pierrot.
- Cost Aware Best Arm Identification, by Kellen Kanarios, Qining Zhang, and Lei Ying.
- ICU-Sepsis: A Benchmark MDP Built from Real Medical Data, by Kartik Choudhary, Dhawal Gupta, and Philip S. Thomas.
- When does Self-Prediction help? Understanding Auxiliary Tasks in Reinforcement Learning, by Claas A Voelcker, Tyler Kastner, Igor Gilitschenski, and Amir-massoud Farahmand.
- ROER: Regularized Optimal Experience Replay, by Changling Li, Zhang-Wei Hong, Pulkit Agrawal, Divyansh Garg, and Joni Pajarinen.
- Combining Reconstruction and Contrastive Methods for Multimodal Representations in RL, by Philipp Becker, Sebastian Mossburger, Fabian Otto, and Gerhard Neumann.
- RL for Consistency Models: Reward Guided Text-to-Image Generation with Fast Inference, by Owen Oertell, Jonathan Daniel Chang, Yiyi Zhang, Kianté Brantley, and Wen Sun.
- A Super-human Vision-based Reinforcement Learning Agent for Autonomous Racing in Gran Turismo, by Miguel Vasco, Takuma Seno, Kenta Kawamoto, Kaushik Subramanian, Peter R. Wurman, and Peter Stone.
- Bad Habits: Policy Confounding and Out-of-Trajectory Generalization in RL, by Miguel Suau, Matthijs T. J. Spaan, and Frans A Oliehoek.
- Learning Abstract World Models for Value-preserving Planning with Options, by Rafael Rodriguez-Sanchez and George Konidaris.
- Verification-Guided Shielding for Deep Reinforcement Learning, by Davide Corsi, Guy Amir, Andoni Rodríguez, Guy Katz, César Sánchez, and Roy Fox.
- Learning Discrete World Models for Heuristic Search, by Forest Agostinelli and Misagh Soltani.
- Distributionally Robust Constrained Reinforcement Learning under Strong Duality, by Zhengfei Zhang, Kishan Panaganti, Laixi Shi, Yanan Sui, Adam Wierman, and Yisong Yue.
- Representation Alignment from Human Feedback for Cross-Embodiment Reward Learning from Mixed-Quality Demonstrations, by Connor Mattson, Anurag Sidharth Aribandi, and Daniel S. Brown.
- Revisiting Sparse Rewards for Goal-Reaching Reinforcement Learning, by Gautham Vasan, Yan Wang, Fahim Shahriar, James Bergstra, Martin Jägersand, and A. Rupam Mahmood.
- Policy-Guided Diffusion, by Matthew Thomas Jackson, Michael Matthews, Cong Lu, Benjamin Ellis, Shimon Whiteson, and Jakob Nicolaus Foerster.
- Agent-Centric Human Demonstrations Train World Models, by James Staley, Elaine Short, Shivam Goel, and Yash Shukla.
- Can Differentiable Decision Trees Enable Interpretable Reward Learning from Human Feedback?, by Akansha Kalra and Daniel S. Brown.
- Imitation Learning from Observation through Optimal Transport, by Wei-Di Chang, Scott Fujimoto, David Meger, and Gregory Dudek.
- Light-weight Probing of Unsupervised Representations for Reinforcement Learning, by Wancong Zhang, Anthony GX-Chen, Vlad Sobal, Yann LeCun, and Nicolas Carion.
- Quantifying Interaction Level Between Agents Helps Cost-efficient Generalization in Multi-agent Reinforcement Learning, by Yuxin Chen, Chen Tang, Thomas Tian, Chenran Li, Jinning Li, Masayoshi Tomizuka, and Wei Zhan.
- Shield Decomposition for Safe Reinforcement Learning in General Partially Observable Multi-Agent Environments, by Daniel Melcer, Christopher Amato, and Stavros Tripakis.
- Reward Centering, by Abhishek Naik, Yi Wan, Manan Tomar, and Richard S. Sutton.
- MultiHyRL: Robust Hybrid RL for Obstacle Avoidance against Adversarial Attacks on the Observation Space, by Jan de Priester, Zachary Bell, Prashant Ganesh, and Ricardo Sanfelice.
- Investigating the Interplay of Prioritized Replay and Generalization, by Parham Mohammad Panahi, Andrew Patterson, Martha White, and Adam White.
- Towards General Negotiation Strategies with End-to-End Reinforcement Learning, by Bram M. Renting, Thomas M. Moerland, Holger Hoos, and Catholijn M Jonker.
- PID Accelerated Temporal Difference Algorithms, by Mark Bedaywi, Amin Rakhsha, and Amir-massoud Farahmand.
- States as goal-directed concepts: an epistemic approach to state-representation learning, by Nadav Amir, Yael Niv, and Angela J Langdon.
- Posterior Sampling for Continuing Environments, by Wanqiao Xu, Shi Dong, and Benjamin Van Roy.
- Reinforcement Learning from Delayed Observations via World Models, by Armin Karamzade, Kyungmin Kim, Montek Kalsi, and Roy Fox.
- Offline Reinforcement Learning from Datasets with Structured Non-Stationarity, by Johannes Ackermann, Takayuki Osa, and Masashi Sugiyama.
- Resource Usage Evaluation of Discrete Model-Free Deep Reinforcement Learning Algorithms, by Olivia P. Dizon-Paradis, Stephen E. Wormald, Daniel E. Capecci, Avanti Bhandarkar, and Damon L. Woodard.
- D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning, by Rafael Rafailov, Kyle Beltran Hatch, Anikait Singh, Aviral Kumar, Laura Smith, Ilya Kostrikov, Philippe Hansen-Estruch, Victor Kolev, Philip J. Ball, Jiajun Wu, Sergey Levine, and Chelsea Finn.
- Weight Clipping for Deep Continual and Reinforcement Learning, by Mohamed Elsayed, Qingfeng Lan, Clare Lyle, and A. Rupam Mahmood.
- A Batch Sequential Halving Algorithm without Performance Degradation, by Sotetsu Koyamada, Soichiro Nishimori, and Shin Ishii.
- Causal Contextual Bandits with Adaptive Context, by Rahul Madhavan, Aurghya Maiti, Gaurav Sinha, and Siddharth Barman.
- Policy Architectures for Compositional Generalization in Control, by Allan Zhou, Vikash Kumar, Chelsea Finn, and Aravind Rajeswaran.
- Semi-Supervised One Shot Imitation Learning, by Philipp Wu, Kourosh Hakhamaneshi, Yuqing Du, Igor Mordatch, Aravind Rajeswaran, and Pieter Abbeel.
- Cross-environment Hyperparameter Tuning for Reinforcement Learning, by Andrew Patterson, Samuel Neumann, Raksha Kumaraswamy, Martha White, and Adam White.
- Human-compatible driving agents through data-regularized self-play reinforcement learning, by Daphne Cornelisse and Eugene Vinitsky.
- Inception: Efficiently Computable Misinformation Attacks on Markov Games, by Jeremy McMahan, Young Wu, Yudong Chen, Jerry Zhu, and Qiaomin Xie.
- Learning to Navigate in Mazes with Novel Layouts using Abstract Top-down Maps, by Linfeng Zhao, and Lawson L.S. Wong.
- Boosting Soft Q-Learning by Bounding, by Jacob Adamczyk, Volodymyr Makarenko, Stas Tiomkin, and Rahul V Kulkarni.
- Bandits with Multimodal Structure, by Hassan Saber and Odalric-Ambrym Maillard.
- Bounding-Box Inference for Error-Aware Model-Based Reinforcement Learning, by Erin J Talvitie, Zilei Shao, Huiying Li, Jinghan Hu, Jacob Boerma, Rory Zhao, and Xintong Wang.
- Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms, by Javad Azizi, Thang Duong, Yasin Abbasi-Yadkori, András György, Claire Vernade, and Mohammad Ghavamzadeh.
- Optimizing Rewards while meeting ω-regular Constraints, by Christopher Zeitler, Kristina Miller, Sayan Mitra, John Schierman, and Mahesh Viswanathan.