RLJ 2025: Volume 6
You can download this entire issue as one large (320 MB) PDF here: link. DOI: https://doi.org/10.5281/zenodo.13899776. Below are links to the individual papers.
Download Cover Pages (9 MB PDF)
Reinforcement Learning for Finite Space Mean-Field Type Game, by Kai Shao, Jiacheng Shen, and Mathieu Lauriere.
Understanding Behavioral Metric Learning: A Large-Scale Study on Distracting Reinforcement Learning Environments, by Ziyan Luo, Tianwei Ni, Pierre-Luc Bacon, Doina Precup, and Xujie Si.
Which Experiences Are Influential for RL Agents? Efficiently Estimating The Influence of Experiences, by Takuya Hiraoka, Takashi Onishi, Guanquan Wang, and Yoshimasa Tsuruoka.
Online Intrinsic Rewards for Decision Making Agents from Large Language Model Feedback, by Qinqing Zheng, Mikael Henaff, Amy Zhang, Aditya Grover, and Brandon Amos.
A Finite-Time Analysis of Distributed Q-Learning, by Han-Dong Lim and Donghwan Lee.
Finite-Time Analysis of Minimax Q-Learning, by Narim Jeong and Donghwan Lee.
Collaboration Promotes Group Resilience in Multi-Agent RL, by Ilai Shraga, Guy Azran, Matthias Gerstgrasser, Ofir Abu, Jeffrey Rosenschein, and Sarah Keren.
Bayesian Meta-Reinforcement Learning with Laplace Variational Recurrent Networks, by Joery A. de Vries, Jinke He, Mathijs de Weerdt, and Matthijs T. J. Spaan.
Foundation Model Self-Play: Open-Ended Strategy Innovation via Foundation Models, by Aaron Dharna, Cong Lu, and Jeff Clune.
Action Mapping for Reinforcement Learning in Continuous Environments with Constraints, by Mirco Theile, Lukas Dirnberger, Raphael Trumpp, Marco Caccamo, and Alberto Sangiovanni-Vincentelli.
Chargax: A JAX Accelerated EV Charging Simulator, by Koen Ponse, Jan Felix Kleuker, Thomas M. Moerland, and Aske Plaat.
Effect of a slowdown correlated to the current state of the environment on an asynchronous learning architecture, by Idriss Abdallah, Laurent Ciarletta, Patrick Henaff, Jonathan Champagne, and Matthieu Bonavent.
Cascade - A sequential ensemble method for continuous control tasks, by Robin Schmöcker and Alexander Dockhorn.
Average-Reward Soft Actor-Critic, by Jacob Adamczyk, Volodymyr Makarenko, Stas Tiomkin, and Rahul V Kulkarni.
Burning RED: Unlocking Subtask-Driven Reinforcement Learning and Risk-Awareness in Average-Reward Markov Decision Processes, by Juan Sebastian Rojas and Chi-Guhn Lee.
Your Learned Constraint is Secretly a Backward Reachable Tube, by Mohamad Qadri, Gokul Swamy, Jonathan Francis, Michael Kaess, and Andrea Bajcsy.
Improved Regret Bound for Safe Reinforcement Learning via Tighter Cost Pessimism and Reward Optimism, by Kihyun Yu, Duksang Lee, William Overman, and Dabeen Lee.
Offline vs. Online Learning in Model-based RL: Lessons for Data Collection Strategies, by Jiaqi Chen, Ji Shi, Cansu Sancaktar, Jonas Frey, and Georg Martius.
Uncertainty Prioritized Experience Replay, by Rodrigo Antonio Carrasco-Davis, Sebastian Lee, Claudia Clopath, and Will Dabney.
RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$, by Abhinav Bhatia, Samer B. Nashed, and Shlomo Zilberstein.
Pareto Optimal Learning from Preferences with Hidden Context, by Ryan Bahlous-Boldi, Li Ding, Lee Spector, and Scott Niekum.
WOFOSTGym: A Crop Simulator for Learning Annual and Perennial Crop Management Strategies, by William Solow, Sandhya Saisubramanian, and Alan Fern.
When and Why Hyperbolic Discounting Matters for Reinforcement Learning Interventions, by Ian M. Moore, Eura Nofshin, Siddharth Swaroop, Susan Murphy, Finale Doshi-Velez, and Weiwei Pan.
Reinforcement Learning from Human Feedback with High-Confidence Safety Guarantees, by Yaswanth Chittepu, Blossom Metevier, Will Schwarzer, Austin Hoag, Scott Niekum, and Philip S. Thomas.
AVID: Adapting Video Diffusion Models to World Models, by Marc Rigter, Tarun Gupta, Agrin Hilmkil, and Chao Ma.
Non-Stationary Latent Auto-Regressive Bandits, by Anna L. Trella, Walter H. Dempsey, Asim Gazi, Ziping Xu, Finale Doshi-Velez, and Susan Murphy.
Hierarchical Multi-agent Reinforcement Learning for Cyber Network Defense, by Aditya Vikram Singh, Ethan Rathbun, Emma Graham, Lisa Oakley, Simona Boboila, Peter Chin, and Alina Oprea.
The Confusing Instance Principle for Online Linear Quadratic Control, by Waris Radji and Odalric-Ambrym Maillard.
Drive Fast, Learn Faster: On-Board RL for High Performance Autonomous Racing, by Benedict Hildisch, Edoardo Ghignone, Nicolas Baumann, Cheng Hu, Andrea Carron, and Michele Magno.
Towards Large Language Models that Benefit for All: Benchmarking Group Fairness in Reward Models, by Kefan Song, Jin Yao, Runnan Jiang, Rohan Chandra, and Shangtong Zhang.
Pure Exploration for Constrained Best Mixed Arm Identification with a Fixed Budget, by Dengwang Tang, Rahul Jain, Ashutosh Nayyar, and Pierluigi Nuzzo.
Quantitative Resilience Modeling for Autonomous Cyber Defense, by Xavier Cadet, Simona Boboila, Edward Koh, Peter Chin, and Alina Oprea.
Efficient Information Sharing for Training Decentralized Multi-Agent World Models, by Xiaoling Zeng and Qi Zhang.
Recursive Reward Aggregation, by Yuting Tang, Yivan Zhang, Johannes Ackermann, Yu-Jie Zhang, Soichiro Nishimori, and Masashi Sugiyama.
A Finite-Sample Analysis of an Actor-Critic Algorithm for Mean-Variance Optimization in a Discounted MDP, by Tejaram Sangadi, Prashanth L. A., and Krishna Jagannathan.
Impoola: The Power of Average Pooling for Image-based Deep Reinforcement Learning, by Raphael Trumpp, Ansgar Schäfftlein, Mirco Theile, and Marco Caccamo.
Fast Adaptation with Behavioral Foundation Models, by Harshit Sikchi, Andrea Tirinzoni, Ahmed Touati, Yingchen Xu, Anssi Kanervisto, Scott Niekum, Amy Zhang, Alessandro Lazaric, and Matteo Pirotta.
Multi-Task Reinforcement Learning Enables Parameter Scaling, by Reginald McLean, Evangelos Chatzaroulas, J K Terry, Isaac Woungang, Nariman Farsad, and Pablo Samuel Castro.
Eau De $Q$-Network: Adaptive Distillation of Neural Networks in Deep Reinforcement Learning, by Théo Vincent, Tim Faust, Yogesh Tripathi, Jan Peters, and Carlo D'Eramo.
Disentangling Recognition and Decision Regrets in Image-Based Reinforcement Learning, by Alihan Hüyük, Arndt Ryo Koblitz, Atefeh Mohajeri Moghaddam, and Matthew Andrews.
Learning to Explore in Diverse Reward Settings via Temporal-Difference-Error Maximization, by Sebastian Griesbach and Carlo D'Eramo.
Nonparametric Policy Improvement in Continuous Action Spaces via Expert Demonstrations, by Agustin Castellano, Sohrab Rezaei, Jared Markowitz, and Enrique Mallada.
DisDP: Robust Imitation Learning via Disentangled Diffusion Policies, by Pankhuri Vanjani, Paul Mattes, Xiaogang Jia, Vedant Dave, and Rudolf Lioutikov.
Mitigating Goal Misgeneralization via Minimax Regret, by Karim Abdel Sadek, Matthew Farrugia-Roberts, Usman Anwar, Hannah Erlebach, Christian Schroeder de Witt, David Krueger, and Michael D Dennis.
Long-Horizon Planning with Predictable Skills, by Nico Gürtler and Georg Martius.
HANQ: Hypergradients, Asymmetry, and Normalization for Fast and Stable Deep $Q$-Learning, by Braham Snyder and Chen-Yu Wei.
Benchmarking Massively Parallelized Multi-Task Reinforcement Learning for Robotics Tasks, by Viraj Joshi, Zifan Xu, Bo Liu, Peter Stone, and Amy Zhang.
Optimal discounting for offline input-driven MDP, by Randy Lefebvre and Audrey Durand.
Make the Pertinent Salient: Task-Relevant Reconstruction for Visual Control with Distractions, by Kyungmin Kim, JB Lanier, and Roy Fox.
Reinforcement Learning for Human-AI Collaboration via Probabilistic Intent Inference, by Yuxin Lin, Seyede Fatemeh Ghoreishi, Tian Lan, and Mahdi Imani.
PufferLib 2.0: Reinforcement Learning at 1M steps/s, by Joseph Suarez.
Uncovering RL Integration in SSL Loss: Objective-Specific Implications for Data-Efficient RL, by Ömer Veysel Çağatan and Baris Akgun.
Benchmarking Partial Observability in Reinforcement Learning with a Suite of Memory-Improvable Domains, by Ruo Yu Tao, Kaicheng Guo, Cameron Allen, and George Konidaris.
Rectifying Regression in Reinforcement Learning, by Alex Ayoub, David Szepesvari, Alireza Bakhtiari, Csaba Szepesvari, and Dale Schuurmans.
High-Confidence Policy Improvement from Human Feedback, by Hon Tik Tse, Philip S. Thomas, and Scott Niekum.
Adaptive Reward Sharing to Enhance Learning in the Context of Multiagent Teams, by Kyle Tilbury and David Radke.
MixUCB: Enhancing Safe Exploration in Contextual Bandits with Human Oversight, by Jinyan Su, Rohan Banerjee, Jiankai Sun, Wen Sun, and Sarah Dean.
Efficient Morphology-Aware Policy Transfer to New Embodiments, by Michael Przystupa, Hongyao Tang, Glen Berseth, Mariano Phielipp, Santiago Miret, Martin Jägersand, and Matthew E. Taylor.
Understanding Learned Representations and Action Collapse in Visual Reinforcement Learning, by Xi Chen, Zhihui Zhu, and Andrew Perrault.
Mitigating Suboptimality of Deterministic Policy Gradients in Complex Q-functions, by Ayush Jain, Norio Kosaka, Xinhu Li, Kyung-Min Kim, Erdem Biyik, and Joseph J Lim.
Leveraging priors on distribution functions for multi-arm bandits, by Sumit Vashishtha and Odalric-Ambrym Maillard.
ProtoCRL: Prototype-based Network for Continual Reinforcement Learning, by Michela Proietti, Peter R. Wurman, Peter Stone, and Roberto Capobianco.
Finer Behavioral Foundation Models via Auto-Regressive Features and Advantage Weighting, by Edoardo Cetin, Ahmed Touati, and Yann Ollivier.
Pretraining Decision Transformers with Reward Prediction for In-Context Multi-task Structured Bandit Learning, by Subhojyoti Mukherjee, Josiah P. Hanna, Qiaomin Xie, and Robert D Nowak.
Multi-task Representation Learning for Fixed Budget Pure-Exploration in Linear and Bilinear Bandits, by Subhojyoti Mukherjee, Qiaomin Xie, and Robert D Nowak.
Offline Reinforcement Learning with Domain-Unlabeled Data, by Soichiro Nishimori, Xin-Qiang Cai, Johannes Ackermann, and Masashi Sugiyama.
Multi-Agent Reinforcement Learning for Inverse Design in Photonic Integrated Circuits, by Yannik Mahlau, Maximilian Schier, Christoph Reinders, Frederik Schubert, Marco Bügling, and Bodo Rosenhahn.
Syllabus: Portable Curricula for Reinforcement Learning Agents, by Ryan Sullivan, Ryan Pégoud, Ameen Ur Rehman, Xinchen Yang, Junyun Huang, Aayush Verma, Nistha Mitra, and John P Dickerson.
Exploration-Free Reinforcement Learning with Linear Function Approximation, by Luca Civitavecchia and Matteo Papini.
SPEQ: Offline Stabilization Phases for Efficient Q-Learning in High Update-To-Data Ratio Reinforcement Learning, by Carlo Romeo, Girolamo Macaluso, Alessandro Sestini, and Andrew D. Bagdanov.
Value Bonuses using Ensemble Errors for Exploration in Reinforcement Learning, by Abdul Wahab, Raksha Kumaraswamy, and Martha White.
Gaussian Process Q-Learning for Finite-Horizon Markov Decision Processes, by Maximilian Bloor, Tom Savage, Calvin Tsay, Antonio Del Rio Chanona, and Max Mowbray.
On the Effect of Regularization in Policy Mirror Descent, by Jan Felix Kleuker, Aske Plaat, and Thomas M. Moerland.
Concept-Based Off-Policy Evaluation, by Ritam Majumdar, Jack Teversham, and Sonali Parbhoo.
Investigating the Utility of Mirror Descent in Off-policy Actor-Critic, by Samuel Neumann, Jiamin He, Adam White, and Martha White.
Hybrid Classical/RL Local Planner for Ground Robot Navigation, by Vishnu Dutt Sharma, Jeongran Lee, Matthew Andrews, and Ilija Hadžić.
How Should We Meta-Learn Reinforcement Learning Algorithms?, by Alexander David Goldie, Zilin Wang, Jaron Cohen, Jakob Nicolaus Foerster, and Shimon Whiteson.
Seldonian Reinforcement Learning for Ad Hoc Teamwork, by Edoardo Zorzi, Alberto Castellini, Leonidas Bakopoulos, Georgios Chalkiadakis, and Alessandro Farinelli.
Offline Reinforcement Learning with Wasserstein Regularization via Optimal Transport Maps, by Motoki Omura, Yusuke Mukuta, Kazuki Ota, Takayuki Osa, and Tatsuya Harada.
Intrinsically Motivated Discovery of Temporally Abstract Graph-based Models of the World, by Akhil Bagaria, Anita De Mello Koch, Rafael Rodriguez-Sanchez, Sam Lobel, and George Konidaris.
An Optimisation Framework for Unsupervised Environment Design, by Nathan Monette, Alistair Letcher, Michael Beukman, Matthew Thomas Jackson, Alexander Rutherford, Alexander David Goldie, and Jakob Nicolaus Foerster.
Epistemically-guided forward-backward exploration, by Núria Armengol Urpí, Marin Vlastelica, Georg Martius, and Stelian Coros.
Rethinking the Foundations for Continual Reinforcement Learning, by Esraa Elelimy, David Szepesvari, Martha White, and Michael Bowling.
Modelling human exploration with light-weight meta reinforcement learning algorithms, by Thomas D. Ferguson, Alona Fyshe, and Adam White.
Zero-Shot Reinforcement Learning Under Partial Observability, by Scott Jeen, Tom Bewley, and Jonathan Cullen.
Building Sequential Resource Allocation Mechanisms without Payments, by Sihan Zeng, Sujay Bhatt, Alec Koppel, and Sumitra Ganesh.
From Explainability to Interpretability: Interpretable Reinforcement Learning Via Model Explanations, by Peilang Li, Umer Siddique, and Yongcan Cao.
Joint-Local Grounded Action Transformation for Sim-to-Real Transfer in Multi-Agent Traffic Control, by Justin Turnau, Longchao Da, Khoa Vo, Ferdous Al Rafi, Shreyas Bachiraju, Tiejin Chen, and Hua Wei.
Sampling from Energy-based Policies using Diffusion, by Vineet Jain, Tara Akhound-Sadegh, and Siamak Ravanbakhsh.
Multiple-Frequencies Population-Based Training, by Waël Doulazmi, Auguste Lehuger, Marin Toromanoff, Valentin Charraut, Thibault Buhet, and Fabien Moutarde.
TransAM: Transformer-Based Agent Modeling for Multi-Agent Systems via Local Trajectory Encoding, by Conor Wallace, Umer Siddique, and Yongcan Cao.
Towards Improving Reward Design in RL: A Reward Alignment Metric for RL Practitioners, by Calarina Muslimani, Kerrick Johnstonbaugh, Suyog Chandramouli, Serena Booth, W. Bradley Knox, and Matthew E. Taylor.
Optimistic critics can empower small actors, by Olya Mastikhina, Dhruv Sreenivas, and Pablo Samuel Castro.
PAC Apprenticeship Learning with Bayesian Active Inverse Reinforcement Learning, by Ondrej Bajgar, Dewi Sid William Gould, Jonathon Liu, Alessandro Abate, Konstantinos Gatsis, and Michael A Osborne.
AVG-DICE: Stationary Distribution Correction by Regression, by Fengdi Che, Bryan Chan, Chen Ma, and A. Rupam Mahmood.
V-Max: A RL Framework for Autonomous Driving, by Valentin Charraut, Waël Doulazmi, Thomas Tournaire, and Thibault Buhet.
Offline Action-Free Learning of Ex-BMDPs by Comparing Diverse Datasets, by Alexander Levine, Peter Stone, and Amy Zhang.
One Goal, Many Challenges: Robust Preference Optimization Amid Content-Aware, Multi-Source Noise, by Amirabbas Afzali, Amirhossein Afsharrad, Seyed Shahabeddin Mousavi, and Sanjay Lall.
A Timer-Based Hybrid Supervisor for Robust, Chatter-Free Policy Switching, by Jan de Priester and Ricardo Sanfelice.
Deep Reinforcement Learning with Gradient Eligibility Traces, by Esraa Elelimy, Brett Daley, Andrew Patterson, Marlos C. Machado, Adam White, and Martha White.
On Slowly-varying Non-stationary Bandits, by Ramakrishnan K and Aditya Gopalan.
Focused Skill Discovery: Learning to Control Specific State Variables while Minimizing Side Effects, by Jonathan Colaço Carr, Qinyi Sun, and Cameron Allen.
Goals vs. Rewards: A Preliminary Comparative Study of Objective Specification Mechanisms, by Septia Rani, Serena Booth, and Sarath Sreedharan.
An Analysis of Action-Value Temporal-Difference Methods That Learn State Values, by Brett Daley, Prabhat Nagarajan, Martha White, and Marlos C. Machado.
PEnGUiN: Partially Equivariant Graph NeUral Networks for Sample Efficient MARL, by Joshua McClellan, Greyson Brothers, Furong Huang, and Pratap Tokekar.
Shaping Laser Pulses with Reinforcement Learning, by Francesco Capuano, Davorin Peceli, and Gabriele Tiboni.
Reinforcement Learning with Adaptive Temporal Discounting, by Sahaj Singh Maini and Zoran Tiganj.
Human-Level Competitive Pokémon via Scalable Offline Reinforcement Learning with Transformers, by Jake Grigsby, Yuqi Xie, Justin Sasek, Steven Zheng, and Yuke Zhu.
Adaptive Submodular Policy Optimization, by Branislav Kveton, Anup Rao, Viet Dac Lai, Nikos Vlassis, and David Arbour.
Learning Fair Pareto-Optimal Policies in Multi-Objective Reinforcement Learning, by Umer Siddique, Peilang Li, and Yongcan Cao.
Representation Learning and Skill Discovery with Empowerment, by Andrew Levy, Alessandro G Allievi, and George Konidaris.
Empirical Bound Information-Directed Sampling for Norm-Agnostic Bandits, by Piotr M. Suder and Eric Laber.
Thompson Sampling for Constrained Bandits, by Rohan Deb, Mohammad Ghavamzadeh, and Arindam Banerjee.
AI in a vat: Fundamental limits of efficient world modelling for agent sandboxing and interpretability, by Fernando Rosas, Alexander Boyd, and Manuel Baltieri.
Achieving Limited Adaptivity for Multinomial Logistic Bandits, by Sukruta Prakash Midigeshi, Tanmay Goyal, and Gaurav Sinha.