Session A, Monday, Oct 28
- Paper 3 - A Best-of-both-worlds Algorithm for Bandits with Delayed Feedback with Robustness to Excessive Delays (Saeed Masoudian, Julian Zimmert, Yevgeny Seldin)
- Paper 4 - No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO (Skander Moalla, Andrea Miele, Razvan Pascanu, Caglar Gulcehre)
- Paper 5 - Adaptive $Q$-Network: On-the-fly Target Selection for Deep Reinforcement Learning (Théo Vincent, Fabian Wahren, Jan Peters, Boris Belousov, Carlo D'Eramo)
- Paper 7 - Individual Regret in Cooperative Stochastic Multi-Armed Bandits over Communication Graph (Idan Barnea, Tal Lancewicki, Yishay Mansour)
- Paper 11 - Multi-Task Reinforcement Learning with Mixture of Orthogonal Experts (Ahmed Hendawy, Jan Peters, Carlo D'Eramo)
- Paper 15 - Regret Guarantees for Adversarial Contextual Bandits with Delayed Feedback (Liad Erez, Orin Levy, Yishay Mansour)
- Paper 20 - Augmented Bayesian Policy Search (Mahdi Kallel, Debabrota Basu, Riad Akrour, Carlo D'Eramo)
- Paper 22 - Approximate information maximization for bandit games (Alex Barbier-Chebbah, Christian L Vestergaard, Jean-Baptiste Masson, Etienne Boursier)
- Paper 26 - Model-based Sparse Communication in Multi-agent Reinforcement Learning (Shuai Han, Mehdi Dastani, Shihan Wang)
- Paper 29 - Divide and Conquer: Provably Unveiling the Pareto Front with Multi-Objective Reinforcement Learning (Willem Röpke, Mathieu Reymond, Patrick Mannion, Diederik M Roijers, Ann Nowe, Roxana Rădulescu)
- Paper 31 - Exploring Pessimism and Optimism Dynamics in Deep Reinforcement Learning (Bahareh Tasdighi, Nicklas Werge, Yi-Shan Wu, Melih Kandemir)
- Paper 37 - Bayesian Meta-Reinforcement Learning with Laplace Variational Recurrent Networks (Joery A. de Vries, Jinke He, Mathijs de Weerdt, Matthijs T. J. Spaan)
- Paper 40 - AFU: Actor-Free critic Updates in off-policy RL for continuous control (Nicolas Perrin-Gilbert)
- Paper 49 - Model-Based Transfer Learning for Contextual Reinforcement Learning (Jung-Hoon Cho, Vindula Jayawardana, Sirui Li, Cathy Wu)
- Paper 52 - Thompson Sampling-like Algorithms for Stochastic Rising Rested Bandits (Marco Fiandri, Alberto Maria Metelli, Francesco Trovò)
- Paper 55 - Bandit Pareto Set Identification in a Multi-Output Linear Model (Cyrille Kone, Emilie Kaufmann, Laura Richert)
- Paper 61 - Bisimulation Metrics are Optimal Transport Distances, and Can be Computed Efficiently (Sergio Calo Oliveira, Anders Jonsson, Gergely Neu, Ludovic Schwartz, Javier Segovia-Aguas)
- Paper 64 - Model-Based Meta-Reinforcement Learning for Hyperparameter Optimization (Jeroen Albrechts, Hugo Max Martin, Maryam Tavakol)
- Paper 73 - Distributed Constrained Multi-Agent Reinforcement Learning with Consensus and Networked Communication (Santiago Amaya-Corredor, Miguel Calvo-Fullana, Anders Jonsson)
- Paper 77 - Differentially Private Deep Model-Based Reinforcement Learning (Alexandre Rio, Merwan Barlier, Igor Colin, Albert Thomas)
- Paper 82 - Sum-Max Submodular Bandits (Stephen Pasteris, Alberto Rumi, Fabio Vitale, Nicolò Cesa-Bianchi)
- Paper 83 - Adversarial Contextual Bandits Go Kernelized (Gergely Neu, Julia Olkhovskaya, Sattar Vakili)
- Paper 84 - Deterministic Exploration via Stationary Bellman Error Maximization (Sebastian Griesbach, Carlo D'Eramo)
- Paper 86 - Maximum Entropy On-Policy Actor-Critic via Entropy Advantage Estimation (Jean Seong Bjorn Choe, Jong-Kook Kim)
- Paper 88 - Interpretable and Editable Programmatic Tree Policies for Reinforcement Learning (Hector Kohler, Quentin Delfosse, Riad Akrour, Kristian Kersting, Philippe Preux)
- Paper 90 - CrossQ: Batch Normalization in Deep Reinforcement Learning for Greater Sample Efficiency and Simplicity (Aditya Bhatt, Daniel Palenicek, Boris Belousov, Max Argus, Artemij Amiranashvili, Thomas Brox, Jan Peters)
- Paper 94 - The Whys and Hows of Active Exploration in Model-Based Reinforcement Learning (Alberto Caron, Chris Hicks, Vasilios Mavroudis)
- Paper 102 - Explore-Go: Leveraging Exploration for Generalisation in Deep Reinforcement Learning (Max Weltevrede, Felix Kaubek, Matthijs T. J. Spaan, Wendelin Boehmer)
- Paper 103 - Robust Best-of-Both-Worlds Gap Estimators Based on Importance-Weighted Sampling (Sarah Clusiau, Saeed Masoudian, Yevgeny Seldin)
- Paper 107 - An Attentive Approach for Building Partial Reasoning Agents from Pixels (Safa Alver, Doina Precup)
- Paper 109 - Learning to Explore with Lagrangians for Bandits under Unknown Constraints (Udvas Das, Debabrota Basu)
- Paper 110 - A Minimax-Bayes Approach to Ad Hoc Teamwork (Victor Villin, Christos Dimitrakakis, Thomas Kleine Buening)
- Paper 118 - Revisiting On-Policy Deep Reinforcement Learning (Mahdi Kallel, Samuele Tosatto, Carlo D'Eramo)
- Paper 119 - Directed Exploration in Reinforcement Learning from Linear Temporal Logic (Marco Bagatella, Andreas Krause, Georg Martius)
- Paper 120 - Linear Bandits with Memory (Giulia Clerici, Pierre Laforgue, Nicolò Cesa-Bianchi)
- Paper 124 - Understanding the Gaps in Satisficing Bandits (Chloé Rouyer, Ronald Ortner, Peter Auer)
- Paper 135 - Denoised Predictive Imagination: An Information-theoretic approach for learning World Models (Vedant Dave, Elmar Rueckert)
- Paper 143 - Environment Complexity and Nash Equilibria in a Sequential Social Dilemma (Mustafa Yasir, Andrew Howes, Vasilios Mavroudis, Chris Hicks)
- Paper 144 - Stochastic Q-learning for Large Discrete Action Spaces (Fares Fourati, Vaneet Aggarwal, Mohamed-Slim Alouini)
- Paper 148 - Combining Automated Optimisation of Hyperparameters and Reward Shape (Julian Dierkes, Emma Cramer, Sebastian Trimpe, Holger Hoos)
Session B, Tuesday, Oct 29
- Paper 1 - Offline RL via Feature-Occupancy Gradient Ascent (Gergely Neu, Nneka Okolo)
- Paper 2 - Learning to Steer Markovian Agents under Model Uncertainty (Jiawei Huang, Vinzenz Thoma, Zebang Shen, Heinrich H. Nax, Niao He)
- Paper 6 - Preference Elicitation for Offline Reinforcement Learning (Alizée Pace, Bernhard Schölkopf, Gunnar Rätsch, Giorgia Ramponi)
- Paper 13 - Feudal Graph Reinforcement Learning (Tommaso Marzi, Arshjot Singh Khehra, Andrea Cini, Cesare Alippi)
- Paper 16 - Leveraging diverse offline data in POMDPs with unobserved confounders (Oussama Azizi, Philip Boeken, Onno Zoeter, Frans A Oliehoek, Matthijs T. J. Spaan)
- Paper 17 - Warm-up Free Policy Optimization: Improved Regret in Linear Markov Decision Processes (Asaf Cassel, Aviv Rosenberg)
- Paper 21 - Policy Gradient Methods with Adaptive Policy Spaces (Gianmarco Tedeschi, Matteo Papini, Alberto Maria Metelli, Marcello Restelli)
- Paper 23 - Image-Based Dataset Representations for Predicting Learning Performance in Offline RL (Enrique Mateos-Melero, Miguel Iglesias Alcázar, Raquel Fuentetaja, Peter Stone, Fernando Fernández)
- Paper 24 - Periodic agent-state based Q-learning for POMDPs (Amit Sinha, Matthieu Geist, Aditya Mahajan)
- Paper 28 - Unbiased Policy Gradient with Random Horizon (Rui Yuan, Andrii Tretynko, Simone Rossi, Thomas Hannagan)
- Paper 33 - APE: An Anti-poaching Multi-Agent Reinforcement Learning Benchmark (Prasanna Maddila, Eric Casellas, Patrick Chabrier, Régis Sabbadin, Meritxell Vinyals)
- Paper 38 - Online Planning in POMDPs with State-Requests (Raphaël Avalos, Eugenio Bargiacchi, Ann Nowe, Diederik Roijers, Frans A Oliehoek)
- Paper 39 - Inferring Behavior-Specific Context Improves Zero-Shot Generalization in Reinforcement Learning (Tidiane Camaret Ndir, André Biedenkapp, Noor Awad)
- Paper 46 - Cyclicity-Regularized Coordination Graphs (Oliver Järnefelt, Mahdi Kallel, Carlo D'Eramo)
- Paper 50 - Creating Multi-Level Skill Hierarchies in Reinforcement Learning (Joshua Benjamin Evans, Özgür Şimşek)
- Paper 53 - World Models Increase Autonomy in Reinforcement Learning (Zhao Yang, Thomas M. Moerland, Mike Preuss, Aske Plaat, Edward S. Hu)
- Paper 54 - Direct Advantage Estimation in Partially Observable Environments (Hsiao-Ru Pan, Bernhard Schölkopf)
- Paper 57 - Latent Communication for Zero-shot Stitching in Reinforcement Learning (Antonio Pio Ricciardi, Valentino Maiorca, Luca Moschella, Riccardo Marin, Emanuele Rodolà)
- Paper 60 - Exploration by Learning Diverse Skills through Successor State Measures (Paul-Antoine Le Tolguenec, Yann Besse, Florent Teichteil-Königsbuch, Dennis George Wilson, Emmanuel Rachelson)
- Paper 63 - ARLBench: Flexible and Efficient Benchmarking for Hyperparameter Optimization in Reinforcement Learning (Jannis Becktepe, Julian Dierkes, Carolin Benjamins, Aditya Mohan, David Salinas, Raghu Rajan, Frank Hutter, Holger Hoos, Marius Lindauer, Theresa Eimer)
- Paper 65 - Towards Enhancing Representations in Reinforcement Learning using Relational Structure (Aditya Mohan, Marius Lindauer)
- Paper 67 - Zero-shot cross-modal transfer of Reinforcement Learning policies through a Global Workspace (Léopold Maytié, Benjamin Devillers, Alexandre Arnold, Rufin VanRullen)
- Paper 69 - Learning Memory-Based Policies for Robust POMDPs (Maris F. L. Galesloot, Marnix Suilen, Thiago D. Simão, Steven Carr, Matthijs T. J. Spaan, Ufuk Topcu, Nils Jansen)
- Paper 70 - Curricula for Learning Robust Policies with Factored State Representations in Changing Environments (Panayiotis Panayiotou, Özgür Şimşek)
- Paper 71 - Impact of Collective Behaviors of Autonomous Vehicles on Urban Traffic Dynamics: A Multi-Agent Reinforcement Learning Approach (Ahmet Onur Akman, Anastasia Psarou, Zoltán György Varga, Grzegorz Jamróz, Rafal Kucharski)
- Paper 76 - Almost Sure Convergence of Stochastic Gradient Methods under Gradient Domination (Simon Weissmann, Sara Klein, Waïss Azizian, Leif Döring)
- Paper 78 - Adaptive Exploration for Data-Efficient General Value Function Evaluations (Arushi Jain, Josiah P. Hanna, Doina Precup)
- Paper 79 - Physics-Informed Model and Hybrid Planning for Efficient Dyna-Style Reinforcement Learning (Zakariae El Asri, Olivier Sigaud, Nicolas Thome)
- Paper 80 - Following Ancestral Footsteps: Co-Designing Morphology and Behaviour with Self-Imitation Learning (Sergio Hernández-Gutiérrez, Ville Kyrki, Kevin Sebastian Luck)
- Paper 81 - Dreaming of Many Worlds: Learning Contextual World Models aids Zero-Shot Generalization (Sai Prasanna, Karim Farid, Raghu Rajan, André Biedenkapp)
- Paper 87 - Earth Observation Satellite Scheduling with Graph Neural Networks (Guillaume Infantes, Antoine Jacquet, Emmanuel Benazera, Stéphanie Roussel, Nicolas Meuleau, Vincent Baudoui, Jonathan Guerra)
- Paper 92 - Generalisation to unseen topologies: Towards control of biological neural network activity (Laurens Engwegen, Daan Brinks, Wendelin Boehmer)
- Paper 96 - Applying Reinforcement Learning to Navigation In Partially Observable Flows (Selim Mecanna, Aurore Loisy, Christophe Eloy)
- Paper 117 - Controller Synthesis from Deep Reinforcement Learning Policies (Florent Delgrange, Guy Avni, Anna Lukina, Christian Schilling, Ann Nowe, Guillermo Perez)
- Paper 128 - Offline Reinforcement Learning with Pessimistic Value Priors (Filippo Valdettaro, Aldo A. Faisal)
- Paper 129 - Contextualized Hybrid Ensemble Q-learning: Learning Fast with Control Priors (Emma Cramer, Bernd Frauenknecht, Ramil Sabirov, Sebastian Trimpe)
- Paper 142 - Private Online Learning in Adversarial MDPs: Full-Information and Bandit (Shaojie Bai, Lanting Zeng, Chengcheng Zhao, Xiaoming Duan, Mohammad Sadegh Talebi, Peng Cheng, Jiming Chen)
- Paper 146 - Robust Chain of Thoughts Preference Optimization (Eugene Choi, Arash Ahmadian, Olivier Pietquin, Matthieu Geist, Mohammad Gheshlaghi Azar)
- Paper 147 - Countering Reward Over-optimization in LLM with Demonstration-Guided Reinforcement Learning (Mathieu Rita, Florian Strub, Rahma Chaabouni, Paul Michel, Emmanuel Dupoux, Olivier Pietquin)
- Paper 149 - Recurrent Natural Policy Gradient for POMDPs (Semih Cayci, Atilla Eryilmaz)
Session C, Wednesday, Oct 30
- Paper 8 - Value Improved Actor Critic Algorithms (Yaniv Oren, Moritz Akiya Zanger, Pascal R. Van der Vaart, Matthijs T. J. Spaan, Wendelin Boehmer)
- Paper 9 - Online learning in CMDPs with adversarial losses and stochastic hard constraints (Francesco Emanuele Stradi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti)
- Paper 10 - MetaCURL: Non-stationary Concave Utility Reinforcement Learning (Bianca Marin Moreno, Margaux Brégère, Pierre Gaillard, Nadia Oudjane)
- Paper 12 - Reducing Blackwell and Average Optimality to Discounted MDPs via the Blackwell Discount Factor (Julien Grand-Clément, Marek Petrik)
- Paper 14 - Generalized Nested Rollout Policy Adaptation with Limited Repetitions (Tristan Cazenave)
- Paper 18 - The challenge of continuous MDPs: is no-regret learning feasible? (Davide Maran, Alberto Maria Metelli, Matteo Papini, Marcello Restelli)
- Paper 19 - Efficient Rate Optimal Regret for Adversarial Contextual MDPs Using Online Function Approximation (Orin Levy, Alon Cohen, Asaf Cassel, Yishay Mansour)
- Paper 25 - EVaR Optimization in MDPs with Total Reward Criterion (Xihong Su, Marek Petrik, Julien Grand-Clément)
- Paper 27 - Time-Efficient Reinforcement Learning with Stochastic Stateful Policies (Firas Al-Hafez, Guoping Zhao, Jan Peters, Davide Tateo)
- Paper 30 - Bootstrapping Expectiles in Reinforcement Learning (Pierre Clavier, Emmanuel Rachelson, Erwan Le Pennec, Matthieu Geist)
- Paper 34 - Isoperimetry is All We Need: Langevin Posterior Sampling for RL (Emilio Jorge, Christos Dimitrakakis, Debabrota Basu)
- Paper 35 - Imitation Learning in Discounted Linear MDP without exploration assumptions (Luca Viano, Stratis Skoulakis, Volkan Cevher)
- Paper 36 - Epistemic Bellman Operators (Pascal R. Van der Vaart, Matthijs T. J. Spaan, Neil Yorke-Smith)
- Paper 42 - Rate-Optimal Policy Optimization for Linear Markov Decision Processes (Uri Sherman, Alon Cohen, Tomer Koren, Yishay Mansour)
- Paper 45 - Enhancing Exploration via Off-Reward Dynamic Reference Reinforcement Learning (Yamen Habib, Dmytro Grytskyy, Rubén Moreno-Bote)
- Paper 48 - Explaining Reinforcement Learning with Shapley Values (Daniel Beechey, Thomas M. S. Smith, Özgür Şimşek)
- Paper 51 - Backward explanations via redefinition of predicates (Léo Saulières, Martin C. Cooper, Florence Dupin de Saint-Cyr)
- Paper 56 - State Abstraction Discovery from Progressive Disaggregation Methods (Orso Forghieri, Hind Castel, Emmanuel Hyon, Erwan Le Pennec)
- Paper 68 - Adaptive Distributional Double Q-learning (Leif Döring, Maximilian Birr, Mihail Bîrsan)
- Paper 74 - Finding good policies in average-reward Markov Decision Processes without prior knowledge (Adrienne Tuynman, Emilie Kaufmann, Rémy Degenne)
- Paper 75 - Kernel-Based Function Approximation for Average Reward Reinforcement Learning: An Optimist No-Regret Algorithm (Sattar Vakili, Julia Olkhovskaya)
- Paper 89 - Time-Constrained Robust MDPs (Adil Zouitine, David Bertoin, Pierre Clavier, Matthieu Geist, Emmanuel Rachelson)
- Paper 91 - RRLS : Robust Reinforcement Learning Suite (Adil Zouitine, David Bertoin, Pierre Clavier, Matthieu Geist, Emmanuel Rachelson)
- Paper 93 - Learning mirror maps in policy mirror descent (Carlo Alfano, Sebastian Rene Towers, Silvia Sapora, Chris Lu, Patrick Rebeschini)
- Paper 95 - A Conservative Approach for Few-Shot Transfer in Off-Dynamics Reinforcement Learning (Paul Daoudi, Christophe Prieur, Bogdan Robu, Merwan Barlier, Ludovic Dos Santos)
- Paper 97 - Evidence on the regularization properties of Maximum-Entropy Reinforcement Learning (Remy Hosseinkhan Boucher, Lionel Mathelin, Onofrio Semeraro)
- Paper 98 - Truly No-Regret Learning in Constrained MDPs (Adrian Müller, Pragnya Alatur, Volkan Cevher, Giorgia Ramponi, Niao He)
- Paper 100 - A Distributional Analogue to the Successor Representation (Harley Wiltzer, Jesse Farebrother, Arthur Gretton, Yunhao Tang, Andre Barreto, Will Dabney, Marc G Bellemare, Mark Rowland)
- Paper 101 - Beyond Stationarity: Convergence Analysis of Stochastic Softmax Policy Gradient Methods (Sara Klein, Simon Weissmann, Leif Döring)
- Paper 104 - Scaling Value Iteration Networks to 5000 Layers for Extreme Long-Term Planning (Yuhui Wang, Qingyuan Wu, Weida Li, Dylan R. Ashley, Francesco Faccio, Chao Huang, Jürgen Schmidhuber)
- Paper 105 - Functional Acceleration for Policy Mirror Descent (Veronica Chelu, Doina Precup)
- Paper 108 - A Look at Value-Based Decision-Time vs. Background Planning Methods Across Different Settings (Safa Alver, Doina Precup)
- Paper 127 - Sample-efficient reinforcement learning for environments with rare high-reward states (Daniel G Mastropietro, Urtzi Ayesta, Matthieu Jonckheere)
- Paper 130 - Can Decentralized Q-learning learn to collude? (Janusz M Meylahn)
- Paper 131 - Trust the Model Where It Trusts Itself - Model-Based Actor-Critic with Uncertainty-Aware Rollout Adaption (Bernd Frauenknecht, Artur Eisele, Devdutt Subhasish, Friedrich Solowjow, Sebastian Trimpe)
- Paper 133 - Latent Assistance Networks: Rediscovering Hyperbolic Tangents in RL (Jacob Eeuwe Kooi, Mark Hoogendoorn, Vincent Francois-Lavet)
- Paper 134 - Viability of Future Actions: Robust Reinforcement Learning via Entropy Regularization (Pierre-François Massiani, Alexander von Rohr, Lukas Haverbeck, Sebastian Trimpe)
- Paper 136 - Dual-Force: Enhanced Offline Diversity Maximization under Imitation Constraints (Pavel Kolev, Marin Vlastelica, Georg Martius)
- Paper 138 - Using a Learned Policy Basis to Optimally Solve Reward Machines (Guillermo Infante, David Kuric, Vicenç Gómez, Anders Jonsson, Herke van Hoof)
- Paper 139 - Tractable Offline Learning of Regular Decision Processes (Ahana Deb, Roberto Cipollone, Anders Jonsson, Alessandro Ronca, Mohammad Sadegh Talebi)