Attention-based Reinforcement Learning for Combinatorial Optimization: Application to Job Shop Scheduling Problem
Authors: Jaejin Lee, Seho Kee, Mani Janakiram, George Runger
Job shop scheduling problems are among the most important and challenging combinatorial optimization problems, and they have been tackled mainly by exact or approximate solution approaches. However, finding an exact solution can be infeasible for real-world problems, and even an approximate approach can require a prohibitive amount of time to find a near-optimal solution; moreover, the solutions found generally do not carry over to new problem instances. To address these challenges, we propose an attention-based reinforcement learning method for the class of job shop scheduling problems that integrates policy gradient reinforcement learning with a modified transformer architecture. An important result is that the learners trained with the proposed method can be reused to solve large-scale problems not used in training, and we demonstrate that our approach outperforms the results of recent studies and widely adopted heuristic rules.
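To make the setting concrete, here is a minimal sketch, not the authors' model, of the pieces such a method works with: a decoder that turns a job sequence into a schedule and its makespan (the quantity a policy gradient reward would be built from), a shortest-processing-time (SPT) dispatching rule as a typical heuristic baseline, and one decoding loop that picks the next job with a single scaled dot-product attention head. The instance `JOBS`, the two key features, and the fixed `query` vector are hypothetical; in a real attention-based learner the query and keys would come from a trained transformer encoder.

```python
import math
import random

# Hypothetical 3-job x 2-machine instance: each job is an ordered list of (machine, duration).
JOBS = [
    [(0, 3), (1, 2)],
    [(1, 2), (0, 4)],
    [(0, 2), (1, 3)],
]

def makespan(jobs, sequence):
    """Decode a job-repetition sequence (each entry = next operation of that job) and return the makespan.

    Each operation starts as soon as both its job predecessor and the last
    operation already placed on its machine have finished.
    """
    next_op = [0] * len(jobs)
    job_ready = [0] * len(jobs)
    machine_ready = {}
    for j in sequence:
        machine, duration = jobs[j][next_op[j]]
        start = max(job_ready[j], machine_ready.get(machine, 0))
        job_ready[j] = machine_ready[machine] = start + duration
        next_op[j] += 1
    return max(job_ready)

def spt_sequence(jobs):
    """Heuristic baseline: always dispatch the job whose next operation is shortest (ties -> lowest index)."""
    next_op = [0] * len(jobs)
    seq = []
    while any(next_op[j] < len(jobs[j]) for j in range(len(jobs))):
        ready = [j for j in range(len(jobs)) if next_op[j] < len(jobs[j])]
        j = min(ready, key=lambda k: (jobs[k][next_op[k]][1], k))
        seq.append(j)
        next_op[j] += 1
    return seq

def attention_decode(jobs, query, rng):
    """Sample a job sequence with one scaled dot-product attention head over the unfinished jobs.

    Returns the sequence and its total log-probability (the term a REINFORCE update scales).
    """
    next_op = [0] * len(jobs)
    seq, log_prob = [], 0.0
    d = len(query)
    while any(next_op[j] < len(jobs[j]) for j in range(len(jobs))):
        ready = [j for j in range(len(jobs)) if next_op[j] < len(jobs[j])]  # finished jobs are masked out
        # Hypothetical key per candidate: (next-operation duration, remaining work of the job).
        keys = {j: (jobs[j][next_op[j]][1], sum(p for _, p in jobs[j][next_op[j]:])) for j in ready}
        scores = {j: sum(q * k for q, k in zip(query, keys[j])) / math.sqrt(d) for j in ready}
        m = max(scores.values())  # subtract the max for a numerically stable softmax
        exps = {j: math.exp(s - m) for j, s in scores.items()}
        z = sum(exps.values())
        probs = {j: e / z for j, e in exps.items()}
        j = rng.choices(ready, weights=[probs[k] for k in ready])[0]
        log_prob += math.log(probs[j])
        seq.append(j)
        next_op[j] += 1
    return seq, log_prob
```

On this toy instance the SPT rule yields the sequence `[1, 2, 0, 0, 2, 1]` with makespan 10. A policy gradient learner would sample sequences with `attention_decode`, and push down the log-probability of sequences whose makespan exceeds a baseline, e.g. by minimizing `(makespan(JOBS, seq) - baseline) * log_prob` with a differentiable query; that gradient step is omitted here.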
FM3Q: Factorized Multi-Agent Minimax Q-Learning for Two-Team Zero-Sum Markov Games
Authors: Guangzheng Hu, Yuanheng Zhu, Haoran Li, Dongbin Zhao
Many real-world applications involve agents that fall into two teams, with payoffs that are equal within the same team but of opposite sign across teams. In recent years, such two-team zero-sum Markov games (2t0sMGs) have been addressed with reinforcement learning. However, existing methods remain inefficient because of insufficient consideration of intra-team credit assignment and data utilization, as well as computational intractability. In this paper, we propose the individual-global-minimax (IGMM) principle to ensure coherence between two-team minimax behaviors and individual greedy behaviors through Q functions in 2t0sMGs.
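The IGMM condition can be checked by brute force on a toy example. The sketch below assumes a simple additive (VDN-style) factorization in which the joint Q value is the sum of per-agent utilities of both teams, with team A maximizing and team B minimizing; this factorization and the utility tables are illustrative assumptions, not FM3Q's actual mixing network. It compares the joint max-min solution against each agent acting greedily on its own utility.

```python
import itertools

def joint_q(qa, qb, a, b):
    """Hypothetical additive joint Q: sum of team-A agents' utilities plus team-B agents' utilities."""
    return sum(q[ai] for q, ai in zip(qa, a)) + sum(q[bj] for q, bj in zip(qb, b))

def joint_minimax(qa, qb):
    """Brute force: max over team A's joint actions of min over team B's joint actions."""
    a_space = list(itertools.product(*[range(len(q)) for q in qa]))
    b_space = list(itertools.product(*[range(len(q)) for q in qb]))
    best_a, best_val = None, float("-inf")
    for a in a_space:
        worst = min(joint_q(qa, qb, a, b) for b in b_space)
        if worst > best_val:
            best_a, best_val = a, worst
    best_b = min(b_space, key=lambda b: joint_q(qa, qb, best_a, b))
    return best_a, best_b, best_val

def individual_greedy(qa, qb):
    """Each team-A agent maximizes, and each team-B agent minimizes, its own utility independently."""
    a = tuple(max(range(len(q)), key=q.__getitem__) for q in qa)
    b = tuple(min(range(len(q)), key=q.__getitem__) for q in qb)
    return a, b
```

With `qa = [[1.0, 3.0], [2.0, 0.5, 4.0]]` and `qb = [[0.0, -2.0], [1.5, 1.0]]`, both routes select team-A actions `(1, 2)` and team-B actions `(1, 1)`, with joint value 6.0. The additive form satisfies the IGMM consistency trivially but cost-effectively (individual greedy selection is linear in the number of agents, whereas the brute-force joint search is exponential); richer factorizations need extra structure to keep that guarantee.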
NovelGym: A Flexible Ecosystem for Hybrid Planning and Learning Agents Designed for Open Worlds
Authors: Shivam Goel, Yichen Wei, Klara Chura, Matthias Scheutz, Panagiotis Lymperopoulos, Jivko Sinapov
As AI agents leave the lab and venture into the real world as autonomous vehicles, delivery robots, and cooking robots, it is increasingly necessary to design and comprehensively evaluate algorithms that tackle the “open world”. To this end, we introduce NovelGym, a flexible and adaptable ecosystem for simulating gridworld environments that serves as a robust platform for benchmarking reinforcement learning (RL) and hybrid planning and learning agents in open-world contexts. The modular architecture of NovelGym facilitates the rapid creation and modification of task environments, including multi-agent scenarios and multiple environment transformations, thus providing a dynamic testbed for researchers developing open-world AI agents.
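One way to picture "environment transformations" in a modular gridworld is as composable functions that mutate a base environment mid-experiment. The sketch below is a hypothetical illustration of that idea only; `GridWorld`, `add_wall`, `swap_left_right`, and `apply_novelties` are invented names and are not NovelGym's API.

```python
from dataclasses import dataclass, field

@dataclass
class GridWorld:
    """Hypothetical minimal gridworld: an agent moves on a width x height grid and cannot enter walls."""
    width: int
    height: int
    walls: set = field(default_factory=set)
    agent: tuple = (0, 0)
    moves: dict = field(default_factory=lambda: {
        "up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)})

    def step(self, action):
        dx, dy = self.moves[action]
        x, y = self.agent[0] + dx, self.agent[1] + dy
        # Stay in place when the move would leave the grid or hit a wall.
        if 0 <= x < self.width and 0 <= y < self.height and (x, y) not in self.walls:
            self.agent = (x, y)
        return self.agent

def add_wall(cell):
    """Transformation factory: a novelty that blocks `cell`."""
    def transform(env):
        env.walls.add(cell)
        return env
    return transform

def swap_left_right(env):
    """Transformation: a novelty that inverts the agent's horizontal controls."""
    env.moves["left"], env.moves["right"] = env.moves["right"], env.moves["left"]
    return env

def apply_novelties(env, transforms):
    """Apply transformations in order, so novelties can be mixed and matched per experiment."""
    for t in transforms:
        env = t(env)
    return env
```

For example, after `apply_novelties(env, [swap_left_right, add_wall((2, 0))])`, an agent at `(1, 0)` that issues `"left"` is pushed toward the new wall and stays put, which is exactly the kind of post-novelty surprise an open-world agent has to detect and adapt to.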
RadDQN: a Deep Q Learning-based Architecture for Finding Time-efficient Minimum Radiation Exposure Pathway
Authors: Biswajit Sadhu, Trijit Sadhu, S. Anand
Recent advances in deep reinforcement learning (DRL) have led to its application across many areas of the automation sector. Its ability to handle complex decision-making problems encourages its use in the nuclear industry for tasks such as optimizing the radiation exposure of personnel during normal operating conditions and potential accident scenarios. However, the lack of an efficient reward function and an effective exploration strategy has hindered its use in developing radiation-aware autonomous unmanned aerial vehicles (UAVs) that achieve maximum radiation protection. In this article, we address these issues and introduce a deep Q-learning based architecture (RadDQN) that operates on a radiation-aware reward function to
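The core idea of a radiation-aware reward can be sketched without a neural network. Below, tabular Q-value iteration (a swap-in for deep Q-learning, chosen so the result is deterministic) computes the Bellman optimality targets that a DQN would regress towards, on a hypothetical 3x3 grid with a radiation hot spot in the centre. The reward shape, a per-step time penalty plus a dose penalty weighted by `w` plus a goal bonus, is an assumption for illustration and not RadDQN's published reward function.

```python
# Hypothetical dose-rate map (arbitrary units) with a hot spot at the centre cell.
DOSE = [[0.0, 0.1, 0.0],
        [0.1, 5.0, 0.1],
        [0.0, 0.1, 0.0]]
START, GOAL = (0, 0), (2, 2)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def next_cell(cell, action):
    """Deterministic move; bumping into the border leaves the agent where it is."""
    r, c = cell[0] + ACTIONS[action][0], cell[1] + ACTIONS[action][1]
    return (r, c) if 0 <= r < 3 and 0 <= c < 3 else cell

def reward(cell, w=1.0, step_cost=1.0, goal_bonus=10.0):
    """Hypothetical radiation-aware reward for entering `cell`: time penalty, weighted dose, goal bonus."""
    return -step_cost - w * DOSE[cell[0]][cell[1]] + (goal_bonus if cell == GOAL else 0.0)

def q_value_iteration(gamma=0.95, sweeps=200):
    """Repeated Bellman optimality backups on Q(s, a); the goal is terminal with Q = 0."""
    cells = [(r, c) for r in range(3) for c in range(3)]
    q = {(s, a): 0.0 for s in cells for a in ACTIONS}
    for _ in range(sweeps):
        for s in cells:
            if s == GOAL:
                continue
            for a in ACTIONS:
                s2 = next_cell(s, a)
                future = 0.0 if s2 == GOAL else max(q[(s2, b)] for b in ACTIONS)
                q[(s, a)] = reward(s2) + gamma * future
    return q

def greedy_path(q, max_steps=20):
    """Follow the greedy policy from START until GOAL (or give up after max_steps)."""
    path = [START]
    while path[-1] != GOAL and len(path) <= max_steps:
        s = path[-1]
        path.append(next_cell(s, max(ACTIONS, key=lambda b: q[(s, b)])))
    return path
```

The greedy path from `greedy_path(q_value_iteration())` is a four-move route along the edge of the grid that never enters the hot cell `(1, 1)`: the dose term makes the centre route costlier even though it is equally short, while the step penalty and discounting rule out longer detours. Raising or lowering `w` trades exposure against travel time.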
Building Open-Ended Embodied Agent via Language-Policy Bidirectional Adaptation
Authors: Shaopeng Zhai, Joe Wang, Tianyi Zhang, Fuxian Huang, Qi Zhang, Ming Zhou, Jing Hou, Yu Qiao, Yu Liu
Building embodied agents by integrating Large Language Models (LLMs) and Reinforcement Learning (RL) has revolutionized human-AI interaction: researchers can now leverage language instructions to plan decision-making for open-ended tasks. However, existing research struggles to meet the requirement of open-endedness. It typically trains either the LLM or the RL model to adapt to a fixed counterpart, which limits the exploration of novel skills and hinders the efficacy of human-AI interaction. To this end, we present OpenPAL, a co-training framework comprising two stages: (1) fine-tuning a pre-trained LLM to translate human instructions into goals for planning, and (
Active Inference as a Model of Agency
Authors: Lancelot Da Costa, Dominic Zhao, Samuel Tenka, Noor Sajid
Is there a canonical way to think of agency beyond reward maximisation? In this paper, we show that any type of behaviour complying with physically sound assumptions about how macroscopic biological agents interact with the world canonically integrates exploration and exploitation in the sense of minimising risk and ambiguity about states of the world. This description, known as active inference, refines the free energy principle, a popular descriptive framework for action and perception originating in neuroscience. Active inference provides a normative Bayesian framework to simulate and model agency that is widely used in behavioural neuroscience, reinforcement learning (RL) and robotics. The usefulness of active inference for RL is three-fold.
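The "risk and ambiguity" decomposition can be made concrete for discrete states. A common form of the expected free energy of a policy is the KL divergence between the states the policy predicts and the preferred states (risk), plus the expected entropy of observations given those states (ambiguity); lower is better. The sketch below assumes that discrete-state form and uses made-up distributions; it is an illustration, not code from the paper.

```python
import math

def kl(q, p):
    """KL divergence D_KL[q || p] for discrete distributions; terms with q = 0 contribute 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def expected_free_energy(q_states, preferred_states, likelihood):
    """G = risk + ambiguity for one future time step.

    q_states: predicted state distribution under the policy.
    preferred_states: prior preferences over states.
    likelihood[s]: distribution P(o | s) over observations in state s.
    """
    risk = kl(q_states, preferred_states)
    ambiguity = sum(qs * entropy(likelihood[s]) for s, qs in enumerate(q_states))
    return risk + ambiguity
```

With `likelihood = [[1.0, 0.0], [0.5, 0.5]]` (state 0 is perfectly observable, state 1 is not) and preferences `[0.9, 0.1]`, a policy predicting `[0.9, 0.1]` scores `0.1 * ln 2` (no risk, a little ambiguity), while one predicting `[0.1, 0.9]` scores `0.8 * ln 9 + 0.9 * ln 2`. Minimizing G therefore favors the policy that both reaches preferred states (exploitation) and visits informative states (exploration), which is the integration the abstract describes.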
Finite Time Performance Guarantees for Regularized Q-Learning with Linear Function Approximation
Authors: Jiachen Xi, Alfredo Garcia, Petar Momčilović
Regularized Markov Decision Processes (MDPs) serve as models of sequential decision making under uncertainty in which the decision maker has limited information-processing capacity and/or aversion to model ambiguity. With function approximation, the convergence properties of learning algorithms for regularized MDPs (e.g., soft Q-learning) are not well understood, because the composition of the regularized Bellman operator and a projection onto the span of the basis vectors is not a contraction with respect to any norm. In this paper, we consider a bi-level optimization formulation of regularized Q-learning with linear function approximation. The lower level optimization problem aims to identify a value function approximation that satisfies Bellman’s recursive
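For reference, the algorithm whose convergence is in question looks like the following: plain semi-gradient soft Q-learning with linear features, where the entropy-regularized value of the next state is a temperature-scaled log-sum-exp over actions. This is a minimal sketch of that baseline update, not the bi-level method proposed in the paper; `tau` (temperature), `alpha` (step size), and the feature vectors are illustrative assumptions.

```python
import math

def soft_value(q_values, tau):
    """Soft (entropy-regularized) value tau * log sum_a exp(Q(s,a) / tau), computed stably."""
    m = max(q_values)
    return m + tau * math.log(sum(math.exp((q - m) / tau) for q in q_values))

def q_linear(theta, phi):
    """Linear Q estimate: inner product of parameters and features phi(s, a)."""
    return sum(t * f for t, f in zip(theta, phi))

def soft_q_update(theta, phi_sa, reward, next_phis, gamma, tau, alpha):
    """One semi-gradient soft Q-learning step.

    next_phis holds phi(s', a') for every action a' at the next state. The target
    applies the regularized Bellman operator; moving theta along phi_sa is the
    projection step whose composition with that operator need not be a contraction.
    """
    target = reward + gamma * soft_value([q_linear(theta, p) for p in next_phis], tau)
    td_error = target - q_linear(theta, phi_sa)
    return [t + alpha * td_error * f for t, f in zip(theta, phi_sa)]
```

As `tau` approaches 0 the soft value approaches the hard maximum (e.g. `soft_value([1.0, 2.0], 0.01)` is 2.0 to floating-point precision), recovering ordinary Q-learning; larger `tau` spreads value across actions and corresponds to a stronger entropy regularizer.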
Principal-Agent Reward Shaping in MDPs
Authors: Omer Ben-Porat, Yishay Mansour, Michael Moshkovitz, Boaz Taitler
Principal-agent problems arise when one party acts on behalf of another, leading to conflicts of interest. The economic literature has extensively studied principal-agent problems, and recent work has extended this to more complex scenarios such as Markov Decision Processes (MDPs). In this paper, we further explore this line of research by investigating how reward shaping under budget constraints can improve the principal’s utility. We study a two-player Stackelberg game where the principal and the agent have different reward functions, and the agent chooses an MDP policy for both players. The principal offers an additional reward to the agent, and the agent picks their policy selfishly to maximize their
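The Stackelberg structure is easiest to see in a one-step special case (a single-state MDP). Assume, hypothetically, that the principal can place a non-negative bonus on one action, that total bonus must not exceed a budget, that the agent then picks the action maximizing its own reward plus bonus, and that ties are broken in the principal's favor. Inducing an action then costs exactly the gap between the agent's best reward and that action's reward, and the principal picks the best action it can afford. This is an illustrative sketch, not the paper's algorithm for general MDPs.

```python
def cheapest_inducement(agent_rewards, action, budget):
    """Smallest bonus on `action` that makes it a best response for the agent, or None if over budget.

    Assumes ties are broken in the principal's favor, so matching the agent's best reward suffices.
    """
    cost = max(agent_rewards.values()) - agent_rewards[action]
    return cost if cost <= budget else None

def principal_best_response(agent_rewards, principal_rewards, budget):
    """Leader's move: among affordable inducements, pick the action with the highest principal reward.

    Returns (action, bonus). The bonus only consumes budget; it is not subtracted from the
    principal's utility in this hypothetical model.
    """
    best = None
    for action in agent_rewards:
        cost = cheapest_inducement(agent_rewards, action, budget)
        if cost is None:
            continue
        if best is None or principal_rewards[action] > principal_rewards[best[0]]:
            best = (action, cost)
    return best
```

For example, with agent rewards `{"A": 5, "B": 3, "C": 1}` and principal rewards `{"A": 0, "B": 4, "C": 10}`, a budget of 3 lets the principal induce `"B"` with a bonus of 2, while a budget of 4 makes `"C"` affordable with a bonus of 4. In a full MDP the same leader-follower logic applies, but bonuses are spread over state-action pairs and the agent best-responds with a whole policy, which is what makes the budgeted problem hard.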