Attention-based Reinforcement Learning for Combinatorial Optimization: Application to Job Shop Scheduling Problem

Jaehjin Lee, Seho Kee, Mani Janakiraman, George Bunger

Abstract

Job shop scheduling is one of the most important and challenging combinatorial optimization problems, and it has been tackled mainly by exact or approximate solution approaches. However, finding an exact solution can be infeasible for real-world problems. Even an approximate approach can require a prohibitive amount of time to find a near-optimal solution, and the solutions found generally do not carry over to new problems. To address these challenges, we propose an attention-based reinforcement learning method for the class of job shop scheduling problems that integrates policy gradient reinforcement learning with a modified transformer architecture. An important result is that learners trained with the proposed method can be reused to solve large-scale problems not seen in training. We also demonstrate that the proposed approach outperforms the results of recent studies and widely adopted heuristic rules.

Core problem

The core problem is the inefficiency of traditional methods in solving the Job Shop Scheduling Problem (JSSP), particularly in large-scale, real-world scenarios. Exact methods can be computationally infeasible, approximate methods can be prohibitively time-consuming, and neither generalizes: the solution found for one problem does not transfer, so each new problem requires a fresh optimization effort.
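To make the problem concrete, here is a minimal sketch of a JSSP instance and one simple heuristic dispatching rule. The data layout (each job as an ordered list of (machine, duration) pairs), the function name spt_schedule, and the choice of the shortest-processing-time (SPT) rule are illustrative assumptions, not the paper's formulation or its baselines.

```python
from typing import List, Tuple

# Assumed layout: a job is an ordered list of (machine, duration) operations.
Job = List[Tuple[int, int]]


def spt_schedule(jobs: List[Job]) -> int:
    """Greedy dispatching with the shortest-processing-time rule; returns the makespan."""
    n_machines = 1 + max(m for job in jobs for m, _ in job)
    next_op = [0] * len(jobs)         # index of each job's next unscheduled operation
    job_ready = [0] * len(jobs)       # time at which each job's previous operation ends
    machine_ready = [0] * n_machines  # time at which each machine becomes free
    remaining = sum(len(job) for job in jobs)
    while remaining:
        # Candidates: the next operation of every unfinished job.
        candidates = [j for j in range(len(jobs)) if next_op[j] < len(jobs[j])]
        # SPT: pick the candidate with the shortest duration (ties broken by job index).
        j = min(candidates, key=lambda k: (jobs[k][next_op[k]][1], k))
        machine, duration = jobs[j][next_op[j]]
        # An operation starts once its job predecessor is done and its machine is free.
        start = max(job_ready[j], machine_ready[machine])
        job_ready[j] = machine_ready[machine] = start + duration
        next_op[j] += 1
        remaining -= 1
    return max(job_ready)


# Two jobs on two machines; each job visits both machines in its own order.
example = [[(0, 3), (1, 2)], [(1, 2), (0, 4)]]
print(spt_schedule(example))  # makespan of the SPT schedule: 7
```

A rule like this is fast but fixed: it applies the same priority regardless of the instance, which is the kind of heuristic baseline a learned scheduler is compared against.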

Key findings and contributions

ARLS, the proposed attention-based reinforcement learning scheduler, outperforms previous methods on synthetic datasets.
ARLS generalizes well to problem sizes not seen in training.
ARLS outperforms widely adopted heuristic rules.
A modified transformer architecture with masked multi-head attention sharpens the focus on relevant operations.
A multi-trajectory training strategy improves the robustness and diversity of solutions.
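The masking idea mentioned in the list above can be sketched with a single attention head in plain Python. This is only an illustration under stated assumptions: the function name masked_attention is hypothetical, which operations get masked (for example, ones already scheduled) is assumed rather than taken from the paper, and a multi-head version would run several such heads on learned projections.

```python
import math
from typing import List, Sequence, Tuple


def masked_attention(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
    mask: Sequence[bool],
) -> Tuple[List[float], List[float]]:
    """Single-head scaled dot-product attention over positions where mask is True."""
    if not any(mask):
        raise ValueError("at least one position must be unmasked")
    scale = math.sqrt(len(query))
    # Masked positions get a score of -inf, so their softmax weight is exactly 0.
    scores = [
        sum(q * k for q, k in zip(query, key)) / scale if allowed else -math.inf
        for key, allowed in zip(keys, mask)
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Output is the attention-weighted average of the value vectors.
    output = [
        sum(w * v[d] for w, v in zip(weights, values))
        for d in range(len(values[0]))
    ]
    return weights, output


# Three operations; the third is masked out (assumed: no longer a valid choice).
weights, output = masked_attention(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]],
    values=[[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]],
    mask=[True, True, False],
)
print(weights)  # the third weight is 0.0; the first two sum to 1
```

Because the masked operation receives zero weight, its value vector cannot influence the output, which is how masking keeps attention on the operations that still matter.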

Limitations

Training is limited to small-scale problems (6 × 6 instances).
Transformer architecture is relatively simple (few layers).
Benchmark datasets have some instances where optimal makespans are unknown.
Training is based on synthetic data.
Comparison with some prior studies is limited due to differences in training sizes and methods.

Key quotes

Our attention-based, reinforcement learning scheduler (ARLS) has an encoder-decoder structure that leverages feature embedding in a higher dimensional space. The input vector of each operation is encoded by linear projection, which is then transformed into embedding by the encoder using a self-attention mechanism. However, the encoder structure of ARLS is different from that of the original transformer (Vaswani et al., 2017).

Type: Technical Innovation

Keywords: attention-based, encoder-decoder structure, feature embedding, self-attention mechanism
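The two encoder steps named in the quote above (a linear projection of each operation's input vector, then self-attention over the projected vectors) can be sketched as follows. This is a rough illustration only: the feature choice (machine index, duration), the dimensions, the function names, and the parameter-free attention are assumptions, and the sketch does not capture the ways in which the ARLS encoder differs from the original transformer.

```python
import math
from typing import List, Sequence

Vector = List[float]


def linear(x: Sequence[float], weight: Sequence[Sequence[float]]) -> Vector:
    """Project x (length d_in) with a d_in x d_out weight matrix."""
    return [
        sum(xi * row[j] for xi, row in zip(x, weight))
        for j in range(len(weight[0]))
    ]


def self_attention(rows: Sequence[Vector]) -> List[Vector]:
    """Parameter-free single-head self-attention: every row attends to every row."""
    d = len(rows[0])
    out = []
    for q in rows:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in rows]
        top = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        out.append([sum(e / total * v[j] for e, v in zip(exps, rows)) for j in range(d)])
    return out


def encode(operations: Sequence[Sequence[float]], weight: Sequence[Sequence[float]]) -> List[Vector]:
    """Linear projection into a higher-dimensional space, then self-attention."""
    return self_attention([linear(op, weight) for op in operations])


# Assumed features per operation: (machine index, duration), projected from 2 to 4 dims.
operations = [[0.0, 3.0], [1.0, 2.0], [0.0, 4.0]]
weight = [[0.5, -0.5, 1.0, 0.0], [0.25, 0.0, -1.0, 1.0]]  # hypothetical fixed weights
embeddings = encode(operations, weight)
print(len(embeddings), len(embeddings[0]))  # 3 4
```

In a trained model the projection weights would be learned and the attention would use separate query, key, and value projections; they are fixed and omitted here to keep the sketch short.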

ARLS generalized performance is tested at varying sizes of synthetic and benchmark instances, including problem instance sizes greater than seen in training. To evaluate, we use three synthetic datasets with sizes 6 × 6, 10 × 10 and 15 × 15. Additionally, seven well-known benchmark sets, from Shylo (2010), are tested for performance.

Type: Performance Evaluation

Keywords: generalized performance, synthetic datasets, benchmark instances, problem instance sizes