Machine Learning Insides OptVerse AI Solver: Design Principles and Applications

Xijun Li, Fangzhou Zhu, Hui-Ling Zhen, Weilin Luo, Meng Lu, Yimin Huang, Zhenen Fan, Zirui Zhou, Yufei Kuang, Zhihai Wang, Zijie Geng, Yang Li, Haoyang Liu, Zhiwu An, Muming Yang, Jianshu Li, Jie Wang, Junchi Yan, Defen Sun, Tao Zhong, Yong Zhang

Abstract

In an era of digital ubiquity, efficient resource management and decision-making are paramount across numerous industries. To this end, we present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud’s OptVerse AI Solver, which aims to mitigate the scarcity of real-world mathematical programming instances and to surpass the capabilities of traditional optimization techniques. We showcase our methods for generating complex SAT and MILP instances using generative models that mirror the multifaceted structures of real-world problems. Furthermore, we introduce a training framework leveraging augmentation policies to maintain solvers’ utility in dynamic environments. Besides the data generation and augmentation, our proposed approaches

Core problem

The core problem is the scarcity of real-world mathematical programming instances and the inefficiency of traditional optimization techniques in handling complex SAT and MILP problems. This limitation hinders the development and testing of advanced optimization algorithms.

Key findings and contributions

HardSATGEN generates SAT instances that preserve computational hardness, using a one-to-one bipartite graph representation and a multi-stage generation pipeline to replicate the hardness of the source instances accurately.
G2MILP is the first deep generative framework for MILP instances, leveraging weighted bipartite graphs and GNNs, producing realistic and computationally challenging MILP instances.
Adversarial training, formulated as a contextual bandit problem using the PPO algorithm, improves solver performance on out-of-distribution instances and enhances learning efficacy without explicit mathematical formulations.
A GCN is used for initial basis selection in LP problems, improving convergence rates and yielding significant speedups across different datasets.
RL4Presolve is the first learning-based approach to accelerate presolve in large-scale LPs using reinforcement learning, consistently improving efficiency of solving LPs.
The hierarchical sequence model (HEM), which learns cut selection in MILP solvers via reinforcement learning, outperforms all baselines in terms of time and PPD integral.
NeuralDiving, which uses a GCNN to predict the values of binary variables in MILP problems, significantly reduces the time needed to reach good solutions.
HEBO (Heteroscedastic and Evolutionary Bayesian Optimization) for tuning solver parameters significantly outperforms existing black-box optimizers.
Transformer BO, a meta BO framework using transformer-based neural process for solver tuning, improves sample efficiency on new target tasks.
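
The weighted bipartite graph view of a MILP that G2MILP builds on can be illustrated with a minimal sketch. It assumes a problem of the form min c^T x subject to A x <= b, with one node per constraint, one node per variable, and an edge weighted by each nonzero coefficient. The function name `milp_to_bipartite` and the node attribute names are hypothetical choices for this illustration, not taken from the paper.

```python
from typing import Dict, List, Tuple


def milp_to_bipartite(
    A: List[List[float]],
    b: List[float],
    c: List[float],
    is_integer: List[bool],
) -> Tuple[List[Dict], List[Dict], List[Tuple[int, int, float]]]:
    """Encode min c^T x s.t. A x <= b as a weighted bipartite graph.

    Hypothetical helper for illustration only.
    Returns (constraint_nodes, variable_nodes, edges), where each edge is
    (constraint_index, variable_index, coefficient).
    """
    # One node per constraint, carrying its right-hand side.
    constraint_nodes = [{"rhs": rhs} for rhs in b]
    # One node per variable, carrying its objective coefficient and type.
    variable_nodes = [
        {"obj": coef, "is_integer": flag} for coef, flag in zip(c, is_integer)
    ]
    # One weighted edge per nonzero entry of the constraint matrix.
    edges = [
        (i, j, a_ij)
        for i, row in enumerate(A)
        for j, a_ij in enumerate(row)
        if a_ij != 0
    ]
    return constraint_nodes, variable_nodes, edges


# Toy instance: two constraints over three variables.
A = [[1, 2, 0], [0, 1, 3]]
b = [4, 6]
c = [1, 1, 2]
cons, variables, edges = milp_to_bipartite(A, b, c, [True, True, False])
```

A GNN can then pass messages along `edges` between the two node sets; the sketch stops at the encoding step and does not attempt any generation.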

Limitations

HardSATGEN faces difficulty in semantic formation of structures due to oversplit substructures.
G2MILP's complex graph generation task requires simplifying assumptions.
Initial basis selection using GCN requires LP problems to be similar in nature for effective generalization.
RL4Presolve is challenging to deploy directly because of hardware constraints on high-end GPUs.
HEM's training efficiency is low due to sparse supervised signals.
HEBO requires significant computational resources for hyperparameter optimization.
Transformer BO is dependent on accumulated data from source tasks for effective knowledge transfer.

Key quotes

HardSATGEN specifically tackles the preservation of computational hardness in SAT instances, using a novel one-to-one bipartite graph representation and a split-merge framework enhanced with fine-grained control over community structures and unsatisfiable cores. This approach facilitates more accurate replication of SAT instance hardness by addressing the inherent heterogeneity in the split-merge procedure and the difficulty of semantic formation of structures led by oversplit substructures.

Type: Data Generation (SAT Instances)

HardSATGEN

bipartite graph representation

split-merge framework

unsatisfiable cores
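
To make the idea of a one-to-one bipartite graph representation concrete, the sketch below uses the common literal-clause graph, in which each literal and each clause is a node and an edge links a literal to every clause containing it. This is an assumption chosen for illustration and may differ in detail from the representation used in HardSATGEN; the function names are hypothetical. The point it shows is that the formula can be recovered exactly from the graph.

```python
from typing import List, Set, Tuple


def cnf_to_lcg(clauses: List[List[int]]) -> Set[Tuple[int, int]]:
    """Map a CNF formula to literal-clause edges (literal, clause_index).

    Literals use the DIMACS convention: k is variable k, -k its negation.
    """
    return {(lit, idx) for idx, clause in enumerate(clauses) for lit in clause}


def lcg_to_cnf(edges: Set[Tuple[int, int]], num_clauses: int) -> List[List[int]]:
    """Recover the CNF formula from literal-clause edges."""
    clauses: List[Set[int]] = [set() for _ in range(num_clauses)]
    for lit, idx in edges:
        clauses[idx].add(lit)
    # Sort literals so the output is deterministic.
    return [sorted(clause) for clause in clauses]


# (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
formula = [[1, -2], [2, 3], [-1, -3]]
graph = cnf_to_lcg(formula)
recovered = lcg_to_cnf(graph, len(formula))
```

Because the round trip loses nothing except the order of literals within a clause, edits made on the graph side translate directly back into a SAT instance.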

In the OptVerse AI solver, we propose a novel hierarchical sequence model (HEM) to learn cut selection policies via reinforcement learning, which is the first learning-based method that can tackle which cuts to prefer, how many cuts to select, and the order of selected cuts simultaneously. Specifically, the HEM consists of a higher-level policy that predicts the number of cuts to select and a lower-level policy that determines the specific cuts, optimizing the mix of quantity, preference, and sequential ordering of cuts.

Type: Policy Learning (Cut Selection in MILP Solvers)

hierarchical sequence model (HEM)

cut selection policies

reinforcement learning

sequential ordering
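
The two-level structure described in the quote above can be sketched as follows. This is a toy illustration under stated assumptions: the higher-level policy is reduced to a function that returns a selection ratio, and the lower-level policy to a scoring function applied greedily one cut at a time. In the paper both levels are learned via reinforcement learning; the names `select_cuts`, `ratio_policy`, and `score_policy` are hypothetical stand-ins.

```python
import math
from typing import Callable, List, Sequence


def select_cuts(
    cut_features: Sequence[float],
    ratio_policy: Callable[[Sequence[float]], float],
    score_policy: Callable[[float, List[float]], float],
) -> List[int]:
    """Return an ordered list of selected cut indices.

    Higher level: decide how many cuts to keep.
    Lower level: pick cuts one by one, conditioning on those already chosen.
    """
    n = len(cut_features)
    if n == 0:
        return []
    # Higher-level decision: the number of cuts to select.
    ratio = ratio_policy(cut_features)
    k = max(1, min(n, math.ceil(ratio * n)))
    remaining = list(range(n))
    chosen: List[int] = []
    # Lower-level decision: which cuts, and in what order.
    for _ in range(k):
        context = [cut_features[i] for i in chosen]
        best = max(remaining, key=lambda i: score_policy(cut_features[i], context))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Stand-in policies: keep half of the cuts, prefer higher feature values.
features = [0.2, 0.9, 0.5, 0.7]
order = select_cuts(features, lambda f: 0.5, lambda feat, ctx: feat)
```

Separating the count from the ordered choice mirrors the split between quantity, preference, and ordering; the greedy scoring loop is a simplification of a learned sequence model.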