// the find

ai4co/rl4co

★ 880 · Python · MIT · updated May 2026

A PyTorch library for all things Reinforcement Learning (RL) for Combinatorial Optimization (CO)

RL4CO is a benchmark library for neural combinatorial optimization (NCO): it uses RL to train learned heuristics (attention models, pointer networks, POMO, etc.) that solve problems like TSP, CVRP, and job-shop scheduling. It targets operations research (OR) researchers who want to run and compare NCO algorithms without rewriting the training infrastructure from scratch. Built on TorchRL, PyTorch Lightning, and Hydra.

The environment abstraction is well designed: swappable embeddings mean you can plug in a new problem domain without rewriting the policy, which is the right decomposition for this kind of research. Coverage is wide, with 20+ environments across routing, scheduling, and electronic design automation (EDA), plus both autoregressive and non-autoregressive policy families. The Hydra config layer makes hyperparameter sweeps and experiment reproducibility much less painful than in a typical research codebase. It's backed by a KDD 2025 paper with 30 named authors, so the benchmark results have been through peer review and are more likely to hold up under scrutiny than numbers from a typical research repo.
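To make the embedding/policy split concrete, here is a toy sketch in plain Python. It is not RL4CO's API: tsp_embedding, cvrp_embedding, and NearestNeighborPolicy are hypothetical names, and the greedy nearest-neighbor rule is only a stand-in for a learned decoder. The point it tries to show is that the policy never reads the raw instance, so supporting a new problem means writing a new embedding function and nothing else.

```python
from dataclasses import dataclass
from typing import Callable, List

# All names here are hypothetical illustrations, not RL4CO classes.
Features = List[List[float]]
Embedding = Callable[[dict], Features]


def tsp_embedding(instance: dict) -> Features:
    """One feature vector per city: its (x, y) coordinates."""
    return [[x, y] for x, y in instance["coords"]]


def cvrp_embedding(instance: dict) -> Features:
    """Coordinates plus demand normalized by vehicle capacity."""
    cap = instance["capacity"]
    return [
        [x, y, d / cap]
        for (x, y), d in zip(instance["coords"], instance["demands"])
    ]


@dataclass
class NearestNeighborPolicy:
    """Problem-agnostic stand-in for a learned decoder.

    It only sees the feature vectors produced by `embed`. This toy
    ignores feasibility constraints such as vehicle capacity.
    """

    embed: Embedding

    def rollout(self, instance: dict) -> List[int]:
        feats = self.embed(instance)
        tour = [0]  # start at node 0
        unvisited = set(range(1, len(feats)))
        while unvisited:
            cur = feats[tour[-1]]
            # Closest unvisited node in feature space; ties go to the lower index.
            nxt = min(
                unvisited,
                key=lambda j: (sum((a - b) ** 2 for a, b in zip(cur, feats[j])), j),
            )
            tour.append(nxt)
            unvisited.remove(nxt)
        return tour


coords = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (2.0, 0.0)]
tsp_tour = NearestNeighborPolicy(tsp_embedding).rollout({"coords": coords})
cvrp_tour = NearestNeighborPolicy(cvrp_embedding).rollout(
    {"coords": coords, "demands": [0, 1, 1, 1], "capacity": 10}
)
```

Both rollouts run the same policy code; only the embedding changes. In a learned model the embedding would be a trainable layer rather than a hand-written feature function, but the division of labor is the same.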

880 stars for a library this comprehensive suggests it hasn't broken out of the academic NCO niche. If you're not already in that world, the learning curve for understanding what POMO or SymNCO does is steep, and the docs don't bridge that gap. GPU memory requirements are undocumented, so training on large instances will surprise you. The dependency stack (TorchRL + Lightning + Hydra + TensorDict) is heavy and version-sensitive: a single 'pip install rl4co' that hides four major frameworks is a recipe for environment conflicts. A real-world deployment path is missing entirely: the library trains models but has no story for serving a trained policy or integrating with an existing OR workflow.
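One cheap way to get ahead of those environment conflicts is to record exactly which versions of the stack ended up installed. The sketch below uses only the standard library's importlib.metadata. The distribution names in STACK are my assumption about how these packages are published on PyPI, so adjust them if your environment uses different names.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Assumed PyPI distribution names for the stack named above.
STACK = ("rl4co", "torch", "torchrl", "tensordict", "lightning", "hydra-core")


def installed_versions(packages: Iterable[str] = STACK) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    report: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:12s} {ver or 'not installed'}")
```

Saving this output next to your experiment logs gives you something to diff when a run that worked last month stops working after an upgrade.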
