// the find
NousResearch/atropos
Atropos is a Language Model Reinforcement Learning Environments framework for collecting and evaluating LLM trajectories through diverse environments
Atropos is a microservice framework for running LLM reinforcement learning environments asynchronously. Each environment runs as its own server, feeding trajectories to a central API that a trainer polls for batches. It's for researchers who want to run GRPO/PPO-style RL on language models without building the data collection and distribution infrastructure themselves.
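To make the push/poll shape concrete, here is a minimal in-memory sketch of the pattern described above: environments push scored rollouts whenever they finish, and the trainer polls until a full batch is available. This is not Atropos's actual API. `ScoredGroup`, `TrajectoryQueue`, `push`, and `poll_batch` are hypothetical names invented for illustration, and the real system does this over HTTP between separate processes rather than in one Python object.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional


@dataclass
class ScoredGroup:
    """Hypothetical container: one prompt's rollouts plus a score per rollout."""
    env_id: str
    tokens: List[List[int]]
    scores: List[float]


@dataclass
class TrajectoryQueue:
    """Toy stand-in for the central API: environments push, the trainer polls."""
    batch_size: int
    _pending: Deque[ScoredGroup] = field(default_factory=deque, init=False)

    def push(self, group: ScoredGroup) -> None:
        # Any environment can call this at any time, independent of the trainer.
        self._pending.append(group)

    def poll_batch(self) -> Optional[List[ScoredGroup]]:
        # Return None until enough groups have arrived; the trainer retries later.
        if len(self._pending) < self.batch_size:
            return None
        return [self._pending.popleft() for _ in range(self.batch_size)]


queue = TrajectoryQueue(batch_size=2)
queue.push(ScoredGroup("math", [[1, 2], [1, 3]], [1.0, 0.0]))
first_poll = queue.poll_batch()   # only one group queued, so no batch yet
queue.push(ScoredGroup("chess", [[4, 5], [4, 6]], [0.5, 0.2]))
second_poll = queue.poll_batch()  # two groups from two different environments
```

Because the queue is the only shared piece, an environment that stops pushing just slows batch assembly; it does not stop the trainer from polling, which is the isolation property the architecture is after.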
The microservice architecture is the right call here: environments are isolated, you can run several simultaneously against the same API, and a crashed environment doesn't take down training. The community environment directory is useful in practice: 40+ contributed environments spanning chess, Lean proofs, Ethereum, robotics, and meteorology, so you can likely find something close to your use case rather than writing one from scratch. The process/evaluate subcommands for local debugging without a full training loop are well thought out, and HTML rollout visualization is a small thing that saves hours. Teacher distillation via ScoredDataGroup is a clean transport-layer addition that doesn't force teacher logic into BaseEnv.
The trainer is not included: Atropos handles environment rollouts and trajectory queuing, but you still need to wire up your own training loop via Axolotl, Tinker, or the thin example trainer. This is a real gap if you're not already deep in the ecosystem. Teacher distillation requires tokenizer compatibility (same vocabulary, hard error otherwise), which will block most cross-model distillation experiments; no workaround or token remap support is provided, and the docs call this out as a known limitation with no timeline. Community environments vary widely in quality and documentation; several have no tests and minimal READMEs.
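To see why the same-vocabulary requirement bites, here is a rough sketch of what such a hard check amounts to. It is an assumption about the shape of the check, not Atropos's code: `check_same_vocab` is a hypothetical helper, and the vocabularies are plain token-to-id dicts (the kind Hugging Face tokenizers return from `get_vocab()`).

```python
from typing import Dict


def check_same_vocab(student_vocab: Dict[str, int], teacher_vocab: Dict[str, int]) -> None:
    """Raise ValueError unless both vocabularies map every token to the same id."""
    if student_vocab == teacher_vocab:
        return
    only_student = sorted(student_vocab.keys() - teacher_vocab.keys())
    only_teacher = sorted(teacher_vocab.keys() - student_vocab.keys())
    # Tokens present in both vocabularies but assigned different ids.
    remapped = sorted(
        tok for tok in student_vocab.keys() & teacher_vocab.keys()
        if student_vocab[tok] != teacher_vocab[tok]
    )
    raise ValueError(
        f"tokenizer mismatch: {len(only_student)} student-only, "
        f"{len(only_teacher)} teacher-only, {len(remapped)} remapped tokens"
    )


student = {"<s>": 0, "hello": 1, "world": 2}
same_teacher = {"<s>": 0, "hello": 1, "world": 2}
other_teacher = {"<s>": 0, "world": 1, "hola": 2}

check_same_vocab(student, same_teacher)  # identical vocabularies: no error

try:
    check_same_vocab(student, other_teacher)
    mismatch_message = None
except ValueError as err:
    mismatch_message = str(err)
```

Two models from different families will almost always fail a check like this, since even shared tokens tend to land on different ids. That is why the missing token remap matters for cross-model experiments.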