// the find

AI-Hypercomputer/maxtext

★ 2,320 · Python · Apache-2.0 · updated Jun 2026

A simple, performant and scalable Jax LLM!

MaxText is Google's reference JAX implementation for training large language models on TPUs and GPUs at scale, covering pre-training and post-training (SFT, GRPO, GSPO). It supports essentially every major open-weight model family, including Llama 4, DeepSeek V3.2, Qwen 3.5, Kimi K2 and Gemma 4. It lets the XLA compiler do the heavy lifting on optimization instead of relying on hand-tuned kernels. It is built for ML researchers and teams who need high model FLOPs utilization (MFU) at the scale of tens of thousands of chips. It is not aimed at someone fine-tuning a 7B model on a single A100.
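MFU is the fraction of the hardware's peak FLOP rate that a training run spends on the model's own math. The sketch below is a minimal way to estimate it. It assumes the common rule of thumb of about 6 FLOPs per parameter per token for a dense transformer's forward plus backward pass, and it ignores attention FLOPs. The throughput and per-chip peak figures are placeholders, not MaxText measurements or real chip specs.

```python
def training_mfu(params: float, tokens_per_sec: float, num_chips: int,
                 peak_flops_per_chip: float) -> float:
    """Estimate model FLOPs utilization for a dense decoder-only transformer.

    Uses the ~6 * N FLOPs-per-token approximation (2N forward + 4N backward),
    which ignores attention and embedding FLOPs.
    """
    achieved_flops = 6.0 * params * tokens_per_sec  # useful model FLOP/s
    peak_flops = num_chips * peak_flops_per_chip    # theoretical hardware FLOP/s
    return achieved_flops / peak_flops


# Placeholder numbers for illustration only: a 70B-parameter model on 256 chips,
# each assumed to have a peak of 400 TFLOP/s.
mfu = training_mfu(params=70e9, tokens_per_sec=100_000, num_chips=256,
                   peak_flops_per_chip=400e12)
print(f"MFU: {mfu:.1%}")  # prints "MFU: 41.0%"
```

Mixture-of-experts models such as DeepSeek V3.2 or Kimi K2 need one change: N is usually taken as the parameters active per token, not the total parameter count.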

1. XLA does the sharding and fusion work automatically. You get competitive MFU without writing custom kernels, which is rare at this scale.
2. Model coverage is genuinely current. DeepSeek V3.2, Kimi K2 1T, Qwen 3.5 397B and Gemma 4 multimodal all landed within weeks of the upstream releases, not months.
3. The post-training stack is complete: SFT, DPO, LoRA, GRPO, GSPO and knowledge distillation. All of them are multi-host capable, and vLLM handles inference sampling for RL.
4. Orbax checkpointing, with emergency and multi-tier options, is production-grade. This is the infrastructure Google actually uses, not a demo wrapper.
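Point 1 refers to JAX's sharding model. You annotate how arrays are laid out across a device mesh, and XLA's partitioner propagates that layout through a jitted function, inserting any collectives it needs. The sketch below uses the public `jax.sharding` API and is not MaxText's own code. It runs on whatever devices are visible, and a single CPU works. The axis name `data` and the array shapes are arbitrary choices for illustration.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# A 1-D mesh over all visible devices, with one named axis ("data").
devices = np.array(jax.devices())
mesh = Mesh(devices, axis_names=("data",))

batch = 8 * len(devices)  # keep the batch divisible by the device count
# Shard the batch dimension of x across "data"; replicate the weights w.
x = jax.device_put(jnp.ones((batch, 4)), NamedSharding(mesh, P("data", None)))
w = jax.device_put(jnp.ones((4, 2)), NamedSharding(mesh, P()))


@jax.jit
def forward(x, w):
    # No explicit communication here: XLA infers the output sharding from the
    # input shardings and is free to fuse the tanh and the reduction.
    return jnp.tanh(x @ w).sum(axis=-1)


out = forward(x, w)
print(out.shape, out.sharding)
```

On a multi-device machine, the printed sharding shows the output split along the batch dimension. The code never mentions devices or communication.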

1. TPU-first in practice. GPU configs exist for a handful of Llama variants, but the test matrix, CI and docs are heavily TPU-weighted. Expect rough edges on GPU.
2. GCP lock-in runs deep. GCS holds data and checkpoints, xpk manages clusters, Pathways runs multi-host RL, and Vertex AI hosts TensorBoard. A 'decoupled mode' exists, but it is a workaround, not a first-class path.
3. The repo went through a major restructure in early 2026: it moved to a src layout and removed the legacy post-train shims. The dust hasn't fully settled. Some docs still reference old paths, and the RESTRUCTURE.md migration guide is the kind of thing you read after something breaks.
4. Config sprawl is real. There are hundreds of YAML files across model/hardware combinations with no clear hierarchy, and base.yml has dozens of underdocumented knobs. Expect to invest time just understanding what you're changing.
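One way to cope with point 4 is to list exactly which keys a run changes relative to the defaults before launching it. The sketch below is hypothetical. `parse_overrides`, `effective_config` and `changed_keys` are illustrative helpers, not MaxText functions. The assumed layering order is base defaults, then a model file, then command-line `key=value` overrides. The key names and values are placeholders, not MaxText's real defaults. In practice you would load the first two dicts from the YAML files with a YAML parser.

```python
from typing import Any


def parse_overrides(args: list[str]) -> dict[str, str]:
    """Turn ["key=value", ...] command-line style overrides into a dict."""
    overrides = {}
    for arg in args:
        key, sep, value = arg.partition("=")
        if not sep:
            raise ValueError(f"override must look like key=value: {arg!r}")
        overrides[key] = value  # values stay strings; a real loader would coerce types
    return overrides


def effective_config(*layers: dict[str, Any]) -> dict[str, Any]:
    """Merge config layers left to right; later layers win."""
    merged: dict[str, Any] = {}
    for layer in layers:
        merged.update(layer)
    return merged


def changed_keys(base: dict[str, Any], final: dict[str, Any]) -> dict[str, tuple]:
    """Map each key whose final value differs from (or is absent in) base to (old, new)."""
    return {k: (base.get(k), v) for k, v in final.items() if base.get(k) != v}


# Hypothetical placeholder layers standing in for base.yml, a model file and CLI flags.
base = {"per_device_batch_size": 12, "steps": 150_001, "remat_policy": "full"}
model = {"remat_policy": "minimal"}
cli = parse_overrides(["steps=100", "run_name=demo"])

final = effective_config(base, model, cli)
for key, (old, new) in sorted(changed_keys(base, final).items()):
    print(f"{key}: {old!r} -> {new!r}")
```

Printing only the changed keys keeps the review focused on the handful of knobs a run actually touches, rather than on the whole merged config.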
