// the find

rasbt/reasoning-from-scratch

★ 4,566 · Jupyter Notebook · Apache-2.0 · updated Jun 2026

Implement a reasoning LLM in PyTorch from scratch, step by step

Sebastian Raschka's companion repo to his Manning book on building reasoning LLMs from scratch. It walks through GRPO-based RL training, inference-time scaling, self-refinement, and distillation on top of Qwen3, covering the same family of techniques behind DeepSeek R1. Aimed at ML practitioners who want to understand reasoning models mechanically rather than just call an API.
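To give a flavor of what "GRPO-based" means mechanically, here is a minimal sketch of the group-relative advantage step, written in plain Python rather than taken from the repo. The function name, the `eps` value, and the use of the population standard deviation are assumptions for illustration; real implementations differ on these details.

```python
from statistics import fmean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of answers sampled for the same prompt.

    Hypothetical helper: each answer is scored relative to its own group,
    so no separate value network is needed to provide a baseline.
    """
    mean = fmean(rewards)
    std = pstdev(rewards)  # population std; an assumption, implementations vary
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled answers to one prompt, rewarded 1.0 if correct and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# When every answer gets the same reward, there is no learning signal.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Correct answers end up with positive advantages and incorrect ones with negative advantages, which is what the policy update then pushes toward or away from.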

The progression from inference-time tricks (self-consistency, best-of-N) to RL training (GRPO) to distillation is well-sequenced: each chapter builds on the last without hand-waving. The bonus scripts are genuinely useful. Batched GRPO, FSDP support, and DeepSeek-V3.2- and OLMo3-style variants give you real starting points for experiments beyond the book exercises. CI runs on Linux, macOS, and Windows with both current and older PyTorch versions, which matters for a tutorial repo where environment compatibility is the first thing that breaks. Pre-trained checkpoints are downloadable for chapters 6–8, so you can skip the expensive training run and go straight to experimentation.
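The two inference-time tricks named above are simple enough to sketch in a few lines. This is an illustration under stated assumptions, not the repo's code: `self_consistency`, `best_of_n`, and the toy `score_fn` are hypothetical names, and the sampled answers are hard-coded stand-ins for model outputs.

```python
from collections import Counter

def self_consistency(answers):
    """Majority vote: return the most frequent final answer among samples."""
    return Counter(answers).most_common(1)[0][0]

def best_of_n(candidates, score_fn):
    """Return the candidate that a scoring function (e.g. a verifier) rates highest."""
    return max(candidates, key=score_fn)

# Stand-ins for final answers extracted from several sampled completions.
sampled_answers = ["42", "41", "42", "7", "42"]
voted = self_consistency(sampled_answers)

# Toy scorer (an assumption): prefer the longest candidate.
# In practice this would be a reward model or a rule-based checker.
candidates = ["x = 2", "x = 2 because 2 + 2 = 4", "x = 3"]
picked = best_of_n(candidates, score_fn=len)
```

Self-consistency needs no scorer at all, only agreement between samples, while best-of-N is only as good as the scoring function behind it.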

The repo is locked to the book's chapter structure, and the README says so explicitly: no external contributions that extend main chapter code. If the book's approach turns out to have a flaw, or the field moves on (which it will), the code stays frozen. Hardware requirements are described vaguely: 'GPU recommended for chapters 5–6' doesn't tell you whether a 16 GB consumer card is enough or whether you need 40 GB for the GRPO runs. All chapter code lives in Jupyter notebooks, which makes diffing, testing, and automated linting harder than with plain Python modules. The test suite wraps notebooks rather than importing functions directly, which is a fragile pattern.
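For the hardware question, a back-of-envelope estimate at least gives a lower bound. The sketch below assumes full fine-tuning with an Adam-style optimizer in 32-bit precision (4 bytes per parameter each for weights and gradients, 8 for optimizer state) and a hypothetical 0.6B-parameter model. It ignores activations, generation-time caches, and any frozen reference-model copy, and none of these numbers come from the repo.

```python
def training_memory_gb(n_params, bytes_weights=4, bytes_grads=4, bytes_optimizer=8):
    """Rough lower bound on training memory in GB (1 GB = 1e9 bytes).

    Counts only weights, gradients, and optimizer state. Activations and
    sampling overhead come on top and can dominate for long sequences.
    """
    return n_params * (bytes_weights + bytes_grads + bytes_optimizer) / 1e9

# Hypothetical model size, chosen for illustration only.
n_params = 0.6e9
estimate = training_memory_gb(n_params)
print(f"at least {estimate:.1f} GB before activations")
```

Under these assumptions the floor is roughly 9.6 GB, so whether a 16 GB card suffices depends on batch size, sequence length, and how many completions are sampled per prompt, which is exactly what the README leaves unstated.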
