// the find

deepseek-ai/DeepSeek-Math

★ 3,351 · Python · MIT · updated Apr 2024

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

DeepSeekMath is a 7B parameter language model specialized for mathematical reasoning, trained on 120B tokens of math-heavy web data scraped from Common Crawl. It ships three checkpoints — base, instruction-tuned, and RL-trained — and hits 51.7% on the MATH benchmark without tool use, which was competitive with GPT-4 at release time. This is a research artifact, not a library; you're here to run or fine-tune the model, not import a package.

The GRPO (Group Relative Policy Optimization) training approach is the genuinely interesting part — it's what pushed the RL variant to ~60% on MATH with tools, and the paper explains it well enough to replicate. The data pipeline is methodical: four iterations of FastText-guided Common Crawl mining reaching 35.5M pages, with the pipeline documented and reproducible. The evaluation harness is self-contained with datasets bundled and eval scripts that cover nine benchmarks across English and Chinese, which makes honest comparison straightforward. Model weights are MIT-licensed for code and commercially usable under their model agreement, so there's no legal ambiguity for most use cases.

The repo has been effectively frozen since April 2024 — no updates, no issue responses, no improvements since publication. The 4096 token context limit is tight for multi-step competition problems that require long derivations. The evaluation scripts have no packaging or dependency pinning beyond an environment.yml, so getting the harness running requires some archaeology. There's also no fine-tuning code or training recipe included; you get weights and eval, but reproducing or extending the RL training requires hunting through the paper and implementing GRPO yourself.

View on GitHub →