// the find

yoshitomo-matsubara/torchdistill

★ 1,620 · Python · MIT · updated Mar 2026

A coding-free framework built on PyTorch for reproducible deep learning studies. PyTorch Ecosystem. 🏆26 knowledge distillation methods presented at TPAMI, CVPR, ICLR, ECCV, NeurIPS, ICCV, AAAI, etc. are implemented so far. 🎁 Trained models, training logs and configurations are available for ensuring reproducibility and benchmarking.

torchdistill is a YAML-config-driven framework for running knowledge distillation experiments in PyTorch without writing training loop code. It implements 26 KD methods from major ML venues (CVPR, NeurIPS, ICLR, etc.) with reproducible configs, pretrained checkpoints, and training logs. It's aimed at researchers who want to benchmark KD methods or apply them to new architectures without reimplementing standard pipelines.

The ForwardHookManager is genuinely useful — it lets you tap intermediate activations from any layer by module path string, no model surgery required. The breadth of implemented methods (26, spanning 2014–2025) with matching configs and logs makes it credible as a reproducibility tool, not just a demo. PyTorch Hub integration in the YAML config means you can pull in timm or torchvision teacher models without writing any Python. Trained models and logs are actually published alongside configs, which most ML repos skip.
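The underlying mechanism can be sketched with plain PyTorch forward hooks. This is a minimal illustration, not torchdistill's ForwardHookManager API: the helper `capture_activations` and the toy model are hypothetical, and only the standard `nn.Module.get_submodule` and `register_forward_hook` calls are assumed.

```python
import torch
from torch import nn


def capture_activations(model, module_paths):
    """Register forward hooks on submodules named by dotted path.

    Returns (store, handles): store maps each path to its latest output.
    Hypothetical helper for illustration; not part of torchdistill.
    """
    store = {}
    handles = []
    for path in module_paths:
        module = model.get_submodule(path)  # resolves paths like "features.1"

        def hook(_module, _inputs, output, path=path):
            # Bind path as a default so each hook writes to its own key.
            store[path] = output.detach()

        handles.append(module.register_forward_hook(hook))
    return store, handles


# Toy model standing in for a teacher network (assumption, for illustration).
model = nn.Sequential()
model.add_module("features", nn.Sequential(nn.Linear(8, 16), nn.ReLU()))
model.add_module("head", nn.Linear(16, 4))

store, handles = capture_activations(model, ["features.1", "head"])
with torch.no_grad():
    logits = model(torch.randn(2, 8))

for path, activation in store.items():
    print(path, tuple(activation.shape))

for handle in handles:
    handle.remove()  # detach hooks so later forward passes are not recorded
```

The model definition is never edited: the hooks are attached from outside and removed afterwards, which is what makes path-based tapping convenient for feature-matching losses.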

The YAML config DSL is doing too much: instantiating datasets, transforms, optimizers, and loss functions through `!import_call` magic turns configs into verbose pseudo-Python that's harder to debug than actual Python. The 'coding-free' pitch breaks down the moment you need a custom loss or unusual architecture; at that point you're fighting the abstraction layer rather than just writing code. The framework is largely one maintainer's research output: a push in March 2026, after the latest cited paper (TPAMI 2025), suggests active maintenance, but community adoption beyond academic citations is unclear. No distributed training documentation is visible in the tree, which limits practical use on multi-GPU setups where most real KD work happens.
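The debugging concern can be illustrated with a stdlib-only sketch of what a tag like `!import_call` has to do: resolve a dotted path and call it with keyword arguments taken from the config. The helper `import_call` and the `key`/`kwargs` spec layout below are hypothetical stand-ins, not torchdistill's actual implementation or schema.

```python
import importlib


def import_call(spec):
    """Resolve spec["key"] (a dotted path) and call it with spec["kwargs"].

    Hypothetical helper: the spec layout is an assumption for illustration.
    """
    module_path, _, attr_name = spec["key"].rpartition(".")
    target = getattr(importlib.import_module(module_path), attr_name)
    return target(**spec.get("kwargs", {}))


# A spec as it might look after YAML parsing (layout is an assumption).
spec = {"key": "fractions.Fraction", "kwargs": {"numerator": 3, "denominator": 4}}
result = import_call(spec)
print(result)

# A misspelled keyword is only detected when the target is called,
# and the traceback points at the loader rather than at the config line.
bad_spec = {"key": "fractions.Fraction", "kwargs": {"numerator": 3, "denominater": 4}}
error_message = None
try:
    import_call(bad_spec)
except TypeError as err:
    error_message = str(err)
    print("config error:", error_message)
```

In ordinary Python the same typo would be flagged at the call site by an editor or linter; routed through a config, it appears only at run time.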
