// the find
hiyouga/LlamaFactory
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
LLaMA Factory is a one-stop fine-tuning toolkit for 100+ LLMs and VLMs, covering SFT, DPO, PPO, KTO, ORPO, and reward modeling with LoRA/QLoRA/full-tuning support. It targets ML practitioners and researchers who want to fine-tune open-weight models without writing training loops from scratch. It is backed by an ACL 2024 paper and actively maintained, with near day-0 support for new model releases.
- Model coverage is genuinely impressive and kept current: Qwen3, DeepSeek-R1, Llama 4, Gemma 3, and InternVL3 were all supported within days of release, not weeks later as an afterthought.
- YAML-driven config system with a large examples/ directory means you can copy an existing config close to your use case and tweak it rather than hunting through argument docs. The hardware requirement table is unusually honest and practical.
- QLoRA memory estimates are well-documented and the FSDP+QLoRA path for multi-GPU setups (70B on 2x24GB) is tested and has example configs, not just mentioned in a bullet point.
- Modular requirements split (requirements/gptq.txt, bitsandbytes.txt, vllm.txt, etc.) means you only install what you actually need instead of pulling in every optional dependency upfront.
- The docs are explicitly marked WIP on ReadTheDocs and the README is doing most of the heavy lifting. For anything beyond the happy path (custom data formats, debugging multimodal pipelines, understanding what actually happens inside PPO), you're reading source code.
- Supporting 100+ models means the model-specific code paths in template.py and mm_plugin.py are sprawling. Subtle bugs in less-used model+training-method combinations (e.g., a specific VLM with DPO) can go unnoticed, and the test suite likely doesn't cover the full matrix.
- PPO implementation delegates heavily to TRL, so you're subject to TRL's API churn. The version pinning in pyproject.toml is loose enough that a TRL minor bump has historically broken things, and the fix is 'pull latest and reinstall'.
- The Gradio web UI (LLaMA Board) is convenient for demos but doesn't expose the full parameter surface, so anyone serious about training will end up on the CLI anyway. It's not clear who the target audience for the UI actually is in production.
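
The 70B-on-2x24GB figure from the FSDP+QLoRA point above can be sanity-checked with back-of-envelope arithmetic. The sketch below is not taken from the project; it counts only the 4-bit quantized base weights and assumes FSDP shards them evenly across GPUs. It ignores quantization constants, LoRA adapters, optimizer state, activations, and framework overhead, so the real footprint is higher. The function names and the 24 GiB card size are assumptions for illustration.

```python
GIB = 2**30  # bytes per GiB


def quantized_weight_gib(n_params: float, bits_per_param: float = 4.0) -> float:
    """Memory for the quantized base weights alone, in GiB."""
    return n_params * bits_per_param / 8 / GIB


def per_gpu_weight_gib(n_params: float, n_gpus: int, bits_per_param: float = 4.0) -> float:
    """Per-GPU share of the weights, assuming an even shard across GPUs."""
    return quantized_weight_gib(n_params, bits_per_param) / n_gpus


# Illustrative numbers: a 70B-parameter model on two 24 GiB cards.
total = quantized_weight_gib(70e9)
per_gpu = per_gpu_weight_gib(70e9, n_gpus=2)
headroom = 24.0 - per_gpu  # what is left for everything this estimate ignores

print(f"weights total: {total:.1f} GiB")
print(f"weights per GPU: {per_gpu:.1f} GiB")
print(f"headroom per GPU: {headroom:.1f} GiB")
```

Under these assumptions the weights take roughly 16 GiB per card, leaving only a few GiB for adapters, activations, and overhead. That is why this setup is tight and why tested example configs matter more than a one-line claim.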