// the find

hiyouga/ChatGLM-Efficient-Tuning

★ 3,721 · Python · Apache-2.0 · updated Oct 2023

Fine-tuning ChatGLM-6B with PEFT | Efficient ChatGLM fine-tuning based on PEFT

A fine-tuning toolkit for ChatGLM-6B supporting LoRA, QLoRA, P-Tuning v2, freeze tuning, and full RLHF (SFT → reward model → PPO) in one script. Targets Chinese-speaking developers who want to adapt ChatGLM to domain-specific tasks without the overhead of full fine-tuning. The author explicitly abandoned it in favor of LLaMA-Factory, which now covers ChatGLM2 and a broader model zoo.

The full RLHF pipeline in a single repo (SFT, reward model training, and PPO) is genuinely useful and rare in Chinese LLM tooling. The hardware requirements table is honest and specific: QLoRA at r=8 fits in 8 GB of VRAM, which opens this up to consumer hardware. Dynamic batch padding, which pads each batch to its own longest sequence instead of to the maximum sequence length, is a real throughput win. The Gradio web UI wraps all training stages and meaningfully lowers the barrier for non-CLI users.
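
To make the padding point concrete, here is a minimal pure-Python sketch of the general technique. It is not the repo's code: `PAD_ID`, the helper names, and the 512-token limit are all assumptions chosen for illustration.

```python
from typing import List

PAD_ID = 0  # hypothetical pad token id, chosen for illustration


def pad_to_max(batch: List[List[int]], max_len: int, pad_id: int = PAD_ID) -> List[List[int]]:
    """Pad (or truncate) every sequence to a fixed max_len."""
    return [seq[:max_len] + [pad_id] * (max_len - len(seq[:max_len])) for seq in batch]


def pad_dynamic(batch: List[List[int]], pad_id: int = PAD_ID) -> List[List[int]]:
    """Pad every sequence only up to the longest sequence in this batch."""
    longest = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (longest - len(seq)) for seq in batch]


def total_tokens(batch: List[List[int]]) -> int:
    """Count the token slots the model would have to process."""
    return sum(len(seq) for seq in batch)


batch = [[5, 6, 7], [8, 9], [10, 11, 12, 13, 14]]
fixed = pad_to_max(batch, max_len=512)   # assumed max sequence length
dynamic = pad_dynamic(batch)

print(total_tokens(fixed), total_tokens(dynamic))
```

With short prompts, the fixed-length batch spends almost all of its token slots on padding, while the dynamic batch grows only to its longest member. The actual saving depends on the length distribution of the dataset.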

The repo has been explicitly unmaintained since October 2023: the author redirected users to LLaMA-Factory, so bug fixes and dependency updates have stopped. BLEU/ROUGE on 100 samples is not a meaningful evaluation of instruction-following quality; the +1-2 point improvements shown say almost nothing about real-world usefulness. Multi-GPU training only works through accelerate, with no DeepSpeed ZeRO stage 3 support mentioned, so scaling past a couple of A100s will hit walls. The PPO implementation leans heavily on TRL internals that have since changed, so the RLHF path likely breaks against current TRL versions without patching.
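
A rough back-of-the-envelope check shows why a 1-2 point gain on 100 samples is hard to read. The sketch below uses a normal approximation. The per-sample standard deviation of 15 points is a made-up assumption, not a figure from the repo, and the helper names are hypothetical.

```python
import math


def ci_half_width(sd: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% confidence half-width for a mean score over n samples."""
    return z * sd / math.sqrt(n)


def samples_needed(sd: float, delta: float, z: float = 1.96) -> int:
    """Smallest n whose confidence half-width is no larger than delta."""
    return math.ceil((z * sd / delta) ** 2)


ASSUMED_SD = 15.0  # hypothetical per-sample score spread, in metric points

half_width = ci_half_width(ASSUMED_SD, n=100)
needed = samples_needed(ASSUMED_SD, delta=1.5)

print(f"half-width at n=100: {half_width:.2f} points")
print(f"samples needed to resolve 1.5 points: {needed}")
```

Under that assumed spread, the uncertainty on a 100-sample mean is larger than the reported improvement, so the gain cannot be separated from sampling noise. A tighter or looser spread would shift the numbers but not the shape of the argument.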
