// the find
hiyouga/ChatGLM-Efficient-Tuning
Fine-tuning ChatGLM-6B with PEFT | Efficient ChatGLM fine-tuning based on PEFT
A fine-tuning toolkit for ChatGLM-6B supporting LoRA, QLoRA, P-Tuning v2, freeze tuning, and full RLHF (SFT → reward model → PPO) in one script. Targets Chinese-speaking developers who want to adapt ChatGLM to domain-specific tasks without the overhead of full fine-tuning. The author explicitly abandoned it in favor of LLaMA-Factory, which now covers ChatGLM2 and a broader model zoo.
The full RLHF pipeline (SFT, reward model training, and PPO) in a single repo is genuinely useful and rare in Chinese LLM tooling. The hardware requirements table is honest and specific: QLoRA at r=8 fits in 8GB VRAM, which opens this up to consumer hardware. Dynamic batch padding, which pads each batch to its own longest sequence instead of to the max sequence length, is a real throughput win because less compute is spent on pad tokens. The Gradio web UI wraps all training stages, which meaningfully lowers the barrier for non-CLI users.
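To make the padding point concrete, here is a minimal sketch of the difference between padding to the longest sequence in a batch and padding to a fixed maximum length. This is not the repo's collator: `PAD_ID`, the helper names, and the `max_length=512` value are hypothetical and chosen only for illustration.

```python
from typing import List

PAD_ID = 0  # hypothetical pad token id, for illustration only


def pad_to_longest(batch: List[List[int]], pad_id: int = PAD_ID) -> List[List[int]]:
    """Dynamic padding: pad every sequence to the longest one in this batch."""
    longest = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (longest - len(seq)) for seq in batch]


def pad_to_max_length(
    batch: List[List[int]], max_length: int, pad_id: int = PAD_ID
) -> List[List[int]]:
    """Fixed padding: truncate or pad every sequence to max_length."""
    return [
        seq[:max_length] + [pad_id] * max(0, max_length - len(seq))
        for seq in batch
    ]


def token_slots(padded: List[List[int]]) -> int:
    """Total number of token positions the model has to process."""
    return sum(len(row) for row in padded)


# Three short token-id sequences (made-up ids).
batch = [[5, 6, 7], [8, 9], [10, 11, 12, 13, 14]]

dynamic = pad_to_longest(batch)
fixed = pad_to_max_length(batch, max_length=512)  # 512 is an assumed limit

print(token_slots(dynamic), token_slots(fixed))
```

With short examples, the fixed-length batch carries far more pad positions than the dynamically padded one, and that gap is where the throughput gain comes from.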
The repo has been explicitly unmaintained since October 2023: the author redirected users to LLaMA-Factory, so bug fixes and dependency updates have stopped. BLEU/ROUGE on 100 samples is not a meaningful evaluation of instruction-following quality; the +1-2 point improvements shown say almost nothing about real-world usefulness. Multi-GPU training only works through accelerate, with no DeepSpeed ZeRO stage 3 support mentioned, so scaling past a couple of A100s will hit walls. The PPO implementation leans heavily on TRL internals that have since changed, so the RLHF path likely breaks against current TRL versions without patching.
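A rough back-of-the-envelope sketch of why a 1-2 point gain on 100 samples is hard to distinguish from noise. The per-sample standard deviation of 15 points is an assumption, not a figure from the repo, and the calculation treats the two runs as independent samples. A paired comparison on the same prompts could be tighter, but it needs per-sample scores.

```python
import math


def standard_error(per_sample_std: float, n: int) -> float:
    """Standard error of a mean score computed over n samples."""
    return per_sample_std / math.sqrt(n)


def diff_interval_95(gain: float, per_sample_std: float, n: int):
    """Approximate 95% interval for the gap between two mean scores.

    Assumes both systems have the same per-sample spread and that their
    scores are independent (an unpaired, normal-approximation estimate).
    """
    se_diff = math.sqrt(2) * standard_error(per_sample_std, n)
    return gain - 1.96 * se_diff, gain + 1.96 * se_diff


ASSUMED_STD = 15.0  # hypothetical per-sample spread in metric points

low, high = diff_interval_95(gain=2.0, per_sample_std=ASSUMED_STD, n=100)
print(f"95% interval for a 2-point gain at n=100: [{low:.1f}, {high:.1f}]")
```

Under these assumptions the interval spans zero, so the reported improvement cannot be separated from sampling noise without a larger evaluation set or a paired test.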