// the find
modelscope/ms-swift
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-V4, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Gemma4, Llava, Phi4, ...) (AAAI 2025).
ms-swift is ModelScope's fine-tuning framework for LLMs and multimodal models, covering the full pipeline from training (SFT, GRPO, DPO, PPO) through quantization and deployment. It's aimed at researchers and practitioners who want one tool to handle everything from pre-training to production serving without stitching together half a dozen separate libraries. The model coverage is genuinely broad: 600+ text models and 300+ multimodal models, per the repo description.
The Megatron integration is the real differentiator: TP/PP/SP/CP/EP parallelism strategies are all wired into the same CLI, including MoE-specific optimizations and now Ray-based GRPO, which is not trivial engineering. The GRPO algorithm family (DAPO, GSPO, SAPO, CISPO, RLOO, Reinforce++) is more complete than any comparable open-source trainer. The claimed 100%+ training speedup from multimodal packing is concrete and measurable, with a clear mechanism: several short samples are concatenated into one sequence, so less compute is spent on padding. The Web-UI is a genuine escape hatch for teams that don't want to memorize 200 CLI flags.
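To make the packing mechanism concrete, here is a minimal sketch of greedy first-fit-decreasing packing over sequence lengths. It is a generic illustration, not ms-swift's implementation: the function names `pack_first_fit` and `utilization`, the sample lengths, and the 1024-token budget are all assumptions made for the example.

```python
def pack_first_fit(lengths, max_len):
    """Greedily pack sequence lengths into rows holding at most max_len tokens.

    Hypothetical helper for illustration; lengths are visited longest first
    and each one goes into the first row that still has room for it.
    """
    rows = []
    for n in sorted(lengths, reverse=True):
        if n > max_len:
            raise ValueError(f"sequence of length {n} exceeds max_len={max_len}")
        for row in rows:
            if sum(row) + n <= max_len:
                row.append(n)
                break
        else:
            # No existing row had room, so start a new one.
            rows.append([n])
    return rows


def utilization(lengths, num_rows, max_len):
    """Fraction of token slots holding real tokens rather than padding."""
    return sum(lengths) / (num_rows * max_len)


# Illustrative sample lengths, not measurements from any real dataset.
lengths = [900, 300, 700, 100, 512, 512]
max_len = 1024

bins = pack_first_fit(lengths, max_len)          # 3 packed rows instead of 6
unpacked = utilization(lengths, len(lengths), max_len)
packed = utilization(lengths, len(bins), max_len)
print(bins, round(unpacked, 3), round(packed, 3))
```

In this toy case six padded rows collapse into three nearly full ones, which is the kind of effect that could plausibly double throughput. The real gain depends on the length distribution of the dataset and on how attention is masked across packed samples.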
The ModelScope ecosystem bias is real: default model and dataset downloads go through ModelScope, not HuggingFace, so anyone outside China will hit slower downloads or need to remember `--use_hf true` every time. The dependency matrix is a minefield: torch, transformers, peft, trl, vllm, sglang, and lmdeploy all have tight version bounds that regularly conflict with each other, and the install docs just list ranges without a lockfile. The Python API appears only as pseudocode in the README, and the actual classes (`get_model_processor`, `TransformersEngine`) are unstable and underdocumented compared to the CLI. Testing infrastructure appears to be CI scripts only; there's no evidence of a proper test suite, which matters when you're about to spend expensive GPU hours on a training run.
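Without a lockfile, one cheap safeguard is to check installed versions against the ranges you intend to rely on before launching a job. The sketch below uses only the standard library. The bounds in `EXAMPLE_BOUNDS` are hypothetical placeholders, not ms-swift's actual requirements, and the helper names are invented for this example.

```python
import re
from importlib import metadata


def parse_version(text):
    """Return the leading numeric release parts, e.g. '2.4.0+cu121' -> (2, 4, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def _pad(version, width):
    """Right-pad a version tuple with zeros so tuples compare at equal length."""
    return version + (0,) * (width - len(version))


def in_range(installed, low, high):
    """True if low <= installed < high (lower bound inclusive, upper exclusive)."""
    parts = [parse_version(v) for v in (installed, low, high)]
    width = max(len(p) for p in parts)
    cur, lo, hi = (_pad(p, width) for p in parts)
    return lo <= cur < hi


def check_environment(bounds, get_version=metadata.version):
    """Map each package name to 'ok', 'missing', or 'out of range: <version>'."""
    report = {}
    for name, (low, high) in bounds.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = in_range(installed, low, high)
        report[name] = "ok" if ok else f"out of range: {installed}"
    return report


# HYPOTHETICAL bounds for illustration only; copy the real ranges
# from the install docs of the release you are actually using.
EXAMPLE_BOUNDS = {
    "torch": ("2.0", "2.7"),
    "transformers": ("4.33", "5.0"),
    "peft": ("0.11", "0.16"),
}

if __name__ == "__main__":
    for package, status in check_environment(EXAMPLE_BOUNDS).items():
        print(f"{package}: {status}")
```

Running a check like this at the start of a training script fails fast on a mismatched environment instead of partway through a job. It only compares numeric release components, so pre-release tags and local build suffixes are ignored.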