// the find

intel/ipex-llm

★ 8,822 · Python · Apache-2.0 · updated Jan 2026

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, DeepSpeed, Axolotl, etc.

IPEX-LLM is Intel's library for accelerating LLM inference and finetuning on Intel hardware (Arc GPUs, NPUs, iGPUs, Xeon CPUs) with support for 70+ model architectures. It patches into the HuggingFace/llama.cpp/Ollama ecosystem to transparently apply Intel-specific quantization and kernel optimizations. The project is now officially archived by Intel.
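For readers unfamiliar with what low-bit quantization does to a weight tensor, here is a minimal sketch of symmetric 4-bit quantization on a single group of weights. It is a generic illustration only, not IPEX-LLM's actual kernel or format: the function names, the group of sample weights, and the single per-group scale are all assumptions made for the example.

```python
def quantize_int4(weights):
    """Symmetric quantization of one weight group to signed 4-bit codes in [-8, 7].

    Hypothetical helper for illustration; real libraries use packed storage
    and tuned kernels rather than Python lists.
    """
    max_abs = max(abs(w) for w in weights)
    # One scale per group; fall back to 1.0 for an all-zero group.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map 4-bit codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


# Made-up sample weights, not taken from any real model.
group = [0.12, -0.53, 0.07, 0.91, -0.33, 0.48, -0.02, 0.26]

codes, scale = quantize_int4(group)
restored = dequantize(codes, scale)

# Worst-case rounding error for this group; bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(group, restored))
print(codes, round(scale, 4), round(max_error, 4))
```

The rounding error per weight is at most half of `scale`, which is the source of the accuracy loss that perplexity measurements try to quantify.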

- Genuinely useful niche: if you have Intel Arc GPUs or Core Ultra NPUs, this was the only real path to decent LLM performance on that hardware, with actual benchmarks and perplexity tables to back the claims

- FlashMoE support for running DeepSeek-R1-671B and Qwen3MoE-235B on 1-2 Arc A770/B580 cards is technically interesting - sparse MoE models activate only a fraction of their parameters per token, so they are more tractable on limited VRAM than dense models of similar total size

- Integration surface is wide: the portable ZIP approach for Ollama and llama.cpp means no manual driver/library wrestling, which is the main pain point on Intel GPU setups

- Perplexity tables across INT4/FP6/FP8/FP16 for multiple model families give you actual data to decide whether the quantization tradeoff is worth it

- The repo is archived, per the README - Intel has explicitly dropped support, noted known security issues, and stopped accepting patches. This is the most important thing to know before adopting it

- pyproject.toml still says 'BigDL 2.0' with a description about Apache Spark - the project was clearly rebranded in a hurry (from bigdl-llm), and the packaging metadata still reflects that messy history

- Intel GPU market share is tiny compared to NVIDIA; the optimization work here doesn't transfer to any other hardware, so this is a dead end for anyone who might later switch GPUs

- The dependency on the Intel-specific oneAPI/IPEX stack means the environment setup is genuinely painful - the 'one command' install claim in the docs understates the driver version sensitivity and conda environment requirements
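To put the FlashMoE and quantization points above in numbers, here is a back-of-the-envelope sketch of weight storage at different bit widths. Everything in it is an assumption layered on top of the review: the total and active parameter counts come from the public model cards (671B/37B for DeepSeek-R1, 235B/22B for Qwen3-235B-A22B), the 16 GB figure assumes the larger Arc A770 variant, and the estimate ignores quantization scales, KV cache, and activations.

```python
def weight_gb(params_billion, bits_per_weight):
    """Approximate weight storage in decimal gigabytes (weights only)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# (total, active-per-token) parameter counts in billions, from public model cards.
MODELS = {
    "DeepSeek-R1": (671, 37),
    "Qwen3-235B-A22B": (235, 22),
}

# Assumed VRAM budget: two 16 GB Arc A770 cards.
VRAM_GB = 2 * 16

footprints = {}
for name, (total_b, active_b) in MODELS.items():
    for bits in (4, 8, 16):
        footprints[(name, bits)] = (
            weight_gb(total_b, bits),   # every expert resident
            weight_gb(active_b, bits),  # parameters used per token
        )

for (name, bits), (total_gb, active_gb) in footprints.items():
    fits = "fits" if total_gb <= VRAM_GB else "does not fit"
    print(f"{name} @ {bits}-bit: all weights {total_gb:.1f} GB ({fits} in "
          f"{VRAM_GB} GB VRAM), active per token {active_gb:.1f} GB")
```

Even at 4 bits the full expert set is roughly ten times larger than the assumed VRAM budget, so under these assumptions most weights have to live outside GPU memory; only the per-token active slice is in the range a pair of cards can hold.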
