// the find
SYSTRAN/faster-whisper
Faster Whisper transcription with CTranslate2
faster-whisper reimplements OpenAI's Whisper on top of CTranslate2, a C++ inference engine that handles quantization and batching more efficiently than PyTorch. It's for developers who need Whisper transcription in production and can't afford the memory overhead or latency of the reference implementation. The benchmark numbers hold up: a 4-7x speedup on GPU with batching, and roughly half the VRAM of the reference implementation at int8.
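The "half the VRAM at int8" figure can be sanity-checked with back-of-the-envelope arithmetic on the weights alone. The sketch assumes roughly 1.55 billion parameters for large-v2, which is an approximate figure and not taken from the project's README. It also ignores activations, the decoder's cache and CUDA workspace, so it bounds weight storage only and will not match measured VRAM exactly.

```python
# Back-of-the-envelope weight memory for a Whisper model at different precisions.
# ASSUMPTION: ~1.55e9 parameters for large-v2 (approximate). Activations, the
# decoder cache, and CUDA workspace memory are deliberately ignored.

BYTES_PER_PARAM = {
    "float32": 4,
    "float16": 2,
    "int8": 1,
}


def weight_bytes(num_params: int, precision: str) -> int:
    """Bytes needed to store the weights alone at the given precision."""
    return num_params * BYTES_PER_PARAM[precision]


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return num_bytes / 2**30


LARGE_V2_PARAMS = 1_550_000_000  # approximate, see assumption above

if __name__ == "__main__":
    for precision in ("float32", "float16", "int8"):
        size = weight_bytes(LARGE_V2_PARAMS, precision)
        print(f"{precision:>8}: {to_gib(size):.2f} GiB")
```

Going from float16 to int8 halves the bytes per weight, which is where the headline saving comes from. In faster-whisper the precision is selected through the `compute_type` argument of `WhisperModel` (the README shows values such as `"int8_float16"`). Total VRAM also includes non-weight memory, so the halving applies to the weights rather than necessarily to the whole footprint.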
The CTranslate2 backend means you get int8 quantization that actually works without accuracy collapse: 59s versus 2m23s for large-v2 on GPU (faster-whisper at int8 against the reference implementation) is the kind of gain that decides whether something is deployable. Silero VAD integration is built in and saves you from transcribing silence at scale. Word-level timestamps work without the extra wav2vec2 alignment step that WhisperX requires. There is no system FFmpeg dependency either: audio is decoded with PyAV, which bundles the FFmpeg libraries, and that removes a whole class of deployment headaches.
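To make the VAD point concrete: in faster-whisper the filter is switched on per call (the README shows `model.transcribe(..., vad_filter=True)`), and it drops non-speech audio before decoding. The sketch below is not Silero VAD. It is a simplified, hypothetical stand-in that assumes you already have one speech probability per fixed-length frame. It thresholds those probabilities, bridges short pauses, and reports how much audio would be skipped. The names `speech_spans` and `skipped_fraction` are illustrative only.

```python
from typing import List, Optional, Tuple

# HYPOTHETICAL, simplified stand-in for a VAD pass. This is not Silero VAD or
# faster-whisper's implementation; it assumes per-frame speech probabilities
# have already been computed by some model.

Span = Tuple[float, float]  # (start_seconds, end_seconds)


def speech_spans(
    probs: List[float],
    frame_s: float,
    threshold: float = 0.5,
    min_silence_s: float = 0.5,
) -> List[Span]:
    """Group frames with probability >= threshold into speech spans.

    Pauses shorter than min_silence_s are bridged so a brief gap does not
    split one utterance into two chunks.
    """
    raw: List[Span] = []
    start: Optional[float] = None
    for i, p in enumerate(probs):
        t = i * frame_s
        if p >= threshold and start is None:
            start = t  # speech begins at this frame
        elif p < threshold and start is not None:
            raw.append((start, t))  # speech ended at this frame boundary
            start = None
    if start is not None:  # speech runs to the end of the audio
        raw.append((start, len(probs) * frame_s))

    merged: List[Span] = []
    for s, e in raw:
        if merged and s - merged[-1][1] < min_silence_s:
            merged[-1] = (merged[-1][0], e)  # bridge a short pause
        else:
            merged.append((s, e))
    return merged


def skipped_fraction(spans: List[Span], total_s: float) -> float:
    """Fraction of the audio that would not need to be decoded at all."""
    speech = sum(e - s for s, e in spans)
    return 1.0 - speech / total_s
```

In this sketch, raising `min_silence_s` trades fewer, longer chunks for a little more decoded silence. The real `vad_parameters` option exposes a comparable knob (`min_silence_duration_ms`), but its exact behavior is the library's, not this sketch's.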
GPU setup is genuinely annoying: cuBLAS and cuDNN 9 must be installed separately, and the CUDA version matrix (CUDA 11 with cuDNN 8 needs ctranslate2==3.24.0; CUDA 12 with cuDNN 8 needs ctranslate2==4.4.0) is fragile and documented only in a README note. The generator API for segments is a footgun. Transcription doesn't start until you iterate, and a generator can only be consumed once, so code that forgets this ends up holding an unconsumed generator or getting a silently empty second pass, with no error either way. Last push was November 2025, and the project shows signs of slowing maintainer bandwidth. CPU performance is actually worse than whisper.cpp on the small model without batching, so it's not the right choice for edge or CPU-only deployments.
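The segments footgun is ordinary Python generator behavior. Per the README, `model.transcribe(...)` returns a `(segments, info)` pair where `segments` is a generator, so decoding only happens as you iterate, and the generator can be consumed only once. The sketch below swaps in a hypothetical stand-in, `fake_transcribe`, for the real library so it runs without a model or GPU. It reproduces only the lazy return shape, not any actual transcription.

```python
from typing import Iterator, List, NamedTuple, Tuple

# HYPOTHETICAL stand-in for WhisperModel.transcribe: it reproduces only the
# lazy (segments, info) return shape, not any real decoding.


class Segment(NamedTuple):
    start: float
    end: float
    text: str


decode_log: List[str] = []  # records when the stand-in "decodes" a segment


def fake_transcribe(path: str) -> Tuple[Iterator[Segment], dict]:
    def _segments() -> Iterator[Segment]:
        for i, text in enumerate(["Hello", "world"]):
            decode_log.append(f"decoded segment {i}")  # runs only on iteration
            yield Segment(start=float(i), end=float(i + 1), text=text)

    info = {"path": path}  # stand-in for the real info object
    return _segments(), info


segments, info = fake_transcribe("audio.mp3")
pending = len(decode_log)  # 0: calling transcribe did no work yet

first_pass = [s.text for s in segments]   # decoding happens here
second_pass = [s.text for s in segments]  # empty: the generator is exhausted

# Safe pattern: materialize once, then reuse the list freely.
segments, info = fake_transcribe("audio.mp3")
segments = list(segments)
```

With the real library the equivalent fix is the one the README suggests: `segments, info = model.transcribe("audio.mp3")` followed by `segments = list(segments)`, which forces the transcription to run to completion and gives you a list you can iterate as many times as needed.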