// the find

dimastatz/whisper-flow

★ 768 · Python · MIT · updated Apr 2026

Whisper-Flow is a framework designed to enable real-time transcription of audio content using OpenAI’s Whisper model. Rather than processing entire files after upload (“batch mode”), Whisper-Flow accepts a continuous stream of audio chunks and produces incremental transcripts immediately.

Whisper-Flow wraps OpenAI's Whisper model with a tumbling-window streaming layer, accepting PCM audio chunks over a WebSocket and emitting incremental partial transcripts. It's aimed at developers who need real-time speech-to-text without paying for a cloud ASR API. The `tiny.en.pt` model is bundled in the repo itself, which marks this as a self-contained, run-it-locally project.

The tumbling-window design is straightforward and correct: splitting on natural speech pauses is the right call for Whisper, which wasn't built for streaming. The 275ms mean latency on M1 hardware is genuinely good for a CPU-only inference setup. The FastAPI + WebSocket interface is clean and easy to integrate; the library API exposes just enough surface area without overcomplicating it. The dev tooling is unusually disciplined for a project this size: an enforced pylint score of 9.9/10, a 95% coverage gate, and benchmarks wired into the build script.
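For readers unfamiliar with the pattern, here is a minimal sketch of tumbling-window segmentation. It is not Whisper-Flow's actual code: `tumbling_windows`, `transcribe`, and `is_pause` are hypothetical names, and the pause test is left to the caller.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def tumbling_windows(
    chunks: Iterable[bytes],
    transcribe: Callable[[bytes], str],  # hypothetical: audio bytes -> text
    is_pause: Callable[[bytes], bool],   # hypothetical: True for a silent chunk
) -> Iterator[Tuple[bool, str]]:
    """Yield (is_final, text) pairs for a stream of audio chunks.

    Each non-pause chunk extends the current window and produces a partial
    transcript of the whole window so far. A pause closes the window,
    produces a final transcript, and starts a fresh, non-overlapping window.
    """
    window: List[bytes] = []
    for chunk in chunks:
        if is_pause(chunk):
            if window:
                # Window closed: emit the final result and reset the buffer.
                yield True, transcribe(b"".join(window))
                window = []
            continue
        window.append(chunk)
        # Window still open: re-transcribe everything buffered so far.
        yield False, transcribe(b"".join(window))
    if window:
        # The stream ended mid-window; flush what is left as final.
        yield True, transcribe(b"".join(window))
```

Because each partial re-transcribes the whole open window, cost grows with window length, which is one reason closing windows on pauses matters for latency.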

The bundled `tiny.en.pt` model is committed directly to the repository, which is a bad habit: it bloats the git history permanently and makes switching models awkward. There's no VAD (voice activity detection) before sending chunks to Whisper, so silence and background noise burn inference cycles. The benchmarks are all English LibriSpeech; accuracy on accented speech, noisy environments, or non-English audio is completely untested despite Whisper's multilingual capability being a selling point. The roadmap entry about 'py-speech package' integration has no linked issue or context, so v1.2 is effectively undefined.
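The missing VAD step could be approximated with a cheap energy gate in front of the model. This is a minimal sketch, assuming 16-bit little-endian mono PCM: `rms`, `is_speech`, and the threshold of 500 are illustrative choices, not part of Whisper-Flow, and a trained VAD would be more robust against background noise.

```python
import array
import math
import sys


def rms(pcm: bytes) -> float:
    """Root-mean-square amplitude of 16-bit little-endian PCM samples."""
    samples = array.array("h")  # signed 16-bit integers
    # Drop a trailing odd byte so the buffer is a whole number of samples.
    samples.frombytes(pcm[: len(pcm) - len(pcm) % 2])
    if sys.byteorder == "big":
        samples.byteswap()  # the input is assumed to be little-endian
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_speech(pcm: bytes, threshold: float = 500.0) -> bool:
    """Crude gate: treat a chunk as speech when its RMS reaches the threshold.

    The default threshold is an assumed value and would need tuning per
    microphone and environment.
    """
    return rms(pcm) >= threshold
```

Chunks that fail the gate can be dropped before inference, so silence never reaches Whisper.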
