// the find

remsky/Kokoro-FastAPI

★ 5,064 · Python · Apache-2.0 · updated Jun 2026

Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model w/multiplatform CPU, AMD, NVIDIA GPU PyTorch support, handling, and auto-stitching

A FastAPI wrapper around the Kokoro-82M TTS model that exposes an OpenAI-compatible speech endpoint. Drop it in wherever you're already calling OpenAI's TTS API and it just works. Aimed at developers running local LLM stacks who want free, offline speech synthesis.
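To show what "OpenAI-compatible" means at the wire level, here is a minimal stdlib-only sketch that builds (but does not send) a request in the shape of OpenAI's speech endpoint. Several details are assumptions, not confirmed by the review: the localhost:8880 address, the "/v1" prefix, and the "kokoro" model name. Check the repo's README before relying on any of them. The helper name build_speech_request is hypothetical. With OpenAI's own Python client, the equivalent change is passing base_url=... when constructing OpenAI(...).

```python
import json
import urllib.request

# Assumed local address; adjust host, port and prefix to match your container.
BASE_URL = "http://localhost:8880/v1"


def build_speech_request(text: str, voice: str, base_url: str = BASE_URL,
                         model: str = "kokoro",
                         response_format: str = "mp3") -> urllib.request.Request:
    """Build (without sending) a POST to an OpenAI-style /audio/speech route.

    Hypothetical helper: the body uses the field names of OpenAI's speech API
    (model, input, voice, response_format). The model name is an assumption.
    """
    payload = {
        "model": model,                      # assumed model id
        "input": text,                       # text to synthesize
        "voice": voice,                      # a voice id such as "af_bella"
        "response_format": response_format,  # e.g. "mp3" or "wav"
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_speech_request("Hello from a local TTS server.", "af_bella")
    # Actually sending it requires a running server, e.g.:
    # with urllib.request.urlopen(req) as resp:
    #     open("hello.mp3", "wb").write(resp.read())
    print(req.get_method(), req.full_url)
```

Building the request separately from sending it keeps the sketch testable without a server. Pointing the same code at OpenAI or at a local container is then just a different base_url.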

The OpenAI API compatibility is the real selling point: if you're using OpenAI's Python client, the only change is the base_url. Voice mixing via weighted combinations (e.g., 'af_bella(2)+af_heart(1)') is a genuinely useful feature that OpenAI's API doesn't offer. The benchmarks are unusually honest: they include WER roundtrip tests against real book-length text, not just cherry-picked short sentences. Multiplatform Docker images for CPU, NVIDIA, AMD, and ARM64 cover basically every realistic deployment target.
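The weighted voice-mix syntax is easy to generate programmatically. Below is a small sketch with a hypothetical helper, format_voice_mix, that renders a {voice: weight} mapping into the 'af_bella(2)+af_heart(1)' form quoted above. It assumes the mix string goes in the same voice field as a single voice id. It makes no assumption about how the server normalizes the weights, or whether it accepts fractional ones.

```python
from typing import Mapping


def format_voice_mix(weights: Mapping[str, float]) -> str:
    """Render {voice: weight} as 'voice(w)+voice(w)' (hypothetical helper).

    Voices appear in the mapping's iteration order; weights must be positive.
    """
    if not weights:
        raise ValueError("need at least one voice")
    parts = []
    for voice, weight in weights.items():
        if weight <= 0:
            raise ValueError(f"weight for {voice!r} must be positive")
        # Print integral weights without a trailing '.0' (2.0 -> '2').
        w = int(weight) if float(weight).is_integer() else weight
        parts.append(f"{voice}({w})")
    return "+".join(parts)


print(format_voice_mix({"af_bella": 2, "af_heart": 1}))  # af_bella(2)+af_heart(1)
```

Keeping weights in a dict rather than hand-writing the string makes blends easy to tweak or sweep when you're auditioning voices.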

52% test coverage is low for something people are running in production stacks, and the README says so openly. AMD ROCm support is marked experimental and amd64-only, which matters if you're on an AMD GPU server. The WAV header issue is a real gotcha: streamed responses carry sentinel size values that break Python's stdlib wave module, which will confuse anyone doing downstream processing without reading the troubleshooting section. The model itself caps out at roughly 30 seconds of natural-sounding output per chunk. Auto-stitching hides this, but it introduces intonation artifacts at smaller chunk sizes, which the README mentions without quantifying.
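One common workaround for placeholder WAV sizes is to rewrite them from the actual byte length once the whole stream has been downloaded. The sketch below does that. It is a generic RIFF fix, not the repo's documented remedy, and fix_streaming_wav_header is a hypothetical name. It assumes three things: only the RIFF and data chunk sizes are placeholders, every chunk before 'data' has a correct size, and the data chunk runs to the end of the buffer.

```python
import struct


def fix_streaming_wav_header(buf: bytes) -> bytes:
    """Rewrite RIFF and 'data' chunk sizes from the real buffer length.

    Assumes a complete RIFF/WAVE download whose 'data' chunk extends to the
    end of the buffer and whose earlier chunks (e.g. 'fmt ') have valid sizes.
    """
    if len(buf) < 12 or buf[:4] != b"RIFF" or buf[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE buffer")
    out = bytearray(buf)
    # RIFF size counts every byte after the 8-byte 'RIFF' + size prefix.
    struct.pack_into("<I", out, 4, len(out) - 8)
    off = 12  # first sub-chunk follows 'RIFF', size, 'WAVE'
    while off + 8 <= len(out):
        chunk_id = bytes(out[off:off + 4])
        (size,) = struct.unpack_from("<I", out, off + 4)
        if chunk_id == b"data":
            # Replace the placeholder with the bytes actually present.
            struct.pack_into("<I", out, off + 4, len(out) - off - 8)
            return bytes(out)
        off += 8 + size + (size & 1)  # RIFF chunks are padded to even length
    raise ValueError("no data chunk found")
```

After the fix, the buffer can be read normally, e.g. wave.open(io.BytesIO(fix_streaming_wav_header(raw))). If you only need raw samples, requesting a headerless format such as pcm (where the server supports it) sidesteps the problem entirely.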
