// the find
pluja/whishper
Transcribe any audio to text, translate and edit subtitles 100% locally with a web UI. Powered by whisper models!
Whishper is a self-hosted transcription and subtitle editing suite that runs Whisper models locally via a Go backend, Python transcription API, and SvelteKit frontend, all wired together with Docker Compose. It is for developers or power users who want offline speech-to-text with a usable web UI and don't want to send audio to a third-party service.
FasterWhisper as the backend is the right call — CPU transcription is genuinely usable rather than painfully slow. The subtitle editor has real features: CPS warnings, segment splitting, and playback-synced highlighting, which puts it ahead of most self-hosted alternatives. yt-dlp integration for URL-based transcription is a practical addition that saves a download-then-upload step. The multi-container Docker Compose setup is clean and the separation of concerns between the Go orchestrator and the Python transcription API is sensible.
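For readers unfamiliar with CPS (characters per second) warnings, the idea can be sketched as below. This is a minimal illustration, not Whishper's actual code: the `Segment` type, the function names, and the 20 CPS threshold are all assumptions chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical reading-speed limit; real editors and style guides pick their own value.
MAX_CPS = 20.0


@dataclass
class Segment:
    """One subtitle segment (hypothetical shape, not Whishper's data model)."""
    start: float  # seconds
    end: float    # seconds
    text: str


def chars_per_second(seg: Segment) -> float:
    """Characters shown divided by the time the segment stays on screen."""
    duration = seg.end - seg.start
    if duration <= 0:
        # A zero-length or inverted segment can never be read in time.
        return float("inf")
    return len(seg.text.strip()) / duration


def cps_warnings(segments, max_cps: float = MAX_CPS):
    """Return the indexes of segments that exceed the reading-speed limit."""
    return [i for i, seg in enumerate(segments) if chars_per_second(seg) > max_cps]
```

A segment flagged this way is usually fixed by splitting it or extending its end time, which is where the editor's segment splitting comes in.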
The main branch is in maintenance mode: the README says there will be no new releases until a v4 rewrite lands, which makes adopting it now a mild bet. MongoDB as the database for what is essentially a job queue and transcription store is overkill; it adds a container and operational surface for no real benefit over SQLite or Postgres. Authentication is on the roadmap but not yet implemented, so anyone who exposes this to a network is running an open transcription service. Running nginx, MongoDB, LibreTranslate, the Go backend, and the Python API as five separate containers puts the resource floor higher than the feature set justifies for a personal setup.
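To make the SQLite point concrete, here is a rough sketch of a job queue and transcription store built on Python's standard `sqlite3` module. The table layout, column names, and helper functions are hypothetical and do not reflect Whishper's actual schema. The sketch also assumes a single worker process.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the store; the schema here is invented for illustration."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               source TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'pending',
               transcript TEXT
           )"""
    )
    return conn


def enqueue(conn: sqlite3.Connection, source: str) -> int:
    """Add a pending transcription job and return its id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("INSERT INTO jobs (source) VALUES (?)", (source,))
    return cur.lastrowid


def claim_next(conn: sqlite3.Connection):
    """Mark the oldest pending job as running and return (id, source), or None."""
    with conn:
        row = conn.execute(
            "SELECT id, source FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],))
    return row


def finish(conn: sqlite3.Connection, job_id: int, transcript: str) -> None:
    """Store the finished transcript alongside the job."""
    with conn:
        conn.execute(
            "UPDATE jobs SET status = 'done', transcript = ? WHERE id = ?",
            (transcript, job_id),
        )
```

With one file on disk and no extra container, this covers the queue-plus-store role for a single-user setup; several concurrent workers would need stricter locking than the sketch shows.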