// the find

mbailey/voicemode

★ 1,237 · Python · MIT · updated Jun 2026

Natural voice conversations with Claude Code

An MCP server that gives Claude Code a voice — speak to it, hear it respond. Uses Whisper for speech-to-text and Kokoro for TTS locally, with OpenAI as a cloud fallback. Aimed at developers who want hands-free interaction during mundane tasks.

The local-first design is the real draw: Whisper and Kokoro give you a fully offline voice loop, which matters if you're handling sensitive codebases. The test suite is unusually thorough for this kind of tool, with dedicated test files for provider failover, silence detection, VAD aggressiveness, and STT error handling. The MCP integration is clean: it slots into Claude Code's existing permissions model without requiring a sidecar process or special runtime. NixOS support via a proper flake is a nice touch that most Python audio projects skip.
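To make the local-first, cloud-fallback idea concrete, here is a minimal sketch of a provider failover chain. It is not taken from the voicemode codebase: the function name `transcribe_with_failover`, the `Transcriber` type, and both stub providers are hypothetical, and a real setup would call an actual Whisper server and a cloud API in their place.

```python
from typing import Callable, Sequence

# Hypothetical provider type: takes raw audio bytes, returns a transcript.
Transcriber = Callable[[bytes], str]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the chain has failed."""


def transcribe_with_failover(
    audio: bytes,
    providers: Sequence[tuple[str, Transcriber]],
) -> tuple[str, str]:
    """Try each provider in order; return (provider_name, transcript).

    Providers are listed local-first, so the cloud is only reached
    when every earlier (local) provider raises.
    """
    errors = []
    for name, transcribe in providers:
        try:
            return name, transcribe(audio)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stand-in providers for illustration only (assumptions, not real clients).
def local_whisper_stub(audio: bytes) -> str:
    raise ConnectionError("local STT server not running")


def cloud_stub(audio: bytes) -> str:
    return "hello world"


if __name__ == "__main__":
    chain = [("local-whisper", local_whisper_stub), ("cloud", cloud_stub)]
    print(transcribe_with_failover(b"", chain))
```

Ordering the list local-first is what keeps audio on the machine by default; dropping the cloud entry from the chain gives a strictly offline setup that fails loudly instead of falling back.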

The platform audio dependency list is a minefield (portaudio, pulseaudio, alsa-lib-devel, ffmpeg), and the WSL2 path requires its own pulseaudio setup that the docs admit is painful. The repo has significant scope creep: there's a full 'DJ' feature for playing music during coding sessions, a 'conch' IPC mechanism, soundfonts, and a voice cloning pipeline, none of which is obviously related to voice-coding. The `.archive` directory is left in the tree and is nearly as large as the active codebase, which makes it harder than it should be to tell what's actually current. Despite the 'other MCP capable agents' tagline, the plugin system, commands, and hooks are all wired specifically to Claude Code's internals, so portability is more aspiration than reality.
