// the find
abus-aikorea/voice-pro
Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with Whisper audio processing, YouTube download, Demucs vocal isolation, and multilingual translation.
Voice-Pro is a Gradio-based local web app that bundles speech recognition (Whisper/WhisperX), zero-shot voice cloning (F5-TTS, CosyVoice), vocal isolation (Demucs), translation, and TTS into one install. It targets content creators who want an ElevenLabs-style workflow without per-minute billing. The project is now fully open-sourced after the team pivoted to a different product.
The model lineup is strong: F5-TTS and CosyVoice are among the better zero-shot cloners available, and WhisperX support gets you word-level timestamps via forced alignment, where plain Whisper defaults to segment-level timing. Bundling Demucs for vocal isolation means the full dubbing pipeline (download → separate → transcribe → translate → reclone) works without stitching separate tools together. The bat/sh installer hides the CUDA setup nightmare from non-technical users. Going fully open source removes the previous free-tier 60-second cap, which was the main adoption friction.
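To make the stage order concrete, here is a minimal sketch of the dubbing pipeline as a chain of functions. It does not use Voice-Pro's code: `Job`, `make_stage`, `PIPELINE`, and `run` are hypothetical names, and each stage is a placeholder that only records that it ran. In a real setup each stage would wrap a downloader, Demucs, WhisperX, a translator, and a cloning model respectively.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Job:
    """State handed from one stage to the next (hypothetical structure)."""
    source_url: str
    target_lang: str
    log: List[str] = field(default_factory=list)


Stage = Callable[[Job], Job]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage that only records its name in the job log."""
    def stage(job: Job) -> Job:
        job.log.append(name)
        return job
    return stage


# Stage names mirror the order given in the review; the bodies are stubs.
PIPELINE: List[Stage] = [
    make_stage(name)
    for name in ("download", "separate", "transcribe", "translate", "reclone")
]


def run(job: Job, stages: List[Stage] = PIPELINE) -> Job:
    """Run every stage in order, feeding each stage's result to the next."""
    for stage in stages:
        job = stage(job)
    return job


job = run(Job(source_url="https://example.com/video", target_lang="en"))
print(" -> ".join(job.log))
```

The point of the single-install bundle is that this hand-off happens inside one app, so intermediate files and formats never have to be reconciled by hand.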
Windows-first design shows everywhere: the configure.bat/start.bat flow, the README caveats about Mac/Linux being unverified, the CUDA assumption baked into the requirements files. The repo vendors entire model libraries (Demucs, CosyVoice, RVC) as subtrees rather than pip dependencies, so you get a bloated checkout and silently miss upstream bug fixes. The project is functionally abandoned: the team explicitly says they've moved on to WeConnect and can't maintain it. CosyVoice2-0.5B is a 9 GB first-run download with no progress feedback, so on a slow connection a stalled or failed download is indistinguishable from a hung terminal.
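The missing progress feedback is the kind of thing a small wrapper can fix. The sketch below is not Voice-Pro's download code; `copy_with_progress` and `print_progress` are hypothetical helpers that copy any binary stream in chunks, report bytes transferred, and raise if the stream ends short of the expected size. The demo uses an in-memory stream so it runs without network access.

```python
import io
import sys
from typing import BinaryIO, Callable, Optional

ProgressFn = Callable[[int, Optional[int]], None]


def copy_with_progress(
    src: BinaryIO,
    dst: BinaryIO,
    total: Optional[int] = None,
    chunk_size: int = 1 << 20,
    report: Optional[ProgressFn] = None,
) -> int:
    """Copy src to dst in chunks, calling report(done, total) after each one.

    Raises OSError if total is known and the stream ends early, so a
    truncated download fails loudly instead of silently.
    """
    done = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        done += len(chunk)
        if report is not None:
            report(done, total)
    if total is not None and done != total:
        raise OSError(f"incomplete download: got {done} of {total} bytes")
    return done


def print_progress(done: int, total: Optional[int]) -> None:
    """Overwrite one stderr line with the current byte count."""
    if total:
        sys.stderr.write(f"\r{done / total:6.1%} ({done:,} / {total:,} bytes)")
    else:
        sys.stderr.write(f"\r{done:,} bytes")
    sys.stderr.flush()


# Demo on an in-memory payload standing in for a model file.
payload = b"x" * 2500
out = io.BytesIO()
copied = copy_with_progress(
    io.BytesIO(payload), out,
    total=len(payload), chunk_size=1024, report=print_progress,
)
sys.stderr.write("\n")
```

For a real HTTP download, `src` could be the response object from `urllib.request.urlopen(url)` and `total` its `Content-Length` header when the server sends one. Resuming interrupted transfers is out of scope for this sketch.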