// the find
oobabooga/textgen
Open-source desktop app for local LLMs. Text, vision, tool-calling, OpenAI/Anthropic-compatible API. 100% private.
text-generation-webui is a local LLM frontend that supports multiple inference backends (llama.cpp, ExLlamaV3, Transformers, TensorRT-LLM) with a Gradio web UI and an OpenAI/Anthropic-compatible API. It's aimed at developers and enthusiasts who want to run models locally with zero telemetry. At 47k stars it's the de facto standard for self-hosted LLM experimentation.
The multi-backend architecture is genuinely useful — you can switch between llama.cpp for GGUF quantized models and ExLlamaV3 for EXL3 without restarting or reconfiguring. The OpenAI/Anthropic API compatibility layer means you can point existing code at localhost and it just works, which is a real time-saver for testing prompts locally before hitting paid APIs. Portable builds with all dependencies bundled (CUDA, ROCm, Vulkan, CPU variants) lower the barrier considerably for non-technical users. The extension system is a single .py file per tool, which keeps the surface area small and makes community contributions tractable.
Gradio is the wrong foundation for a production-quality UI — it limits layout flexibility and has persistent reactivity quirks that the codebase works around rather than through. The training tab (LoRA fine-tuning) is a bolt-on that gets less maintenance than the inference path and is not something you'd rely on for serious training runs. Multi-user mode is explicitly described as suited for 'small trusted teams,' meaning there's no real auth story beyond basic Gradio password auth — not appropriate for anything Internet-facing. The extension ecosystem is large but uneven in quality, with no sandboxing, so a bad community extension can crash or compromise the whole server.