// the find

serge-chat/serge

★ 5,724 · Svelte · Apache-2.0 · updated Nov 2025

A web interface for chatting with Alpaca through llama.cpp. Fully dockerized, with an easy to use API.

Serge is a self-hosted web UI for running GGUF models locally via llama.cpp, wrapped in a FastAPI backend and SvelteKit frontend, all bundled into a single Docker container. It targets developers who want a privacy-preserving chat interface without touching any cloud API. The last meaningful activity was late 2025, and the project has slowed considerably since the early llama.cpp wave.

Single-container deployment is easy: one docker run line and you're talking to a local model with no API keys. The FastAPI docs at /api/docs are auto-generated and useful for automation. Using Redis for chat history rather than bolting it onto SQLite is a reasonable call for a stateful streaming app. Development setup includes a remote Python debugger on port 5678, which is a nice touch for contributors.
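To illustrate the automation point, here is a minimal sketch that lists the endpoints described by an OpenAPI schema, which is the document FastAPI generates behind its docs page. The schema location (`/api/openapi.json`, next to `/api/docs`) and the base URL passed to `fetch_schema` are assumptions, not something confirmed for Serge; `list_endpoints` and `fetch_schema` are hypothetical helper names.

```python
import json
from urllib.request import urlopen

# Keys under an OpenAPI path item that denote operations (not "parameters" etc.).
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_endpoints(schema: dict) -> list[tuple[str, str]]:
    """Return (METHOD, path) pairs from an OpenAPI schema, sorted by path."""
    endpoints = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            if method.lower() in HTTP_METHODS:
                endpoints.append((method.upper(), path))
    return sorted(endpoints, key=lambda e: (e[1], e[0]))


def fetch_schema(base_url: str) -> dict:
    """Download the schema. Assumption: it is served at <base_url>/api/openapi.json."""
    with urlopen(f"{base_url}/api/openapi.json", timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Offline demo with a made-up schema; swap in fetch_schema(...) against
    # a running instance to inspect the real routes.
    demo = {"paths": {"/chat": {"get": {}, "post": {}}}}
    for method, path in list_endpoints(demo):
        print(method, path)
```

Parsing the schema rather than scraping the docs page keeps a script like this independent of how the UI renders.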

The default JWT secret in the README is a hardcoded string ('uF7FGN5uzfGdFiPzR'), so anyone who deploys without reading the env var table is running with a publicly known secret. The repo has gone quiet; last push was late 2025 and it hasn't tracked llama.cpp's rapid API changes or the proliferation of GGUF quantization formats. There's essentially no test coverage: the test directory has a placeholder file and a healthcheck script. Compared to Ollama + Open WebUI, which have eaten this space, Serge offers less model management, no multimodal support, and a thinner ecosystem.
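The known-secret problem is cheap to avoid. As a minimal sketch, a deployer could generate a random value and check it before passing it to the container through whichever variable the README's env var table names for the JWT secret. The helper names and the 32-character minimum below are illustrative assumptions, not part of Serge.

```python
import secrets

# The default quoted in the README; treat it as compromised.
KNOWN_DEFAULT = "uF7FGN5uzfGdFiPzR"


def generate_jwt_secret(num_bytes: int = 32) -> str:
    """Return a URL-safe random string built from num_bytes of OS entropy."""
    return secrets.token_urlsafe(num_bytes)


def is_safe_secret(value: str, min_length: int = 32) -> bool:
    """Reject the published default and anything shorter than min_length."""
    return value != KNOWN_DEFAULT and len(value) >= min_length


if __name__ == "__main__":
    # Print a fresh secret to paste into the deployment's environment.
    print(generate_jwt_secret())
```

A check like `is_safe_secret` could also run at startup so that a deployment refuses to boot with the published default.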
