// the find
mnfst/manifest
Connect Your Agents And Harnesses With Any Provider 🦚
Manifest is a self-hosted LLM router that sits between your application and AI providers, dispatching each request to the cheapest or most appropriate model based on complexity scoring, custom HTTP headers, and routing rules. The headline differentiator is reusing paid consumer subscriptions — ChatGPT Plus, Claude Max, GitHub Copilot — instead of paying API rates, which is a real cost lever if you already carry those subscriptions.
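The three routing signals can be pictured as a precedence chain. What follows is a minimal illustration, not Manifest's code. It assumes an explicit header override beats a per-agent rule, which in turn beats the complexity auto-router, and it assumes a complexity score between 0 and 1. The header name `x-route-model`, the tier thresholds, the model names, and `pickModel` itself are all hypothetical.

```typescript
// Hypothetical router sketch: header override > per-agent rule > complexity tier.
// None of these names come from Manifest; they only illustrate the precedence idea.

interface RouteRequest {
  headers: Record<string, string | undefined>; // assumed already lowercased, as Node does
  agent: string;
  complexity: number; // assumed score in [0, 1]; Manifest's real scoring is undocumented
}

interface RoutingRule {
  agent: string; // rule applies to requests from this agent
  model: string; // model pinned for that agent
}

// Hypothetical tiers, checked in ascending order of maxComplexity.
const TIERS: ReadonlyArray<{ maxComplexity: number; model: string }> = [
  { maxComplexity: 0.3, model: "small-model" },
  { maxComplexity: 0.7, model: "mid-model" },
  { maxComplexity: 1.0, model: "large-model" },
];

const FALLBACK_MODEL = "large-model"; // hypothetical: used when no tier matches
const OVERRIDE_HEADER = "x-route-model"; // hypothetical header name

function pickModel(req: RouteRequest, rules: RoutingRule[]): string {
  // 1. An explicit header override always wins.
  const override = req.headers[OVERRIDE_HEADER];
  if (override) return override;

  // 2. Next, a static rule matching the calling agent.
  const rule = rules.find((r) => r.agent === req.agent);
  if (rule) return rule.model;

  // 3. Otherwise fall back to the complexity tiers, clamping the score to [0, 1].
  //    A NaN score matches no tier and falls through to FALLBACK_MODEL.
  const score = Math.min(Math.max(req.complexity, 0), 1);
  const tier = TIERS.find((t) => score <= t.maxComplexity);
  return tier?.model ?? FALLBACK_MODEL;
}
```

With these placeholders, a request with no override header, no matching rule, and a score of 0.2 lands on `small-model`. Checking the header first is what makes it an escape hatch: the caller's explicit choice is never second-guessed by the scorer.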
The subscription provider support is the genuinely interesting part: OAuth-based routing through GitHub Copilot or Claude Max means you can absorb a lot of agent traffic against a flat monthly fee instead of metered API calls. The test suite is substantive: nearly every service and controller has a matching spec file, and the migrations have test coverage too, which is unusual for a project this young. The routing model gives you explicit escape hatches via custom HTTP headers, so you're not forced to trust the auto-router's complexity judgment for calls where you already know which model you want. Cost tracking is per-message and per-agent with hard spending limits, which is the feature that actually justifies the operational overhead of running a router.
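One way to enforce a hard per-agent limit is to check a projected total before dispatch and record the actual cost per message once the provider responds. This is a minimal sketch of that check-then-record pattern. It assumes costs can be estimated up front and are stored as integer micro-dollars. `SpendLedger`, `SpendLimitExceeded`, and the method names are hypothetical and say nothing about how Manifest actually stores or enforces its limits.

```typescript
// Hypothetical per-agent spend ledger with hard limits; not Manifest's implementation.
// Amounts are integer micro-dollars (1 USD = 1,000,000) to avoid floating-point drift.

class SpendLimitExceeded extends Error {
  constructor(agent: string, projected: number, cap: number) {
    super(`agent ${agent} would reach ${projected} micro-USD, over its cap of ${cap}`);
    this.name = "SpendLimitExceeded";
  }
}

interface MessageCost {
  agent: string;
  model: string;
  costMicros: number;
}

class SpendLedger {
  private readonly caps: Map<string, number>;
  private readonly spent = new Map<string, number>();
  private readonly messages: MessageCost[] = [];

  constructor(caps: Map<string, number>) {
    this.caps = caps;
  }

  // Called before dispatch: reject if the estimated cost would push the agent past its cap.
  assertWithinCap(agent: string, estimatedMicros: number): void {
    const cap = this.caps.get(agent);
    if (cap === undefined) return; // no limit configured for this agent
    const projected = this.totalFor(agent) + estimatedMicros;
    if (projected > cap) throw new SpendLimitExceeded(agent, projected, cap);
  }

  // Called after the provider responds: record the actual per-message cost.
  record(agent: string, model: string, costMicros: number): void {
    this.messages.push({ agent, model, costMicros });
    this.spent.set(agent, this.totalFor(agent) + costMicros);
  }

  totalFor(agent: string): number {
    return this.spent.get(agent) ?? 0;
  }

  messagesFor(agent: string): MessageCost[] {
    return this.messages.filter((m) => m.agent === agent);
  }
}
```

Because the estimate and the actual cost can differ, a real implementation also has to decide what happens when a recorded cost pushes an agent past its cap. This sketch simply records it and rejects the agent's next call.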
Routing programmatic traffic through personal AI subscriptions (ChatGPT Plus, Kimi Coding Plan, MiniMax Coding Plan) almost certainly violates those services' ToS. This isn't a fringe use case; it's the core value proposition, so adopters should understand what they're signing up for. The 'complexity' and 'specificity' scoring that drives automatic routing is never explained in the README or docs, and a miscalibrated router that silently downgrades your calls to cheaper models will degrade output quality in ways that are hard to catch. The project is still in beta, and its changeset files name active problems (SSE throttle, zero-downtime deploy drain), so you'd be inheriting operational rough edges that are publicly acknowledged but not yet closed. Every LLM call now has a self-hosted network hop in the critical path, and no latency characterization is published anywhere; for interactive applications this is a real concern.
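Since no latency figures are published, one way to get your own is to time the same call sent directly to a provider and sent through the router, then compare percentiles rather than averages, because the tail is what interactive users notice. This is a minimal sketch. `sampleLatencies`, `percentile`, and `overheadReport` are hypothetical helpers, and the calls you pass in stand for whatever client code you already have.

```typescript
// Hypothetical harness for measuring the latency a router hop adds.
// Pass in whatever client calls you already have; nothing here is Manifest-specific.

async function sampleLatencies(call: () => Promise<unknown>, n: number): Promise<number[]> {
  const samples: number[] = [];
  for (let i = 0; i < n; i++) {
    const start = performance.now();
    await call(); // run sequentially so requests don't queue behind each other
    samples.push(performance.now() - start);
  }
  return samples;
}

// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.max(Math.ceil((p / 100) * sorted.length), 1);
  return sorted[Math.min(rank, sorted.length) - 1];
}

// Added latency at each percentile, in milliseconds (routed minus direct).
function overheadReport(direct: number[], routed: number[]): Record<string, number> {
  const report: Record<string, number> = {};
  for (const p of [50, 95, 99]) {
    report[`p${p}`] = percentile(routed, p) - percentile(direct, p);
  }
  return report;
}
```

For example, `overheadReport(await sampleLatencies(callDirect, 200), await sampleLatencies(callViaRouter, 200))` would give the added milliseconds at p50, p95, and p99, where `callDirect` and `callViaRouter` are your own (here hypothetical) client calls. This is a difference of percentiles, not a percentile of per-call differences, so run both series under similar load.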