// the find

caidaoli/ccLoad

★ 320 · Go · MIT · updated Jun 2026

AI API gateway that ends manual channel switching with smart routing, auto failover, exponential cooldown, multi-URL scheduling, live request monitoring and soft-error detection.

ccLoad is an AI API gateway written in Go that puts multiple Claude/OpenAI/Gemini/Codex API keys behind a single endpoint, handling channel selection, failover, exponential cooldown, and cross-protocol translation. It ships as a single binary with embedded SQLite and a web dashboard. The target user is someone managing a pool of API keys — shared team gateways, rate-limit arbitrage across providers, or Claude Code pointing at a proxy instead of Anthropic directly.

- Soft-error detection is well thought out: it catches HTTP 200 responses that are actually errors (JSON `{error: ...}`, SSE `rate_limit_exceeded` events, plain-text load warnings in Chinese) and routes them through the same failover path as real 4xx/5xx responses. AI APIs are notorious for this, and most proxies miss it.

- The protocol transform system (Anthropic ↔ OpenAI ↔ Gemini ↔ Codex) is extensive and covers edge cases like preserving `thinking` parameters and `thinkingLevel` mapping across protocol families. 18 built-in transforms with dedicated stream and non-stream paths is real engineering, not a quick shim.

- Local token counting at <5ms is a practical win: callers can estimate cost before committing a request without burning API quota. The 93% accuracy claim is plausible for the models they target, and the implementation handles tool definitions, which is where naive counters break.

- The codebase is well-decomposed for a single-binary tool — proxy split into handler/forward/error/util/stream/debug modules, a proper cooldown package, a unified storage SQL layer that abstracts SQLite vs MySQL differences, and a test file for nearly every source file in the tree.
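To make the soft-error point concrete, here is a minimal sketch of what classifying a "200 that is really an error" might look like. It is not ccLoad's code: the function name `isSoftError`, the content-type dispatch, and the marker phrase in `main` are all hypothetical, chosen only to illustrate the three cases named above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

// isSoftError reports whether a 200 response body looks like a disguised failure.
// Hypothetical helper: the name and heuristics are illustrative, not ccLoad's.
func isSoftError(status int, contentType string, body []byte, textMarkers []string) (bool, string) {
	if status != http.StatusOK {
		return false, "" // real 4xx/5xx goes through the normal error path
	}
	trimmed := bytes.TrimSpace(body)
	switch {
	case strings.HasPrefix(contentType, "application/json"):
		// A top-level, non-null "error" key marks the body as an error payload.
		var obj map[string]json.RawMessage
		if json.Unmarshal(trimmed, &obj) == nil {
			if raw, ok := obj["error"]; ok && len(raw) > 0 && string(raw) != "null" {
				return true, "json error object"
			}
		}
	case strings.HasPrefix(contentType, "text/event-stream"):
		// Crude substring scan; a real proxy would parse only the first SSE events.
		if bytes.Contains(trimmed, []byte("rate_limit_exceeded")) {
			return true, "sse rate-limit event"
		}
	case strings.HasPrefix(contentType, "text/plain"):
		// Marker phrases are supplied by the caller, e.g. from configuration.
		for _, m := range textMarkers {
			if m != "" && bytes.Contains(trimmed, []byte(m)) {
				return true, "plain-text warning"
			}
		}
	}
	return false, ""
}

func main() {
	markers := []string{"负载过高"} // hypothetical marker phrase ("load too high")
	samples := []struct{ ct, body string }{
		{"application/json", `{"error":{"type":"overloaded_error"}}`},
		{"text/event-stream", "event: error\ndata: {\"code\":\"rate_limit_exceeded\"}\n\n"},
		{"text/plain; charset=utf-8", "当前负载过高,请稍后再试"},
		{"application/json", `{"id":"msg_1","content":[]}`},
	}
	for _, s := range samples {
		soft, why := isSoftError(http.StatusOK, s.ct, []byte(s.body), markers)
		fmt.Printf("%-28s soft=%v %s\n", s.ct, soft, why)
	}
}
```

A caller would treat a `true` result the same way as an upstream 5xx: mark the channel as failed and retry on the next candidate.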

- All critical state (cooldown timers, RPM counters, concurrency slots) is in-memory and per-process. Running two instances for availability means they share no failure knowledge — a channel getting hammered in instance A keeps receiving traffic from instance B. The README acknowledges this but doesn't offer a solution beyond 'use MySQL mode'.

- The confidence factor in health scoring gives new channels a discount on failure penalties (`confidence = sample_count / min_confident_sample`). This means a freshly-added channel that immediately starts returning 500s will take `health_min_confident_sample` requests before the penalty kicks in fully — exactly backwards from what you want during incident response.

- The hybrid SQLite-replica mode (SQLite for reads, async sync to MySQL) is a clever way to cut read latency on Hugging Face Spaces, but the asynchronous log sync leaves a crash window in which logs exist in SQLite but have not yet reached MySQL. The README treats this as acceptable; whether it is depends on whether you're using logs for billing.

- Cost accounting degrades for models not explicitly priced in `cost_calculator.go`: any new model falls back to a default rate or returns zero cost, silently under-billing. The pricing table already carries Claude 4.x and GPT-5.x entries, so keeping it current is an ongoing maintenance burden that will bite whenever Anthropic or OpenAI ships something new.
