finds.dev

// the find

lemony-ai/cascadeflow

★ 2,510 · Python · MIT · updated May 2026

Cascading runtime for AI agents. Optimize cost, latency, quality, and policy decisions inside the agent loop.

CascadeFlow is a Python/TypeScript library that routes LLM calls through a cheap 'drafter' model first, validates quality, and only escalates to an expensive model when the cheap one falls short. It runs in-process rather than as an HTTP proxy, so it can gate individual tool calls and read agent state, not just request boundaries. For anyone running agentic loops at scale and paying flagship-model prices for tasks that don't need them, that is a real problem worth solving.
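To make the drafter-then-escalate idea concrete, here is a minimal sketch of the pattern in plain Python. It does not use CascadeFlow's API: `cascade`, `CascadeResult`, the stand-in models, and the `non_empty` validator are all hypothetical names invented for illustration, and a real validator would be far less naive.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CascadeResult:
    text: str
    model: str       # which tier produced the final answer
    escalated: bool  # True if the drafter's answer was rejected


def cascade(
    prompt: str,
    drafter: Callable[[str], str],
    verifier: Callable[[str], str],
    is_good_enough: Callable[[str, str], bool],
) -> CascadeResult:
    """Try the cheap model first; call the expensive one only if validation fails."""
    draft = drafter(prompt)
    if is_good_enough(prompt, draft):
        return CascadeResult(text=draft, model="drafter", escalated=False)
    return CascadeResult(text=verifier(prompt), model="verifier", escalated=True)


# Stand-in models and a toy validator (assumptions, not real LLM calls).
def cheap_model(prompt: str) -> str:
    return "4" if prompt == "What is 2 + 2?" else ""


def expensive_model(prompt: str) -> str:
    return f"[flagship answer to: {prompt}]"


def non_empty(prompt: str, answer: str) -> bool:
    return bool(answer.strip())


easy = cascade("What is 2 + 2?", cheap_model, expensive_model, non_empty)
hard = cascade("Summarize this contract.", cheap_model, expensive_model, non_empty)
print(easy)
print(hard)
```

The point of the shape is that the expensive call sits behind the validator, so its cost is only paid on the prompts the drafter fails.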

The in-process vs proxy distinction is sound, and it matters: 40-60ms of proxy RTT per step adds up to 400-600ms across a 10-step agent loop, and proxies can't see per-step budget or tool-call context. The three-tier integration API (observe → scoped run → decorated agent) lets you instrument first and commit later, which is the right adoption path for something that touches production LLM calls. Framework coverage is broad: LangChain, OpenAI Agents, CrewAI, PydanticAI, Google ADK, n8n, and Vercel AI all have working integrations with example code. Budget enforcement with hard stop/deny_tool actions gives you a lever that most AI cost tools don't: you can prevent a runaway agent from blowing past a $0.50 cap mid-loop.
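A hard budget stop of the kind described above can be sketched in a few lines. This is an illustration under assumptions, not CascadeFlow's implementation: `BudgetGuard`, `BudgetExceeded`, and `run_agent` are hypothetical names, the per-step costs are made up, and only the "stop" action is modeled (not `deny_tool`).

```python
from dataclasses import dataclass
from typing import Iterable


class BudgetExceeded(RuntimeError):
    """Raised when the next step would push spend past the hard cap."""


@dataclass
class BudgetGuard:
    cap_usd: float
    spent_usd: float = 0.0

    def charge(self, step_cost_usd: float) -> None:
        # Check before spending so the cap is never overshot.
        if self.spent_usd + step_cost_usd > self.cap_usd:
            raise BudgetExceeded(
                f"step costing ${step_cost_usd:.3f} would exceed ${self.cap_usd:.2f} cap"
            )
        self.spent_usd += step_cost_usd


def run_agent(step_costs: Iterable[float], guard: BudgetGuard) -> int:
    """Run steps until they finish or the budget guard stops the loop."""
    completed = 0
    for cost in step_costs:
        try:
            guard.charge(cost)
        except BudgetExceeded:
            break  # hard stop: no further model or tool calls
        completed += 1
    return completed


# Ten hypothetical steps at $0.125 each against a $0.50 cap.
guard = BudgetGuard(cap_usd=0.50)
steps_run = run_agent([0.125] * 10, guard)
print(steps_run, guard.spent_usd)
```

Because the check runs inside the loop, before each step, the agent is cut off mid-run. A proxy that only sees individual requests has no running total of the loop's spend to check against.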

The cost savings numbers (69-93%) are benchmarked on datasets like GSM8K and TruthfulQA, where small models already do well; those numbers will not hold on real production queries with messy context and tool results. Quality validation relies heavily on logprob confidence scoring, which isn't available from Anthropic or many other providers and is a noisy signal even where it is available. The 'self-improving, gets smarter the more it runs' claim has no fine-tuning or persistent model updates behind it; what it describes is routing heuristics that update in-session, which is a much weaker claim. Maintaining Python and TypeScript implementations at API parity in a fast-moving project is hard; the TypeScript package is already structured separately and will drift.
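For readers unfamiliar with logprob confidence scoring, one common form is the geometric mean of token probabilities compared against a threshold. The sketch below assumes that form; it is not taken from CascadeFlow's code, and the function names, the 0.8 threshold, and the sample logprobs are all illustrative.

```python
import math
from typing import Sequence


def mean_token_confidence(token_logprobs: Sequence[float]) -> float:
    """Geometric-mean token probability: exp of the average logprob."""
    if not token_logprobs:
        # No logprobs from the provider: treat as zero confidence.
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def should_escalate(token_logprobs: Sequence[float], threshold: float = 0.8) -> bool:
    """Escalate to the expensive model when the draft looks uncertain."""
    return mean_token_confidence(token_logprobs) < threshold


# Made-up per-token logprobs for two hypothetical drafts.
confident_draft = [-0.01, -0.02, -0.05]
shaky_draft = [-0.01, -1.6, -0.9]

print(should_escalate(confident_draft))
print(should_escalate(shaky_draft))
print(should_escalate([]))
```

Both weaknesses show up directly. With no logprobs, this sketch escalates every call, which erases the savings. And a draft can be confidently wrong, with high token probabilities on an incorrect answer.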
