// the find

chopratejas/headroom

★ 21,562 · Python · Apache-2.0 · updated Jun 2026

Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.

Headroom is a token compression middleware for LLM applications that sits between your app and the API, reducing context size by 60-95% using multiple strategies: JSON minification, AST-aware code compression, a custom HuggingFace model for prose, and cache prefix stabilization. It ships as a Python library, a local proxy, and an MCP server, targeting developers running AI coding agents who are burning money on repetitive context.

- The proxy mode is genuinely useful — zero code changes, just point your existing OpenAI-compatible client at localhost:8787 and get compression for free. This is a real deployment win over library-only approaches.

- The core is being rewritten in Rust (headroom-core, headroom-proxy crates exist with substantial code), which is the right call for a hot path that touches every request. The parity testing crate suggests they're being careful about behavioral equivalence between implementations.

- CCR (reversible compression) is a legitimately clever design — compress aggressively but let the LLM retrieve originals on demand via a tool call, rather than forcing a lossy tradeoff up front.

- The benchmark methodology is at least self-consistent and reproducible via `python -m headroom.evals suite` — the numbers are auditable rather than just marketing claims, and the REALIGNMENT directory shows honest internal tracking of what's broken.

- The REALIGNMENT directory is a red flag in plain sight — 12 phase documents titled things like '01-bug-list.md' and '10-phase-H-python-retirement.md' confirm the Python layer is considered legacy and the current codebase has known correctness issues. Adopting this today means riding a migration.

- The Kompress-v2-base model adds a local ML inference dependency that will surprise users who just wanted lightweight proxy compression — the `[ml]` extra installs a HuggingFace model, which is a heavyweight surprise for what looks like a simple tool.

- The benchmark table for accuracy (GSM8K, TruthfulQA) uses N=100 which is far too small to draw conclusions, and the delta columns mix 'no change' with 'we didn't run baseline' — the SQuAD and BFCL rows lack a baseline number entirely, making the accuracy preservation claims hard to verify.

- The Copilot subscription mode ships with an explicit disclaimer that Windows, Linux, and Docker auth paths are 'implemented or planned' but 'still need real OS validation' — that's a publicly documented known-broken feature in the README, which sets an uncomfortable precedent for how finished the rest of the feature set actually is.

View on GitHub → Homepage ↗