// the find
chopratejas/headroom
Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.
Headroom is token compression middleware for LLM applications. It sits between your app and the API and reduces context size by a claimed 60-95% using several strategies: JSON minification, AST-aware code compression, a custom HuggingFace model for prose, and cache prefix stabilization. It ships as a Python library, a local proxy, and an MCP server, and it targets developers running AI coding agents who are burning money on repetitive context.
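JSON minification is the simplest of those strategies to picture. The sketch below uses only the Python standard library and is not Headroom's implementation; `minify_json`, `savings`, and the sample payload are hypothetical names chosen for illustration, and character count stands in as a rough proxy for token count.

```python
import json


def minify_json(text: str) -> str:
    """Re-serialize JSON without insignificant whitespace."""
    return json.dumps(json.loads(text), separators=(",", ":"), ensure_ascii=False)


def savings(original: str, compressed: str) -> float:
    """Fraction of characters removed (a rough proxy for tokens saved)."""
    return 1 - len(compressed) / len(original)


# Hypothetical pretty-printed tool output, as an agent might receive it.
pretty = json.dumps(
    {
        "files": [
            {"path": "src/main.py", "lines": 120},
            {"path": "README.md", "lines": 40},
        ]
    },
    indent=4,
)

compact = minify_json(pretty)
print(compact)
print(f"{savings(pretty, compact):.0%} fewer characters")
```

Whitespace stripping alone is lossless, so the parsed data is identical before and after. The larger reductions the project advertises would have to come from the other strategies.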
- The proxy mode is genuinely useful — zero code changes, just point your existing OpenAI-compatible client at localhost:8787 and get compression for free. This is a real deployment win over library-only approaches.
- The core is being rewritten in Rust (headroom-core, headroom-proxy crates exist with substantial code), which is the right call for a hot path that touches every request. The parity testing crate suggests they're being careful about behavioral equivalence between implementations.
- CCR (reversible compression) is a legitimately clever design — compress aggressively but let the LLM retrieve originals on demand via a tool call, rather than forcing a lossy tradeoff up front.
- The benchmark methodology is at least self-consistent and reproducible via `python -m headroom.evals suite` — the numbers are auditable rather than just marketing claims, and the REALIGNMENT directory shows honest internal tracking of what's broken.
- The same REALIGNMENT directory is also a red flag in plain sight: its 12 phase documents, with titles like '01-bug-list.md' and '10-phase-H-python-retirement.md', make clear that the Python layer is considered legacy and that the current codebase has known correctness issues. Adopting this today means riding out a migration.
- The Kompress-v2-base model adds a local ML inference dependency: the `[ml]` extra installs a HuggingFace model, a heavyweight requirement that will surprise users who just wanted lightweight proxy compression.
- The accuracy benchmark table (GSM8K, TruthfulQA) uses N=100, which is too small a sample to support conclusions about small accuracy changes. The delta columns also mix 'no change' with 'we didn't run baseline': the SQuAD and BFCL rows lack a baseline number entirely, which makes the accuracy-preservation claims hard to verify.
- The Copilot subscription mode ships with an explicit disclaimer that the Windows, Linux, and Docker auth paths are 'implemented or planned' but 'still need real OS validation'. That is an unvalidated feature documented as such in the README itself, which raises an uncomfortable question about how finished the rest of the feature set actually is.
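To put a number on the N=100 concern raised above, here is a rough sketch of the sampling error on a measured accuracy. It assumes each benchmark item is an independent pass/fail trial and uses the normal (Wald) approximation to the binomial; `margin_of_error` is a hypothetical helper, and the accuracy values are illustrative rather than taken from the project's table.

```python
import math


def margin_of_error(accuracy: float, n: int, z: float = 1.96) -> float:
    """Half-width of an approximate 95% confidence interval for a proportion.

    Uses the normal (Wald) approximation: z * sqrt(p * (1 - p) / n).
    """
    return z * math.sqrt(accuracy * (1 - accuracy) / n)


# Illustrative accuracies, not figures from the Headroom benchmark table.
for n in (100, 1000):
    for acc in (0.5, 0.9):
        moe = margin_of_error(acc, n)
        print(f"N={n:<5} accuracy={acc:.0%}  margin = +/-{moe * 100:.1f} points")
```

At N=100 the margin is roughly 10 percentage points at 50% accuracy and roughly 6 points at 90%, so a delta of a point or two between compressed and uncompressed runs sits well inside the noise.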