// the find
headroomlabs-ai/headroom
Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.
Headroom sits between your AI agent and the LLM provider, compressing tool outputs, logs, RAG chunks, and conversation history before they hit the context window. It ships as a Python library, an OpenAI-compatible proxy, a CLI wrapper for coding agents, and an MCP server. The target audience is developers running Claude Code, Cursor, or similar tools daily who are burning tokens on repetitive, noisy context.
The architecture is layered. A Rust core (headroom-core) handles the heavy lifting (AST-aware code compression, JSON schema deduplication, log template extraction), the Python layer handles integration glue, and the proxy intercepts traffic without code changes. The CCR (Compressed Context Retrieval) design is smart: instead of lossy summarization, originals are stored locally and the LLM gets a retrieval tool to fetch them on demand, so anything the compressor drops remains recoverable. That is the right tradeoff between compression ratio and correctness. CacheAligner addresses a real, under-discussed problem: non-deterministic ordering or variable headers invalidate the provider's prefix cache, and stabilizing prefixes before they reach the provider is legitimate cost reduction. The benchmark methodology is unusually honest: they distinguish 'estimated' from 'measured' output savings, include confidence intervals, and offer a holdout group for real measurement rather than claiming counterfactuals as facts.
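To make the CCR and CacheAligner ideas concrete, here is a minimal sketch of both patterns. It is not Headroom's code or API: `OriginalStore`, `compress_with_handle`, `stable_prefix`, and the `retrieve(...)` marker format are hypothetical names chosen for illustration, and the head-and-tail truncation stands in for whatever compression the real Rust core performs.

```python
import hashlib
import json


class OriginalStore:
    """Hypothetical local store: keeps full originals keyed by content hash."""

    def __init__(self):
        self._blobs = {}

    def put(self, text):
        # Content-addressed key, so the same text always maps to the same handle.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
        self._blobs[key] = text
        return key

    def get(self, key):
        return self._blobs[key]


def compress_with_handle(text, store, keep_lines=2):
    """Keep the head and tail of a long output and leave a retrieval handle
    in place of the middle, instead of summarizing it away."""
    lines = text.splitlines()
    if len(lines) <= 2 * keep_lines:
        return text  # short enough already; nothing to compress
    key = store.put(text)
    omitted = len(lines) - 2 * keep_lines
    marker = f"[{omitted} lines omitted; call retrieve('{key}') for the original]"
    return "\n".join(lines[:keep_lines] + [marker] + lines[-keep_lines:])


def stable_prefix(tool_schemas):
    """Serialize tool schemas deterministically (sorted entries, sorted keys,
    fixed separators) so identical inputs always produce identical bytes."""
    ordered = sorted(tool_schemas, key=lambda schema: schema["name"])
    return json.dumps(ordered, sort_keys=True, separators=(",", ":"))
```

In this sketch the model sees only the shortened text, and a retrieval tool backed by `store.get` would return the untouched original when asked. `stable_prefix` illustrates why ordering matters: two requests carrying the same schemas in a different order serialize to the same string, so a provider-side prefix cache keyed on the leading bytes has a chance to hit.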
The REALIGNMENT directory is a red flag: thirteen phase documents covering bug lockdowns, live-zone rewrites, Rust proxy migration, Python retirement, and test infra overhauls suggest the codebase is mid-refactor and not yet stable enough for production adoption. The Kompress-base model is a custom HuggingFace model with no public training data disclosure, so you're trusting it not to drop signal in ways the benchmarks don't catch, and the benchmark suite is the repo's own, not independent. The GitHub Copilot subscription routing feature carries an explicit caveat in the README: the Windows Credential Manager, Linux Secret Service, Docker, and CI paths are 'implemented or planned' but 'still need real OS validation'. Shipping docs for features that aren't vetted on the platforms most CI users run is a liability. Install complexity is real: the full `[all]` extra pulls in PyTorch, ONNX Runtime, a Rust build step, and HuggingFace model downloads. The SSL inspection workaround section alone is three paragraphs, which tells you something about the surface area.