// the find
linkedin/Burrow
Kafka Consumer Lag Checking
Burrow is a Kafka consumer lag monitor from LinkedIn that evaluates consumer group health without requiring you to set lag thresholds — instead it uses a sliding window algorithm to determine if a consumer is actually falling behind or just slow. It exposes an HTTP API for on-demand status queries and supports email/HTTP/Slack alerting. Aimed at platform teams running multi-cluster Kafka who are tired of tuning threshold alerts that page at 3am for normal bursts.
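For a sense of what an on-demand status query looks like, here is a minimal Python sketch. It assumes the v3 path layout (`/v3/kafka/<cluster>/consumer/<group>/status`) and a response envelope with `error`, `message`, and a nested `status` object. The host, cluster, and group names are placeholders, and the field names should be checked against the Burrow version you run.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def status_url(base_url: str, cluster: str, group: str) -> str:
    """Build the consumer-group status URL (v3 path layout is an assumption)."""
    base = base_url.rstrip("/")
    return f"{base}/v3/kafka/{quote(cluster, safe='')}/consumer/{quote(group, safe='')}/status"


def summarize(body: dict) -> str:
    """Turn a decoded status response into a one-line summary."""
    if body.get("error"):
        return f"request failed: {body.get('message', 'unknown error')}"
    status = body.get("status", {})
    return (
        f"{status.get('group', '?')}: {status.get('status', 'UNKNOWN')} "
        f"(total lag {status.get('totallag', 'n/a')})"
    )


def fetch_status(base_url: str, cluster: str, group: str, timeout: float = 5.0) -> dict:
    """Fetch and decode the status document; raises on network or HTTP errors."""
    with urlopen(status_url(base_url, cluster, group), timeout=timeout) as resp:
        return json.load(resp)
```

Against a live instance, usage would be `print(summarize(fetch_status("http://burrow.example:8000", "prod", "billing")))`, where the hostname and port are hypothetical.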
The threshold-free evaluation model is the real value here: a consumer that's lagging but catching up is treated differently from one that's falling further behind, which eliminates a whole class of false-positive alerts. Multi-cluster support is first-class, not bolted on. The coordinator pattern throughout the codebase (cluster, consumer, evaluator, and notifier each have their own coordinator) keeps concerns well separated and makes the test coverage meaningful. A Prometheus endpoint is built in, so plugging into existing monitoring stacks is straightforward.
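The distinction between "lagging but catching up" and "falling further behind" can be illustrated with a toy evaluator. This is a simplified sketch of the idea, not Burrow's actual code: the function name, the status strings, and the sample format (committed offset, lag) are all assumptions made for illustration, and the real evaluator applies more rules than the three shown here.

```python
from typing import List, Tuple

Sample = Tuple[int, int]  # (committed_offset, lag), oldest first


def evaluate(window: List[Sample]) -> str:
    """Classify one partition from a sliding window of commit samples."""
    if len(window) < 2:
        return "UNKNOWN"  # too little history to judge a trend

    offsets = [offset for offset, _ in window]
    lags = [lag for _, lag in window]

    # The consumer fully caught up at some point in the window.
    if any(lag == 0 for lag in lags):
        return "OK"

    lag_never_shrinks = all(later >= earlier for earlier, later in zip(lags, lags[1:]))

    # No commits are advancing and the lag is flat or growing.
    if offsets[-1] == offsets[0] and lag_never_shrinks:
        return "STALLED"

    # Commits are advancing, but the consumer never gains ground.
    if offsets[-1] > offsets[0] and lag_never_shrinks:
        return "WARN"

    # Lag shrank at least once: behind, but catching up.
    return "OK"
```

Note that no lag threshold appears anywhere: a partition with a lag of 50 that is shrinking is `OK`, while one with a lag of 10 that keeps growing is `WARN`.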
The README still references Go 1.11/1.12 and Zookeeper-based offset storage — both are ancient history for any Kafka setup post-2020; the docs haven't kept up with the codebase. All state is in-memory, so a Burrow restart loses the sliding window history and you get a blind period before it can make confident evaluations again. The notifier system only supports email, HTTP, and Slack — no native PagerDuty or OpsGenie integration, so you're writing a webhook shim. Configuration is TOML only with no environment variable overrides, which is awkward in containerized deployments.
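The webhook shim mentioned above mostly comes down to a payload translation. The sketch below maps an inbound alert to a PagerDuty Events API v2 event. The inbound shape (`cluster`, `group`, `status` keys) is hypothetical, since the body of Burrow's HTTP notifier is defined by whatever template you configure; the function name and the status-to-severity mapping are likewise illustrative choices.

```python
def to_pagerduty_event(alert: dict, routing_key: str) -> dict:
    """Translate a (hypothetical) Burrow alert body into an Events API v2 event."""
    cluster = alert.get("cluster", "unknown")
    group = alert.get("group", "unknown")
    status = alert.get("status", "UNKNOWN")

    event = {
        "routing_key": routing_key,
        # A stable key lets a later OK alert resolve the same incident.
        "dedup_key": f"burrow/{cluster}/{group}",
        "event_action": "resolve" if status == "OK" else "trigger",
    }
    if event["event_action"] == "trigger":
        event["payload"] = {
            "summary": f"Kafka consumer group {group} on {cluster} is {status}",
            "source": "burrow",
            "severity": "warning" if status == "WARN" else "error",
        }
    return event
```

The resulting dict would be sent as JSON in a POST to PagerDuty's `https://events.pagerduty.com/v2/enqueue` endpoint; the HTTP server that receives Burrow's notification is left out to keep the sketch short.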