// the find
ccfos/nightingale
Nightingale is to monitoring and alerting what Grafana is to visualization.
Nightingale is a Go-based alerting engine that sits in front of your existing time-series storage — Prometheus, VictoriaMetrics, ClickHouse, Elasticsearch, etc. — and handles rule evaluation, event routing, and notification delivery. It originated at DiDi, was open-sourced, and is now under CCF. Target audience is ops teams at Chinese internet companies who find PagerDuty/Grafana Alerting too limited but don't want to build their own pipeline.
The data source breadth is genuinely good: Prometheus, VictoriaMetrics, Loki, ClickHouse, MySQL, Postgres, Elasticsearch, TDengine — most alerting tools don't cover this list. The edge deployment mode (n9e-edge) for disconnected data centers is a real architectural win that Grafana Alerting doesn't have. The event pipeline with relabeling, metadata enrichment, and self-healing callbacks is more capable than what you get out of Grafana. The recently added AI agent layer (MCP server, skill system, LLM-backed alert rule creation) is surprisingly well-structured — it's not bolted on.
Documentation is mostly in Chinese first, English second — the English docs lag and some concepts are never translated. The business group / permission model is opinionated around Chinese enterprise org structures and will feel foreign if you're not coming from that world. There's no native on-call scheduling or escalation policy — they explicitly tell you to use PagerDuty/FlashDuty for that, which means Nightingale is always a partial solution in a Western ops stack. The community contribution quality on bundled dashboards and alert rule templates varies wildly; you'll import garbage and have to clean it up.