// the find
apache/hertzbeat
An AI-powered next-generation open source real-time observability system.
Apache HertzBeat is an agentless monitoring platform for infrastructure, databases, middleware, and cloud-native systems — think Zabbix or Prometheus with a built-in UI, but where you define monitors via YAML templates instead of installing agents. It recently added an AI layer (LLM chat, MCP server, SOP-based automated diagnosis) on top of its metrics/alerting core. Aimed at ops teams who want a self-hosted, all-in-one observability stack without the Prometheus+Grafana+Alertmanager assembly tax.
The YAML-template monitoring model is genuinely good: you can add a new monitor type (say, a custom HTTP API) without writing Java by dropping in a config file. Coverage is broad, with 100+ built-in templates spanning databases, operating systems, middleware, network switches, and cloud-native systems. The alert pipeline is well structured: real-time and periodic threshold rules, grouping/convergence, and silence/inhibit, so all the Alertmanager primitives are there, integrated rather than bolted on. The MCP server and the SOP (Standard Operating Procedure) engine for AI-driven diagnosis are architecturally interesting: they let you define multi-step LLM workflows in YAML tied to real monitoring tools, which is more useful than chat alone.
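To make the template-driven idea concrete, here is a minimal sketch of how a declarative monitor definition can drive a generic collector with no per-monitor code. It is not HertzBeat's implementation: the `TEMPLATE` dict stands in for a parsed YAML file, and its field names (`app`, `metrics`, `fields`), the dotted-path syntax, and the sample response are all assumptions made for illustration.

```python
import json

# Hypothetical, simplified stand-in for a parsed YAML monitor template.
# The keys below are illustrative only, not HertzBeat's actual schema.
TEMPLATE = {
    "app": "orders_api",
    "metrics": [
        {
            "name": "health",
            # metric field name -> dotted path into the JSON response
            "fields": {
                "status": "status",
                "queue_depth": "queue.depth",
            },
        }
    ],
}


def extract(payload, dotted_path):
    """Walk a dotted path such as 'queue.depth' through nested dicts."""
    value = payload
    for key in dotted_path.split("."):
        value = value[key]
    return value


def collect(template, raw_body):
    """Apply every metric definition in the template to one JSON response body."""
    payload = json.loads(raw_body)
    return {
        metric["name"]: {
            field: extract(payload, path)
            for field, path in metric["fields"].items()
        }
        for metric in template["metrics"]
    }


# A canned response body, so the sketch runs without a network call.
sample_body = '{"status": "UP", "queue": {"depth": 7}}'
result = collect(TEMPLATE, sample_body)
print(result)  # {'health': {'status': 'UP', 'queue_depth': 7}}
```

Adding a new monitor type in this model means writing another template, not another collector, which is the property the review is praising.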
The 'AI-powered' label is doing heavy lifting: the AI features are an add-on module, not deeply integrated, and you're essentially wiring an LLM to existing REST APIs, which any team could do themselves. Long-term metrics storage depends on pluggable TSDBs (VictoriaMetrics, IoTDB, TDengine), but the default embedded H2/memory store is unsuitable for production, which makes the 'quick start with one Docker command' pitch misleading for anything beyond a demo. The codebase is large and primarily Chinese-community-driven, which means English documentation lags, issue responses can be slow for Western users, and some UI strings still flip to Chinese. The native (non-JVM) collector is still maturing: JDBC-based monitors silently fall back to the JVM collector, which isn't obvious until you're debugging why your Oracle monitor stopped working.
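The "wiring an LLM to existing REST APIs" point can be illustrated with a rough sketch of a tool dispatcher: the model emits a JSON tool call, and a thin layer maps it onto an HTTP request. Everything here is hypothetical, including the tool names, the `/monitors` and `/alerts` paths, and the tool-call shape; none of it is taken from HertzBeat's API or its MCP server. The HTTP client is injected so the example runs offline.

```python
import json


def make_tools(http_get):
    """Build a tool registry; http_get is any callable(path) -> dict.

    The paths are placeholders, not real HertzBeat endpoints.
    """
    return {
        "list_monitors": lambda args: http_get("/monitors"),
        "get_alerts": lambda args: http_get(
            "/alerts?status=" + args.get("status", "firing")
        ),
    }


def dispatch(tools, tool_call_json):
    """Route one LLM-emitted tool call (a JSON string) to its handler."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in tools:
        return {"error": f"unknown tool: {name}"}
    return tools[name](call.get("arguments", {}))


def fake_http_get(path):
    """Stand-in for a real HTTP client; echoes the requested path."""
    return {"path": path, "items": []}


tools = make_tools(fake_http_get)
reply = dispatch(tools, '{"name": "get_alerts", "arguments": {"status": "firing"}}')
print(reply)  # {'path': '/alerts?status=firing', 'items': []}
```

The small size of this layer is the substance of the criticism: the value of a bundled AI module depends on what it adds beyond this kind of glue.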