// the find

chaosblade-io/chaosblade

★ 6,351 · Python · Apache-2.0 · updated Jun 2026

An easy to use and powerful chaos engineering experiment toolkit. (A simple, easy-to-use, and powerful chaos experiment injection tool open-sourced by Alibaba.)

ChaosBlade is Alibaba's chaos engineering toolkit for injecting faults into Linux hosts, JVM applications, Docker containers, and Kubernetes clusters. It uses a consistent CLI model across all targets, so the same mental model applies whether you're killing a pod or corrupting a Java method's return value. The new Blade AI layer (v0.1.0, in a feature branch) wraps the CLI with an LLM agent for conversational fault injection and automated recovery verification.

The unified experiment model is well designed: `blade create cpu fullload`, `blade create k8s pod-network loss`, and `blade create jvm delay` all follow the same verb/target/action/flags pattern, so operators don't need a separate mental model per runtime. The JVM executor attaches as a Java Agent, requires no application changes, and cleans up properly on revoke. Kubernetes support via CRD is solid: experiments can be managed with kubectl like any other resource, which fits naturally into GitOps workflows. The Blade AI agent (LangGraph-based, multi-node graph) has real architecture, with separate nodes for intent clarification, safety checks, conflict detection, baseline capture, and post-mortem generation rather than one monolithic prompt.
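To make the verb/target/action/flags pattern concrete, here is a minimal Python sketch that assembles the three commands quoted above as argument lists. The `Experiment` helper is hypothetical and not part of ChaosBlade, and the flag names and values are assumptions for illustration only; confirm them against `blade create <target> <action> -h` before relying on them. Nothing is executed, the sketch only builds and prints the command lines.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Experiment:
    """Hypothetical helper mirroring the verb/target/action/flags pattern."""
    target: List[str]   # one or more words, e.g. ["cpu"] or ["k8s", "pod-network"]
    action: str         # e.g. "fullload", "loss", "delay"
    flags: Dict[str, str] = field(default_factory=dict)

    def to_argv(self, verb: str = "create") -> List[str]:
        """Build the argument list in verb/target/action/flags order."""
        argv = ["blade", verb, *self.target, self.action]
        for name, value in self.flags.items():
            argv += [f"--{name}", str(value)]
        return argv


# Flag names and values are illustrative assumptions, not verified CLI options.
cpu = Experiment(["cpu"], "fullload", {"cpu-percent": "80", "timeout": "60"})
net = Experiment(["k8s", "pod-network"], "loss", {"percent": "50", "namespace": "default"})
jvm = Experiment(["jvm"], "delay", {"time": "3000"})

for exp in (cpu, net, jvm):
    print(" ".join(exp.to_argv()))
```

The point of the sketch is that one data shape covers a host, a Kubernetes pod, and a JVM method; only the target words and flags change.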

The project is split across eight separate GitHub repositories (chaosblade-exec-os, chaosblade-exec-jvm, chaosblade-operator, etc.), so a bug touching both the OS and Kubernetes layers requires PRs in multiple repos, with no monorepo tooling to coordinate them; in practice this is dependency hell. Documentation is heavily Chinese-first: the English docs are incomplete and the gitbook links are often stale. Blade AI is v0.1.0 on a feature branch, not main. The README advertises it prominently, but it is not production-ready, and the safety guarantees around automatic fault injection are thin (the safety_score module exists, but its scoring criteria aren't auditable). The C++ support via GDB is fragile in production environments where binaries are stripped or PIE-hardened, and those failure modes are undocumented.
