// the find

litmuschaos/litmus

★ 5,432 · Go · Apache-2.0 · updated Jun 2026

Litmus helps SREs and developers practice chaos engineering in a Cloud-native way. Chaos experiments are published at the ChaosHub (https://hub.litmuschaos.io). Community notes are at https://hackmd.io/a4Zu_sH4TZGeih-xCimi3Q

LitmusChaos is a CNCF-hosted chaos engineering platform for Kubernetes. It uses custom resources (ChaosExperiment, ChaosEngine, ChaosResult) to define, schedule, and observe fault injection against cluster workloads. It is aimed at SREs and developers who want to run structured chaos tests rather than ad-hoc kubectl deletions.
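As a rough illustration of what "declarative" means here, the sketch below builds a ChaosEngine manifest as a plain dict and prints it as JSON, which kubectl accepts alongside YAML. The field names follow the `litmuschaos.io/v1alpha1` ChaosEngine layout as commonly documented, but treat them as assumptions to check against the version you install. The helper `build_chaos_engine`, the `checkout-chaos` engine name, the `shop` namespace, the `app=checkout` label, and the `pod-delete-sa` service account are all hypothetical.

```python
import json


def build_chaos_engine(name, namespace, app_label,
                       experiment="pod-delete", duration_s=30):
    """Return a ChaosEngine manifest as a dict (hypothetical helper).

    Field names are assumed from the v1alpha1 ChaosEngine layout;
    verify them against the CRD shipped with your Litmus release.
    """
    return {
        "apiVersion": "litmuschaos.io/v1alpha1",
        "kind": "ChaosEngine",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "engineState": "active",
            # Which workload the fault targets.
            "appinfo": {
                "appns": namespace,
                "applabel": app_label,
                "appkind": "deployment",
            },
            # Assumed service account name; it must exist in the cluster.
            "chaosServiceAccount": "pod-delete-sa",
            "experiments": [
                {
                    "name": experiment,
                    "spec": {
                        "components": {
                            # Tunables are passed as string-valued env vars.
                            "env": [
                                {"name": "TOTAL_CHAOS_DURATION",
                                 "value": str(duration_s)},
                            ]
                        }
                    },
                }
            ],
        },
    }


manifest = build_chaos_engine("checkout-chaos", "shop", "app=checkout")
print(json.dumps(manifest, indent=2))
```

Because the result is ordinary data, it can be committed next to the rest of the infrastructure config and reviewed like any other change.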

1. The CRD-based model is the right abstraction: experiments are declarative, version-controlled, and composable into workflows, which means chaos tests live alongside your infra config rather than in someone's runbook.
2. ChaosHub (hub.litmuschaos.io) gives you a library of community-contributed experiments out of the box, so you're not writing pod-kill logic from scratch.
3. Prometheus metrics via the chaos-exporter and the probe system for steady-state validation are genuinely useful: you can assert 'p99 latency stays under 200ms during this kill' and get a pass/fail in CI.
4. Solid RBAC story: the platform separates control plane (chaos-center) from execution plane (agents), so you can give teams experiment-running permissions without cluster-admin.
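The pass/fail-in-CI point can be sketched as a small gate that reads a ChaosResult and turns its verdict into an exit code. This assumes the verdict lives at `status.experimentStatus.verdict` and that a passing run reports `Pass`; both should be confirmed against your Litmus version. The function `verdict_exit_code` is hypothetical, and the sample document is hand-written for illustration, not captured output.

```python
import json


def verdict_exit_code(chaos_result):
    """Map a ChaosResult dict to a CI exit code (hypothetical helper).

    Assumes the verdict is stored at status.experimentStatus.verdict.
    Anything other than "Pass" (including a missing verdict) fails the job.
    """
    status = chaos_result.get("status", {}).get("experimentStatus", {})
    verdict = status.get("verdict", "Awaited")
    return 0 if verdict == "Pass" else 1


# Hand-written sample shaped like a ChaosResult; not real cluster output.
sample = json.loads("""
{
  "kind": "ChaosResult",
  "metadata": {"name": "checkout-chaos-pod-delete"},
  "status": {"experimentStatus": {"verdict": "Fail"}}
}
""")

code = verdict_exit_code(sample)
print("exit code:", code)
```

In a pipeline, the input would more plausibly come from something like `kubectl get chaosresult <name> -n <namespace> -o json`, with the script exiting via `sys.exit(code)`.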

1. The stack is heavy for what you get: MongoDB, a separate GraphQL server, a separate auth service, a separate event tracker, and an Argo Workflows dependency. Getting this running locally for the first time is a multi-hour exercise before you inject a single fault.
2. The control plane and execution plane both assume Kubernetes, so you can't use this for chaos against non-k8s infrastructure (VMs, bare metal, managed services) without the 'Litmus-Chaos' experiment shim, which is sparsely documented.
3. The experiment library on ChaosHub is mostly pod/node/network faults. If you want to test something like 'corrupt this Kafka message' or 'delay this database call by 200ms', you're writing it yourself with BYOC (Bring Your Own Chaos), which drops you back to scripting.
4. MongoDB as the persistence layer is an odd choice for a CNCF project in 2024; it creates an operational dependency that many platform teams would rather not run, and there's no official path to Postgres or SQLite.
