// the find

upgundecha/howtheysre

★ 9,730 · JavaScript · CC0-1.0 · updated Nov 2025

A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE)

A link aggregator of engineering blog posts, conference talks, and incident reports organized by company, covering SRE topics like on-call, chaos engineering, postmortems, and observability. It's a reading list, not a tool: no application code, no framework, just URLs. Best suited for engineers ramping up on SRE or looking for how specific companies approach reliability problems.

Coverage is genuinely wide — 100+ companies from Airbnb to Zalando, with real incident reports and postmortems, not just feel-good blog posts. The GitHub availability reports section alone is an unusually honest look at how a major platform actually fails. Company-organized structure makes it easy to find how a specific org you're targeting does things. The CI pipeline runs a link checker, so dead links get caught.
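For a sense of what a link check like that involves, here is a minimal sketch. It is not the repo's actual test suite (which isn't shown here). The names `extract_links`, `find_dead_links`, and `http_is_alive` are hypothetical, and the sketch assumes links are written as inline markdown `[text](url)`.

```python
import http.client
import re
import urllib.request
from typing import Callable

# Inline markdown links of the form [text](http://...) or [text](https://...).
# Assumption: reference-style links and bare URLs are ignored in this sketch.
LINK_RE = re.compile(r"\[([^\]]+)\]\((https?://[^\s)]+)\)")


def extract_links(markdown: str) -> list[tuple[str, str]]:
    """Return (link text, url) pairs for every inline http(s) link."""
    return LINK_RE.findall(markdown)


def find_dead_links(markdown: str, is_alive: Callable[[str], bool]) -> list[str]:
    """Return the URLs for which the supplied is_alive check returns False."""
    return [url for _, url in extract_links(markdown) if not is_alive(url)]


def http_is_alive(url: str, timeout: float = 10.0) -> bool:
    """Hypothetical liveness check: a HEAD request that answers below 400."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "link-check-sketch"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # Covers HTTP errors, DNS failures, timeouts, and malformed URLs.
        return False
```

Passing the liveness check in as a function keeps the parsing logic testable without network access. A real run would call `find_dead_links(readme_text, http_is_alive)`. Some servers reject HEAD requests, so a production checker would likely fall back to GET.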

No search, and no tagging by topic or depth: finding 'how company X handles SLO alerting specifically' means reading through everything. Content freshness varies wildly; some sections haven't seen new links in years while others are current. The JavaScript classification is misleading, since there's no real code here, just a test suite that validates markdown links. Curation is shallow: a link to a blog post about Kubernetes at Company X gets the same weight as a detailed postmortem, with no signal about which links are actually worth your time.
