Awesome Site Reliability Engineering
A curated list of Site Reliability and Production Engineering resources.
Thank you dastergon & contributors
DevOps Enterprise Summit, London, June 25-26, 2018
🎯 A set of Site Reliability Engineering notes & challenges
[Sample chapter titled CPUs]
A curated list of Chaos Engineering resources.
Monitoring & Observability & Alerting
Tips and tricks for getting through on-call
Run Book / Operations Manual template for modern software systems
A lifecycle model for describing incident management
A collection of postmortems.
Compilation of public failure/horror stories related to Kubernetes
A collection of postmortem templates
Service Level Agreement
📙 Amazon Web Services — a practical guide
Conferences & Meetups
A curated list of Site Reliability and Production Engineering Tools
List of Continuous Integration services
A vocabulary collection for SREs