Awesome Site Reliability Engineering
A curated list of Site Reliability and Production Engineering resources.
Thank you dastergon & contributors
View topic on GitHub: dastergon/awesome-sre
Culture
DevOps Enterprise Summit, London, June 25-26, 2018
🎯 A set of Site Reliability Engineering notes & challenges
Education
Books
Hiring
Reliability
A curated list of Chaos Engineering resources.
Monitoring & Observability & Alerting
On-Call
Tips and tricks for getting through on-call
Run Book / Operations Manual template for modern software systems
A lifecycle model for describing incident management
Post-Mortem
A collection of postmortems.
Compilation of public failure/horror stories related to Kubernetes
A collection of postmortem templates
Capacity Planning
Service Level Agreement
Performance
Programming
Misc Articles
📙 Amazon Web Services — a practical guide
Real-time Messaging
Blogs
Newsletters
Conferences & Meetups
SRE Tools
A curated list of Site Reliability and Production Engineering Tools
List of Continuous Integration services
A vocabulary collection for SREs