In 2016, Google’s Site Reliability Engineering book ignited an industry discussion on what it means to run production services today—and why reliability considerations are fundamental to service design. Now, Google engineers who worked on that bestseller introduce The Site Reliability Workbook, a hands-on companion that uses concrete examples to show you how to put SRE principles and practices to work in your environment.

This new workbook not only combines practical examples from Google’s experiences, but also provides case studies from Google’s Cloud Platform customers who underwent this journey. Evernote, The Home Depot, The New York Times, and other companies outline hard-won experiences of what worked for them and what didn’t.

Dive into this workbook and learn how to flesh out your own SRE practice, no matter what size your company is.

You’ll learn:

  • How to run reliable services in environments you don’t completely control—like cloud
  • Practical applications of how to create, monitor, and run your services via Service Level Objectives
  • How to convert existing ops teams to SRE—including how to dig out of operational overload
  • Methods for starting SRE from either greenfield or brownfield


Similar Products

  • Site Reliability Engineering: How Google Runs Production Systems
  • Seeking SRE: Conversations About Running Production Systems at Scale
  • Practical Monitoring: Effective Strategies for the Real World
  • The DevOps Handbook: How to Create World-Class Agility, Reliability, and Security in Technology Organizations
  • Infrastructure as Code
  • Prometheus: Up & Running: Infrastructure and Application Performance Monitoring
  • DevOps and Site Reliability Engineering (SRE) Handbook: Non-Programmer's Guide
  • Designing Distributed Systems: Patterns and Paradigms for Scalable, Reliable Services