Flight Engineer

Sharing Incident Management Responsibilities

Alice, in the operations team, handles the fourth out-of-hours incident of the week for the new application. Alice feels the new application has more incidents than other applications.

Alice talks to Bob in the development team. Bob explains that management requests features for the new application, and the development team does not have time to fix stability bugs.

Alice measures that the new application averages five incidents per week, while other applications average two incidents per week.
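A measurement like Alice's can be sketched in a few lines of Python. This is only an illustration: the incident log, the application names (`new-app`, `billing`), and the `weekly_average` helper are all hypothetical and do not come from the course material.

```python
from collections import Counter
from datetime import date

# Hypothetical incident log covering two weeks: (application, incident date).
incidents = (
    [("new-app", date(2024, 1, d)) for d in (1, 2, 3, 4, 5, 8, 9, 10, 11, 12)]
    + [("billing", date(2024, 1, d)) for d in (2, 4, 9, 11)]
)


def weekly_average(incidents, weeks):
    """Return the mean number of incidents per week for each application."""
    counts = Counter(app for app, _ in incidents)
    return {app: count / weeks for app, count in counts.items()}


# The observation window is passed explicitly, so that weeks without any
# incidents still count toward the average.
averages = weekly_average(incidents, weeks=2)
for app, avg in sorted(averages.items()):
    print(f"{app}: {avg:.1f} incidents per week")
```

With real data, the log would come from the incident tracker instead of a hard-coded list.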

Alice proposes to her manager Carol that they start sharing incident management work with the development team. Carol discusses the idea with Dave, the development team manager. Carol negotiates a compensation plan and a rotation schedule so developers can handle incidents out of hours.

After handling some incidents, Bob identifies some issues that lead to instability. Bob negotiates with Dave and management to hold off some new features so that developers can address the instability issues.

After the fixes are implemented, out-of-hours calls decrease, and Bob rarely needs to handle calls. The load on Alice and the operations team also returns to normal.

More importantly, Bob learns about some reliability issues, and takes those issues into consideration when developing new applications. Therefore, newly developed applications are more reliable when they are released than before Bob's experience working with operations. Additionally, the organization learns that aligning the incentives for different teams is effective.

Reflect on the preceding story.

Post your opinion about what would happen if a system administrator proposed this change in an organization you know. Review other students' posts and discuss.

Mission Specialist

This is a good example. Is it necessary for SREs to have an engineering background or application development skills?

Flight Engineer

Hi gbarath!

The "Describing the SRE Journey" section in chapter 2 of TL112, "Determining Operational Readiness", says that SREs should be comfortable with automating processes *and* system administration.

That's logical: to reduce toil, SREs automate tasks related to systems, which requires writing programs that interact with those systems.

So some training in programming is helpful, or at least the willingness to learn.

The introduction to Google's SRE book details how Google hires SREs, although other companies applying SRE principles might have different requirements.

I would read "application development skills" as something more specific, such as knowing how to develop web or phone applications. I think experience in those areas can help you as an SRE, because you'll probably be more familiar with coding and adjacent skills (such as version control and testing), but it's the fundamentals of programming (variables, loops, conditionals, etc.) that will be most needed.

Hope that helps,

