CHOW #198 – How do you set up your Cloud Reliability Team?

Santa is the scrum master for the DevOps team. She recently took over the role and found the team to be highly functioning. They are working on many initiatives while keeping up with the plethora of service requests coming their way.

Santhosh is the Product lead for DevOps. He has found that many of the incoming requests relate to cloud setup and cloud automation, and that the skills and timing these requests require are quite different from what the DevOps team offers. He has also found that his peers are forking off a separate team for cloud reliability. This is particularly critical when the organization is trying to scale its cloud efforts.

If you were in Santa’s role, how would you go about setting up the Cloud Reliability Team?

  1. The types of skills you would look for
  2. The initial set of objectives the team could undertake
  3. How you would organize their work intake
  4. Any thoughts on the initial backlog


Santa partnered with the Product Owner, Santhosh, and designed a set of Sprint 0 activities, run for 3 hours a day over 4 days, to kick-start the Cloud Reliability Team.

Some of the key things she did were:

A. Building a shared understanding of estimation

B. A compelling product visioning exercise

C. Personas targeted by the Cloud Reliability Team

D. A user journey map of how services will be provided

They decided to accept requests in two main ways:

A. Through service requests for creating brand-new environments for products

B. Through automation services that let users serve themselves

What Next?

Once they had identified their core competencies and focus areas, the initial backlog became much simpler to define:

  • Creating authorisation and authentication policies so that everyone on the team who services cloud requests follows standard, secure ways of setting things up
  • Automation that enables self-service, so that product teams can easily set up their own cloud services and components. This was based on experiments the team had conducted in the sandbox
  • Automation to set up environments, so that service requests are easier to fulfil and error-free
  • Access to sandbox environments for developers and testers to test their skills


With this initial backlog in hand, they found that the CRE team was missing a few skills. They realised that the entire team consisted of cloud developers and architects, some of whom had experience in setting up scalable cloud environments. What was missing within the team was Python scripting and test automation skills.

Like any newly formed team, the Cloud Reliability Team also needs to start with its purpose: why do we exist, and whom are we serving? Answering those questions gets everyone energised towards the same goal.

What do you think?
