DevOps Primer

What is DevOps?

Successful businesses today focus on delivering customer value rapidly while differentiating themselves from their competitors. Agile methods are now widely used across the industry to this end. Developing and delivering a solution to the customer involves two distinct phases: Development, and Deploy & Release. DevOps is the practice of bridging the gap between these two phases so that the distinction fades away, which means breaking down the silos between the teams responsible for them.


Has it really helped?

Up to the 1980s, new features took from one to five years to develop and deploy using waterfall methods. With the introduction of DevOps, companies are able to introduce multiple changes in a single day. The DevOps approach has not only accelerated deployment and release but also enabled faster development with quality.

A study by Puppet found that DevOps teams release software 200 times more frequently than traditional ones. A study by Gartner found that such teams have 25% lower failure rates, indicating better quality, while a Forrester study showed that IT costs can be 20% lower. Studies by IDC have shown 10% higher customer satisfaction.

How do I go about it?

First, we need to understand the Values and Principles of DevOps to develop the mindset, and then move on to the Practices and Tools that help us implement it. Lastly, we look at Platform Engineering, a fast-emerging discipline aligned with DevOps that provides the underlying infrastructure to enable many of the DevOps practices and to scale DevOps across large enterprises.

1.0 What are DevOps Values?

While there is no universally defined set of Values as there is for Agile, XP, etc., the following Values are widely accepted:

  • Culture of Shared responsibility across the organization
  • Collaboration between teams especially between Development and Operations
  • Flow of work constantly across the value stream from Development to Operations and then to the customer
  • Feedback for continual improvement

2.0 What are the DevOps Principles?

Principles show us how to derive the benefits of DevOps thinking. DevOps Principles are primarily grouped into three areas: Flow, Feedback, and Continual Learning and Experimentation.

2.1 Principles of Flow

The Flow principles are essentially derived from Lean and emphasize

  1. Make work visible to identify and eliminate bottlenecks and to support better prioritization
  2. Manage WIP (work in progress) to reduce Lead Time, which in turn improves Flow
  3. Reduce Batch Sizes, which is closely related to managing WIP, to improve quality outcomes and to make errors easier to diagnose and remediate
  4. Reduce hand-offs to cut delays and prevent knowledge dilution and loss
  5. Eliminate bottlenecks in the system
  6. Eliminate or reduce waste in the system
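The link between WIP and Lead Time in item 2 can be illustrated with Little's Law from queueing theory: in a stable system, average lead time equals average WIP divided by average throughput. The following is a minimal sketch only; the function name and the team figures are illustrative assumptions, not measurements.

```python
def average_lead_time(wip_items: float, throughput_per_day: float) -> float:
    """Little's Law: average lead time (days) = average WIP / average throughput."""
    if throughput_per_day <= 0:
        raise ValueError("throughput must be positive")
    return wip_items / throughput_per_day


# Hypothetical team that finishes 2 work items per day on average.
before = average_lead_time(wip_items=20, throughput_per_day=2)
after = average_lead_time(wip_items=8, throughput_per_day=2)

print(f"Lead time with 20 items in progress: {before} days")  # 10.0 days
print(f"Lead time with 8 items in progress: {after} days")    # 4.0 days
```

With throughput unchanged, cutting the work in progress from 20 items to 8 shortens the average lead time from 10 days to 4.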

2.2 Principles of Feedback

The Feedback principles provide guidance on enabling fast and continuous feedback at each and every stage of the value stream to enable detection and correction of errors.

  1. Increase agility, resilience and the ability to learn and innovate by speeding up the flow of information between cause and effect; put simply, see problems in the system as they occur
  2. Contain problems before they have a chance to spread
  3. Push quality and safety responsibilities closer to the source, where the work is performed
  4. Optimize for downstream work centers (e.g., Operations) to better identify design issues

2.3 Principles of Continual Learning and Experimentation

These principles aid in creating a Learning organization that constantly creates individual knowledge, turning it further into team and organizational knowledge.

  1. Support a Generative Culture which provides a safe system of work – the ability to experiment, make mistakes and learn
  2. Improve the daily work by improving processes as well as the products
  3. Convert local learnings (individual & team) into organization-wide knowledge
  4. Find ways to increase productivity by injecting resilience into daily work
  5. Reinforce a culture of relentless improvement

3.0 What are some key DevOps Practices?

The DevOps practices mentioned below are by no means exhaustive but cover some of the important ones.

3.1 DevOps Practices for improving Flow

  1. On-Demand Environments – To ensure a smooth workflow and avoid disruptions, the development team should work in environments similar to production. To achieve this, DevOps enables on-demand creation of any environment, be it Development, QA or Production. This helps teams quickly reproduce, analyse and fix defects in their own environments.
  2. Consistency of environments – a “single version of truth” is achieved by storing the definitions of the Development, QA and Production environments in version control. Further, any configuration changes should be made through automated tools or, where possible, the need for changes should be eliminated by rebuilding the environments.
  3. Definition of Done (DoD) – Agile teams should expand their Definition of Done to include working software in production-like environments.
  4. Test Automation – Automated test suites need to run against the integrated code and environments every time a new change is put into version control. Automate as many Unit, Acceptance and Integration tests as possible so that problems are detected and fixed immediately. Where possible, Performance and Security testing and other non-functional requirements (NFRs) should form part of the automated test suite. Builds should be halted whenever there is an issue.
  5. Continuous Integration – everyone’s daily work is merged into trunk. Team productivity is thus given a higher priority than individual productivity; there are no branches, and developers check in their code to trunk daily. The more frequently this is done, the smaller the batch size and the faster errors can be corrected.
  6. Automated Deployments – an automated mechanism that deploys code into production allows deployments to be performed easily by either Operations or Development.
  7. Decouple Deployments from Releases – this empowers Development and Operations to own successful deployments, while Product Owners remain responsible for successful business outcomes.

The release pattern can be Environment-based or Application-based. In Environment-based patterns, two environments are maintained: a Live one, which alone receives customer traffic, and a non-live one where new code is deployed. In the Blue-Green pattern, the Live and non-live environments alternate. In Canary releases, the release is progressively moved to the next environment until it goes live for all customers. An extension of this is the Cluster Immune System, which rolls back the release automatically if performance falls below an expected threshold.
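The canary idea with automatic rollback can be sketched as a simple loop. This is an illustration under stated assumptions, not a real deployment tool: `error_rate_at` is a hypothetical probe standing in for production telemetry, and the traffic steps and the 2% error threshold are made-up values.

```python
from typing import Callable, Sequence


def canary_rollout(
    error_rate_at: Callable[[int], float],
    steps: Sequence[int] = (1, 10, 50, 100),
    max_error_rate: float = 0.02,
) -> int:
    """Widen the release step by step and return the final traffic percentage.

    error_rate_at(percent) is a hypothetical probe reporting the error rate
    observed while `percent` of traffic is served by the new version.
    Returns the last step on a full rollout, or 0 if the release was rolled back.
    """
    for percent in steps:
        observed = error_rate_at(percent)
        if observed > max_error_rate:
            # Health crossed the threshold: roll back automatically.
            return 0
    return steps[-1]


# A healthy release reaches all customers.
print(canary_rollout(lambda percent: 0.001))  # 100

# A release that degrades at 50% of traffic is rolled back.
print(canary_rollout(lambda percent: 0.05 if percent >= 50 else 0.001))  # 0
```

The key design choice is that the rollback decision is made by the loop from measured health, not by a person watching a dashboard.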

Application-based patterns use feature toggles, which make it possible to selectively enable and disable features in production. Features can be deployed into production without being made accessible to users; this is termed “dark launching”.
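A feature toggle can be as small as a named flag checked at the point where behaviour differs. The sketch below is a minimal in-memory illustration; the class, the function and the `loyalty_discount` feature name are all hypothetical, and a real system would typically load toggle state from configuration or a toggle service.

```python
class FeatureToggles:
    """In-memory toggle store (illustrative only)."""

    def __init__(self, enabled=()):
        self._enabled = set(enabled)

    def enable(self, name):
        self._enabled.add(name)

    def disable(self, name):
        self._enabled.discard(name)

    def is_enabled(self, name):
        return name in self._enabled


def checkout_total_cents(prices_cents, toggles):
    """Sum the basket; apply a 10% discount only if the toggle is on."""
    total = sum(prices_cents)
    if toggles.is_enabled("loyalty_discount"):  # hypothetical feature name
        total = total * 90 // 100
    return total


toggles = FeatureToggles()
basket = [10000, 5000]

# Dark launch: the discount code is deployed but switched off for users.
print(checkout_total_cents(basket, toggles))  # 15000

# Release: flip the toggle, with no new deployment needed.
toggles.enable("loyalty_discount")
print(checkout_total_cents(basket, toggles))  # 13500
```

Because the new code path ships disabled, deployment and release become two separate decisions.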

  8. Architect for Low-Risk Releases – tightly coupled architectures make deployments prone to global failures. In addition, as the code base grows over time, deploying even small changes may require co-ordinating across many teams. A loosely coupled architecture with well-defined interfaces that govern how components connect with each other makes deployments easier.

3.2 DevOps Practices for improving Feedback

  1. Create Telemetry to foresee problems before customers see them – Events, Logs and Metrics are collected at the OS, Application and Business Logic layers and then stored and analysed. This makes it possible to proactively generate alerts when suspect thresholds are crossed; logs can also be transformed into metrics. Information about the production system thus becomes accessible to anyone with the right privileges, without logging into production servers. Information radiators can be created for information that everyone in the value stream needs to see. Statistical analysis of the data further helps to anticipate problems and to alert only on real ones, avoiding unwanted alerts.
  2. Use Telemetry to get quick feedback on deployments – information generated immediately after a feature is deployed, or through active monitoring, shows whether the feature is working as desired in production and whether any other functionality or service has been broken by the deployment. Action can then be taken to either roll back the change or fix forward.
  3. Shared responsibility – during the active monitoring period after a deployment, development team members share the on-call roster with Operations, so everyone is involved in fixing any issues that arise. Developers also self-manage their service before handing it over to a centralised Ops group.
  4. Contextual inquiry – Developers need to follow their work downstream and understand the next steps clearly. Beyond this, they need to understand how their application is used by the customer. This way, developers can find ways to improve flow and also realise the appropriate need for.
  5. A/B Testing – A/B testing is used to try out new functionality with two different groups of users. Operations supports it through the ability to deploy quickly and easily on demand, potentially delivering multiple versions of the code to customers simultaneously.
  6. Peer review continually – relying more on continual peer review at every stage of the development and deployment cycle, and less on external approvals such as a change approval process, improves outcomes.
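Item 1 mentions using statistical analysis so that alerts fire only on real problems. One simple way to sketch this is to flag a metric sample only when it lies several standard deviations away from its recent history. The function name, the three-deviation threshold and the latency figures below are illustrative assumptions; the approach also assumes the metric is roughly normally distributed, which real telemetry often is not.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, num_deviations=3.0):
    """Return True if `latest` lies more than `num_deviations` sample
    standard deviations away from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to estimate the spread
    centre = mean(history)
    spread = stdev(history)
    if spread == 0:
        return latest != centre
    return abs(latest - centre) > num_deviations * spread


# Hypothetical response times in milliseconds from recent requests.
recent_latency_ms = [120, 118, 125, 122, 119, 121, 124, 120, 123, 118]

print(is_anomalous(recent_latency_ms, 126))  # False: within normal variation
print(is_anomalous(recent_latency_ms, 400))  # True: worth an alert
```

Compared with a fixed threshold, this kind of check adapts to each metric's normal range, which is one way to cut down on unwanted alerts.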

3.3 DevOps practices for Continual learning and experimentation culture

1. Create a just culture for learning – avoid a culture of blame; instead, focus on the root causes of issues and learn from them. Post-mortems should focus on what can be learned, and the learnings should be shared across the organization.

2. Inject production failures for resilience – rehearse large-scale failures by injecting faults into production environments, ensuring that failures happen in specific and controlled ways.

3. Reinforce transparency and collaboration – integrate automated deployment tools with persistent chat rooms. This way, knowledge is captured as events unfold and is shared through the chat rooms where Ops engineers are working.

4. Express policies as code and centralize them – create a central repository where all documented processes and standards are stored for the entire organisation to use. All NFRs can also be codified.

5. Provide time for innovation and learning – to promote lifelong learning, time needs to be planned for refactoring and reducing technical debt, and for forums for sharing knowledge, alongside dedicated time for innovation such as coding dojos.
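To make item 4 concrete, a policy expressed as code is simply a check that can run automatically against a description of a service. The sketch below is a hypothetical illustration: the field names (`owner`, `tls_enabled`, `max_response_ms`) and the rules themselves are assumptions chosen for the example, not an established standard.

```python
def check_policies(service):
    """Return a list of policy violations for a service description (a dict)."""
    violations = []
    if not service.get("owner"):
        violations.append("service must declare an owning team")
    if not service.get("tls_enabled", False):
        violations.append("TLS must be enabled")
    # An NFR codified as a rule: a response-time budget of at most 500 ms.
    if service.get("max_response_ms", float("inf")) > 500:
        violations.append("response-time budget must be 500 ms or lower")
    return violations


compliant = {"owner": "payments-team", "tls_enabled": True, "max_response_ms": 300}
non_compliant = {"tls_enabled": False}

print(check_policies(compliant))      # []
print(check_policies(non_compliant))  # three violations
```

Kept in a central repository and run as part of the build, checks like these turn documented standards into something every team can verify in the same way.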


3.4 What Tools can be used for DevOps?

A vast number of tools are available to enable quicker and more efficient implementation of the DevOps Practices. The following are just a sample.

Environments: Docker, Kubernetes, Vagrant, Terraform, Ansible
Continuous Integration: Jenkins, GitLab, Travis CI, Bitbucket, TeamCity, Azure DevOps
Test Automation: Selenium, Jenkins, Cucumber, Appium, SoapUI, JUnit
Infrastructure: Puppet, Chef, Pulumi
Telemetry: Grafana, Prometheus, InfluxDB, Elasticsearch, Dynatrace
Repository: Git, Subversion, Perforce, Confluence, Wiki, MS SharePoint

4.0 What then is Platform Engineering?

According to the State of DevOps Report 2023, “Platform engineering is the discipline of designing and building self-service capabilities to minimize cognitive load for developers and to enable fast flow software delivery. Platform teams deliver shared infrastructure platforms to internal users responsible for delivering a value stream”. Platform engineering has taken the already existing platform approach and added an explicit focus on treating the platform as a product rather than as a project.

For Dev teams, it promises to let them work at their own cadence and focus on the application rather than on the underlying infrastructure. For Ops, delivering via self-service promises to help them deal with the ever-increasing scale of demands. For Security teams, a consistent platform promises a smaller surface area to secure, and for business leaders, it promises easier IT governance.

