How did the world’s most successful IT organizations get to where they are today? In this second blog post in our DevOps series, we explore how disrupters like Amazon and Netflix embraced DevOps principles early on – and with great results.
In my previous post “DevOps: Where Are We and How Did We Get Here?” I attempted to define DevOps by examining its roots and mapping its exponential growth to digital transformation initiatives. In this post, we will look at some of the early adopters of DevOps practices and the successful outcomes achieved. From my perspective, DevOps adoption took place over three waves:
- Wave One (The Disrupters) – the technology companies where ideas and concepts took shape and evolved into implementations that came to define DevOps (reviewed in this post).
- Wave Two (The Disruptees) – the organizations that realized early that they needed to rapidly adapt and transform to survive disruption by new entries to their markets (reviewed in an upcoming post).
- Wave Three (Mainstream) – where we are today, with most organizations somewhere on the path to DevOps enlightenment (this blog series was created for operational leaders and practitioners in this group of organizations).
DevOps Innovators – The Disrupters
Even before the term “DevOps” entered the lexicon in 2009, practices and roles were evolving that would come to be considered central to DevOps transformations. These initiatives were taking shape in the web-scale (large cloud service) companies of the early 2000s. This makes sense: for these companies, technology did not support their core business – technology was their core business.
These organizations needed to re-think architectures, roles, and processes to support technology at unprecedented scale and speed. Therefore, if you look at the early champions of the movement you find companies like Google, Amazon, Netflix, and Etsy. These were followed quickly by software companies like Puppet, Chef, LinkedIn, Red Hat, and VMware. Additionally, startups creating or benefiting from new tools and processes evolved rapidly out of the DevOps space.
In this post I review a few of these “disrupters”, partly to provide context and background for terms now considered central to DevOps philosophies – terms like “SRE” (see Google below) and “Two Pizza Team” (see Amazon below). The transformations and outcomes for these organizations are well documented, so links to additional information are included in each section.
Amazon
“If you can’t feed a team with two large pizzas, it’s too large” – Jeff Bezos, Amazon CEO, 2004
“You build it, you run it” – Werner Vogels, Amazon CTO, 2006
Between 2001 and 2009, Amazon transitioned from large development teams building monolithic applications to their famous “two-pizza teams” building microservice-based applications. The philosophy is that if two pizzas can’t feed the whole team, the team is too large to collaborate efficiently and too complex to be effective.
Until the launch of AWS in 2006, Amazon was still thought of as a disrupter of brick-and-mortar bookstores. That year, Werner Vogels publicly defined Amazon as a technology company and coined what would become a DevOps mantra, “You build it, you run it” – effectively consigning developers to carrying pagers and causing operations teams everywhere to cheer loudly. Oh, and software bugs decreased, and application stability increased dramatically.
Google
“Hope is not a strategy” – Origin unknown, adopted by Google SREs
A new role began to take shape at Google around 2003, which they named the “Site Reliability Engineer”. Google SRE teams are made up of engineers with both software engineering and sysadmin skills. Each SRE team is responsible for all aspects of the day-to-day operations of their service. Operational work (which they call “toil”) is capped at 50% of the team’s time, with the remaining time spent applying coding skills to project work. SRE is considered a role within a DevOps implementation.
Additionally, Google introduced the concept of the “error budget”, which is based on the observation that 100% uptime is not the appropriate reliability target for all services. The business owner of the service establishes an availability target (based on user needs and expectations). If the availability target is 99.99%, the remaining 0.01% is the error budget: downtime that the team can spend on launching new features at maximum velocity without fear of the resulting outages.
This simple concept resulted in a surprisingly effective balance for service management:
- Uptime exceeding the availability target is regarded as an indication that the team is not using the downtime available appropriately to launch new features or make service enhancements.
- Downtime exceeding the error budget is regarded as an indication that the team is pushing too many new features and needs to take a step back and focus on service reliability.
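The error budget arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not Google's tooling: the function names, the 30-day measurement window, and the guidance strings are all assumptions made for the example. With a 99.99% target over 30 days, the budget works out to about 4.32 minutes of downtime.

```python
def error_budget_minutes(availability_target: float, window_days: int = 30) -> float:
    """Return the allowed downtime, in minutes, for a target over a window.

    availability_target is a fraction (0.9999 means 99.99%).
    The 30-day default window is an assumption for this sketch.
    """
    if not 0.0 < availability_target <= 1.0:
        raise ValueError("availability_target must be in (0, 1]")
    total_minutes = window_days * 24 * 60
    return (1.0 - availability_target) * total_minutes


def release_guidance(availability_target: float,
                     downtime_minutes: float,
                     window_days: int = 30) -> str:
    """Compare observed downtime with the budget and suggest a focus."""
    budget = error_budget_minutes(availability_target, window_days)
    if downtime_minutes > budget:
        # Budget overspent: step back and work on reliability.
        return "focus on reliability"
    # Budget still available: it can be spent on launches.
    return "budget available: ship features"


if __name__ == "__main__":
    print(round(error_budget_minutes(0.9999), 2))  # minutes per 30 days
    print(release_guidance(0.9999, downtime_minutes=10.0))
```

In practice the window length and how downtime is measured would be agreed with the business owner of the service; the sketch only shows how the two bullet points above fall out of a single comparison.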
Netflix
“Hire smart people and get out of their way” – Dave Hahn, Netflix Senior SRE, 2019
In 2008 (when Netflix was still shipping DVDs to customers) a database corruption issue had a major effect on company operations and accelerated the company’s transition to scalable, resilient, distributed systems hosted in the cloud. Building the rapidly growing streaming service in the cloud allowed Netflix to take advantage of built-in scalability, freeing its engineers to focus on features for its core business.
And yet, in 2012 Netflix was still struggling with laborious deployments and team silos. Inspired by the principles of the DevOps movement, service teams at Netflix embraced “Operate what you build”, with Full Cycle Development teams responsible for the full software life cycle: design, development, test, deploy, operate, and support.
Today, Netflix is confident enough in its world-class engineering practice to run the ultimate test of service availability – its “Chaos Monkey” software randomly terminates instances in production to ensure that services are resilient.
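The idea of random instance termination can be illustrated with a toy simulation. This is not Netflix's Chaos Monkey and touches no real infrastructure: the fleet is a plain Python set, and the instance names, function names, and `min_healthy` threshold are hypothetical.

```python
import random
from typing import Set


def terminate_random_instance(instances: Set[str], rng: random.Random) -> str:
    """Remove one randomly chosen instance from the fleet and return its name."""
    if not instances:
        raise ValueError("no instances left to terminate")
    # Sort first so the choice is reproducible for a given seed.
    victim = rng.choice(sorted(instances))
    instances.remove(victim)
    return victim


def service_is_available(instances: Set[str], min_healthy: int = 1) -> bool:
    """A service counts as available while enough instances remain."""
    return len(instances) >= min_healthy


if __name__ == "__main__":
    fleet = {"api-1", "api-2", "api-3"}  # hypothetical instance names
    rng = random.Random(42)              # seeded for a repeatable run
    victim = terminate_random_instance(fleet, rng)
    print("terminated:", victim)
    print("still available:", service_is_available(fleet, min_healthy=2))
```

The point of the exercise is the final check: a service designed with redundancy should stay available after any single instance disappears.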
Results and Outcomes
“DevOps solves the most important business problem of our generation, [which is] how organizations make the transition from good to great.” – Gene Kim, DevOps Advocate, 2015
As early adopters expanded DevOps practices across their organizations, they achieved improved software delivery outcomes as expected. Many, however, also experienced improved organizational performance.
Can a culture of high-performing IT teams extend benefits to other parts of the business?
This outcome was identified in the 2014 State of DevOps Report, which found the use of DevOps practices to be a strong indicator of IT performance, and IT performance to be “predictive of organizational performance. As IT performance increases, profitability, market share, and productivity also increase.”
This claim attracted the interest of non-IT business leaders seeking additional information and non-anecdotal data.
The authors of the “State of DevOps Report” (2014 to 2019) published their research methodology, plus additional research and context, in the 2018 book “Accelerate: The Science of Lean Software and DevOps”. This book shatters the myth that moving faster means trading off against other business goals, such as stability, and recommends key objective metrics for measuring performance improvements in a statistically meaningful way.
The 2019 State of DevOps Report doubled down on the use of DevOps practices as an indicator of high-performing IT organizations and added a further indicator – the implementation of cloud computing characteristics as defined by NIST.
“Elite performers were 24 times more likely to have met all essential cloud characteristics than low performers”, 2019 State of DevOps Report.
This supports my position that DevOps and Cloud are intrinsically intertwined, which I explore in detail in the upcoming blog post “DevOps – What About Cloud?”
More than a decade after technology companies started to use the principles, tools, and processes that came to be known as DevOps, the outcomes are clear. Successful DevOps and Cloud Computing implementations are indicators of high-performing IT organizations, which themselves are indicators of high organizational performance (defined by profitability, market share, and productivity).
There is no time like the present to accelerate your DevOps journey, and VMware can help. From discovery workshops to help set objectives, to technology implementations for DevOps toolchains and cloud capabilities, to advisory services for organizing for success, VMware is ready to support your journey!
DevOps at VMware
VMware lives DevOps in many ways: as a practitioner of the principles for software development, as a provider of tools and solutions that support DevOps practices, and as an advisor and implementer for DevOps initiatives across many of our customer organizations:
- VMware transformed to an agile foundation over a 3-year period, embracing a DevOps culture across our engineering teams and completing our DevOps transition in 2017.
- VMware solutions, such as vRealize and Tanzu, are an important part of the DevOps Toolchain ecosystem.
- VMware provides consulting and Professional Services to customers looking for assistance at any stage of their transformation journey.
Accelerate: The Science of Lean Software and DevOps (2018) Nicole Forsgren PhD, Jez Humble, Gene Kim
DevOps Blog Series:
| Post | Published |
|---|---|
| DevOps #1: Where Are We and How Did We Get Here? | May 2020 |
| DevOps #2: Innovators and Outcomes – The Disrupters | May 2020 |
| DevOps #3: Early Adopters and Outcomes – The Disruptees | June 2020 |
| DevOps #4: Culture – Collaboration, Empowerment, Autonomy | June 2020 |
| DevOps #5: Devopsdays – DevOps Culture Embodied | July 2020 |
| DevOps #6: Principles and Outcomes | August 2020 |
| DevOps #7: Technology – The DevOps Toolchain | August 2020 |
| DevOps #8: Technology – Continuous Everything | September 2020 |
| DevOps #9: Technology – DevOps @VMworld | September 2020 |