SRE Transformation: A Strategic Perspective on Building Reliability
At Five9nes, we’ve built our reputation on solving urgent, high-stakes problems where the margin for error is razor-thin. Our origins lie in incident response, and our mission has always been to address the kinds of crises that demand immediate, results-oriented action. While the consulting landscape is crowded with firms offering polished presentations and surface-level solutions, our approach has always been hands-on, practical, and deeply technical. We don’t compete in PowerPoint decks—we specialize in navigating complex, entrenched issues with outcomes that are make-or-break for the organizations we serve.
Last year, we were brought into a project that took us slightly outside of our usual domain. A multinational financial services company had embarked on an ambitious multi-year plan to embed Site Reliability Engineering (SRE) principles across its global technology operations. Despite substantial investment and effort, progress had stalled, and we were asked to figure out why. With tens of thousands of employees attempting to "do SRE," it became clear to us that the fundamental question—what is SRE?—had not been fully answered in this context.
A Challenging Landscape: Defining SRE for Large Enterprises
In many tech companies, particularly those born digital, SRE is well understood and integrated into the culture. It operates as a guild, much like Unix did in its early days—a natural extension of a deeply technical environment. There’s no existential debate about what SRE is, just as nobody questions the role of Presbyterianism in Wisconsin. It’s simply part of the landscape.
However, in large, heritage organizations, especially those straddling the line between traditional industries and modern tech, the implementation of SRE is far from straightforward. In the case of this financial services company, which relied heavily on complex, legacy infrastructure and had experienced decades of growth through mergers and acquisitions, the meaning of "SRE" had become nebulous. Some teams saw it as a job title, others as a toolset, and some even associated it with a significant AWS bill. There was no unifying definition, and this lack of clarity was stalling progress.
At its core, the value of SRE lies in operational efficiency: absorbing exponentially growing business demand with only a logarithmic increase in cost. That is its real appeal for non-technical business leaders. It’s not about adopting the latest buzzwords or technologies, but about achieving scalable growth while containing operational overhead, a value proposition that resonates with any business focused on profitability and resilience.
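To make the "exponential demand, logarithmic cost" claim concrete, here is a minimal sketch of what such a curve could look like. Every number in it is a hypothetical assumption chosen for illustration (a demand that doubles each period, a fixed cost of 100, and 10 extra cost units per doubling); none of it comes from the client engagement.

```python
import math


def demand(period, base=1_000.0, growth=2.0):
    """Hypothetical demand that multiplies by `growth` every period."""
    return base * growth ** period


def cost(d, base=1_000.0, fixed=100.0, per_doubling=10.0):
    """Hypothetical cost that rises by `per_doubling` each time demand doubles."""
    return fixed + per_doubling * math.log2(d / base)


# Print every second period to keep the table short.
for period in range(0, 11, 2):
    d = demand(period)
    c = cost(d)
    print(f"period {period:2d}: demand {d:>12,.0f}  cost {c:6.1f}  cost per unit {c / d:.6f}")
```

Under these assumptions, demand grows by a factor of 1,024 over ten periods while cost only doubles, so the cost per unit of demand keeps falling. Put differently, a cost that is logarithmic in demand grows linearly over time while demand itself grows exponentially.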
Pitfalls in Implementation: Standardization and Gatekeeping
As we immersed ourselves in the organization's efforts, a few recurring themes emerged that were hindering their SRE transformation. Chief among them was an over-reliance on standardization as a lever for change. While it’s tempting to streamline technology stacks across a large, diverse enterprise, the reality is that such initiatives often fail in environments as complex as this one. In a decentralized organization where each business unit is mature and focused on self-sustenance, attempts to impose standardization can easily devolve into futile churn.
Even more detrimental is the gatekeeping that often follows when standardization efforts lose momentum. When top-down directives start restricting the ability to purchase SaaS tools or provision on-premises capacity, the effect is usually to alienate the very teams that are critical to driving operational improvements. We observed this pattern repeatedly—well-intentioned attempts to impose control ended up stifling innovation and increasing internal friction.
Similarly, focusing on SLOs (Service Level Objectives) and incident response practices too early in the transformation journey proved ineffective. While these are critical components of a mature SRE practice, they should not be the starting point. Without first establishing the right cultural foundations and alignment on the true purpose of SRE, these efforts were more likely to result in frustration and resistance than meaningful change.
A Path Forward: Evolving a Culture of Reliability
So, how do you establish SRE in a large, complex organization where momentum has stalled? The answer lies not in rigid methodologies or standardizing tools, but in cultivating a culture that values reliability and efficiency as core principles. Based on our experience, SRE Done Right can be distilled into two fundamental beliefs: highly reliable systems are inherently good, and rote human tasks are inherently bad. These core values must be nurtured within a subset of the organization to create a micro-culture of excellence, which can then grow organically over time.
We advised the client to start by identifying those individuals who were already passionate about improving system stability and reducing manual toil. These "true believers" were then given end-to-end ownership of a critical business process, clear accountability metrics (e.g., 90% uptime on a key batch process), and the autonomy to make changes across the technology stack. Critically, their budget and resource constraints were made explicit from the outset, so that all decisions could be made within the right economic context.
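An accountability metric like the uptime figure above is easier to reason about once it is translated into an allowed amount of downtime. The sketch below shows that arithmetic; the function name `downtime_budget` and the 30-day window are our own illustrative assumptions, not something the client used.

```python
from datetime import timedelta


def downtime_budget(uptime_target, window):
    """Return the downtime allowed within `window` for a target such as 0.90."""
    if not 0.0 <= uptime_target <= 1.0:
        raise ValueError("uptime_target must be between 0 and 1")
    # The budget is simply the fraction of the window that may be unavailable.
    return window * (1.0 - uptime_target)


# Compare a few hypothetical targets over an assumed 30-day window.
for target in (0.90, 0.99, 0.999):
    print(f"{target:.1%} uptime allows {downtime_budget(target, timedelta(days=30))} of downtime")
```

A 90% target over 30 days leaves about three days of allowable downtime, which gives the team a concrete number to weigh against its budget and resource constraints.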
To foster this emerging culture, we recommended reinforcing their identity within the organization. This could be as simple as giving them a unique job title, a separate office space, or even a symbolic gesture like a team logo or sticker. The point was to create a sense of belonging and autonomy, without getting bogged down in bureaucratic gatekeeping.
Once this initial culture took root, the results started to speak for themselves. Success stories began to emerge, and other teams naturally gravitated toward the SRE principles. As we had anticipated, the second wave of adoption came not from mandate but from internal demand, as employees saw the tangible benefits of the new approach.
Conclusion: Achieving Exponential Growth with Logarithmic Costs
At the heart of SRE is a promise that resonates with every business: the ability to grow operations exponentially while controlling costs. For large, complex organizations, the key to realizing this promise lies in fostering a culture that values reliability, efficiency, and the elimination of manual toil. It’s not about adopting the right tools or following a prescribed methodology. Rather, it’s about creating the conditions for evolutionary change, driven by those who are already passionate about improving the systems they manage.
By focusing on empowering a small, motivated group within the organization and giving them the authority, accountability, and resources to make meaningful changes, you can plant the seeds of a successful SRE transformation. From there, the culture will spread organically, unlocking the potential for scalable, cost-efficient growth across the entire enterprise.