Crisis Engineering: A Structured Approach to Navigating Emergencies
At Five9nes, we specialize in handling urgent, high-impact situations where clarity, decisiveness, and speed are essential. Our crisis engineering methodology is designed to cut through complexity and chaos, enabling organizations to make quick, informed decisions during times of crisis. While traditional methods rely on theory and preparation, crisis engineering is about action and adaptation—it's a form of operational sensemaking, akin to the OODA loop (Observe, Orient, Decide, Act), where continuous feedback and iteration drive success. Here’s how we structure our approach:
Communicate. Make a Stakeholder Matrix.
The first step in managing a crisis is to gather an accurate, real-time understanding of how things work—in practice, not just in theory or on paper. In any large, complex system, there are undocumented shortcuts, unofficial processes, and deviations from the prescribed workflows that often matter the most in a high-pressure situation.
To create this understanding, we talk to as many people as possible across the organization: engineers, operators, administrators, and decision-makers. The goal is to map out the critical flow of work, identifying not just what is supposed to happen, but what actually happens at every step. At this stage, consensus and coverage are more valuable than perfect accuracy: we want to capture the shape of the problem rather than obsess over details, which will reveal themselves in time. Whenever possible, we gather real data from live systems, because experience has shown us that numbers in static reports are often unreliable during a crisis.
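One way to picture such a map is as a list of workflow steps, each recording what the documentation says and what people report in practice. The sketch below is only an illustration under assumed names (`WorkflowStep`, `confirmed_deviations`, and the sample data are all hypothetical); it flags steps where practice differs from paper and more than one person has confirmed the difference, which reflects the emphasis on consensus over perfect accuracy.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowStep:
    name: str
    documented: str  # what the runbook or policy says happens
    observed: str    # what people on the ground report actually happens
    confirmed_by: set = field(default_factory=set)  # who reported the observed behavior

    def deviates(self) -> bool:
        """True when practice differs from paper (ignoring case and outer whitespace)."""
        return self.documented.strip().lower() != self.observed.strip().lower()


def confirmed_deviations(steps, min_sources=2):
    """Steps where practice differs from paper and at least min_sources people agree."""
    return [s for s in steps if s.deviates() and len(s.confirmed_by) >= min_sources]


# Hypothetical example data, not taken from any real engagement.
steps = [
    WorkflowStep("intake", "ticket filed in tracker", "ticket filed in tracker", {"operator"}),
    WorkflowStep("approval", "manager signs off", "on-call engineer approves in chat",
                 {"engineer", "operator"}),
    WorkflowStep("deploy", "scheduled release window", "manual push when ready", {"engineer"}),
]
flagged = [s.name for s in confirmed_deviations(steps)]
```

Lowering `min_sources` widens coverage at the cost of including single-source reports, a trade-off the team can tune as interviews accumulate.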
Gather Key Stakeholders in One Room: Share a Consensus Reality
Once the operational map is in place, we gather the key experts and decision-makers. These are the people who understand the system at both the technical and administrative levels and have the authority to make changes. In a crisis, it is critical to have a small group of people who can make decisions on the spot without having to escalate every issue up the chain of command.
The goal is to build a shared understanding—or "consensus reality"—of the situation. This group needs to be looking at the same data, the same map, and the same metrics, and they must have the ability to act immediately on the insights gained. Ideally, this happens in person, but videoconferencing or chat channels can also be effective. What’s important is that this team can make decisions quickly and decisively.
The Operational Timeline
In crisis situations, time is your most valuable—and limited—resource. It’s critical that everyone on the crisis engineering team is aligned on two key factors: the deadline for mitigation or recovery and what success looks like. Often, the timeline is driven by external factors, such as regulatory requirements, contractual obligations, or customer expectations, but it must be communicated clearly and repeatedly across the team.
As the situation evolves, deadlines and definitions of success may shift. It’s vital that changes are broadcast quickly and clearly to avoid misalignment. In crisis engineering, coordination and clarity are just as important as technical solutions.
Try, try and try again…
Crisis engineering is not about perfection; it’s about action. Once you have an understanding of the system and a team of experts aligned around the situation, it’s time to experiment. The first action often involves subtractive changes—turning off non-essential systems, eliminating workflows, or restricting access. These actions give you more control and visibility over what remains active in the system.
As soon as you make a change, observe the impact. Did it work? Did it create new insights? Did it make things worse, or did new problems emerge? At this point, the focus is on updating your map with new discoveries and continuing the cycle of action and feedback. Crisis engineering is an iterative process, and every step you take should provide more clarity about the true nature of the problem.
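The act-then-observe cycle can be sketched as a small helper that measures the system before and after a change and logs what was learned. This is a minimal illustration, not a prescribed tool: `Change`, `try_change`, the lower-is-better metric, and the automatic revert when the metric gets worse are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Change:
    description: str
    apply: Callable[[], None]
    revert: Callable[[], None]


def try_change(change, measure, log):
    """Apply one change, compare a health metric before and after, and record the outcome.

    measure() returns a number where lower is better (for example, an error count).
    Assumption for this sketch: a change that makes the metric worse is reverted.
    Returns True if the change was kept.
    """
    before = measure()
    change.apply()
    after = measure()
    kept = after <= before
    if not kept:
        change.revert()
    log.append({"change": change.description, "before": before, "after": after, "kept": kept})
    return kept


# Hypothetical toy system: disabling a non-essential job reduces the error count.
system = {"reporting_job": True, "errors": 12}


def disable_reporting():
    system["reporting_job"] = False
    system["errors"] = 4


def enable_reporting():
    system["reporting_job"] = True
    system["errors"] = 12


log = []
kept = try_change(
    Change("disable reporting job", disable_reporting, enable_reporting),
    lambda: system["errors"],
    log,
)
```

The log doubles as a record of discoveries, so each entry can be folded back into the map before the next change is attempted.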
99 Problems, a new problem ain’t one
While speed and decisiveness are critical, the golden rule of crisis engineering is this: do not make the problem worse. Specifically, avoid introducing new unknowns into an already chaotic situation. If turning off a system or pausing a workflow creates a known, manageable consequence, such as a backlog of requests that will need to be cleared later, that’s fine. But always ensure that you understand the downstream impacts of your actions before proceeding.
The key here is to balance urgency with caution. Taking bold steps is necessary, but only when you have a reasonable level of confidence in the consequences.
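As a rough illustration of that balance, a proposed action can carry an explicit list of its downstream effects, and the team proceeds only when the effects have been reviewed and each one is understood. The names here (`ProposedAction`, `safe_to_proceed`) and the yes/no notion of "understood" are hypothetical simplifications for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ProposedAction:
    description: str
    # Maps each downstream effect to whether the team understands and can manage it.
    downstream_effects: dict = field(default_factory=dict)
    # Set to True only after someone has actually looked for downstream effects.
    effects_reviewed: bool = False


def safe_to_proceed(action):
    """Allow an action only if its effects were reviewed and none is an unknown."""
    if not action.effects_reviewed:
        return False
    return all(action.downstream_effects.values())


# Hypothetical examples.
pause_intake = ProposedAction(
    "pause request intake",
    {"backlog of requests builds up": True},
    effects_reviewed=True,
)
swap_database = ProposedAction(
    "switch to an untested replica",
    {"replication lag": True, "behavior of legacy clients": False},
    effects_reviewed=True,
)
```

Here `safe_to_proceed(pause_intake)` is true because the only consequence is known and manageable, while `safe_to_proceed(swap_database)` is false because one effect is still an unknown.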
Speed and Iteration: The Heart of Crisis Engineering
The above steps form a feedback loop that can be executed rapidly, often within a day or two. This continuous iteration allows your team to adapt to new information and adjust strategies in real time. As the map becomes more accurate and the team’s understanding of the problem deepens, the likelihood of finding an effective solution increases dramatically.
At Five9nes, we believe this process can be taught and scaled to any organization. We offer workshops designed to build the skills necessary for fast, effective crisis response, ensuring that your teams are prepared to navigate complex challenges and deliver results when it matters most.