Rogers system outage: seven steps to better crisis resilience
By Natalia Smalyuk
8:00 a.m. Friday morning. A colleague sends a meeting cancellation from the nearest Starbucks. That’s how I learned about the Rogers outage.
Like many Canadians, I wondered what happened, and why it happened in the first place.
For a while, there were few answers.
To me, Rogers’ response made for a telling “how-not-to” crisis management case study that other organizations can learn from.
Here are seven steps organizations should consider as they reflect on ways to boost their resilience.
Build a capable crisis management team.
Preparedness starts with a well-oiled crisis management team that represents all critical functions without creating decision gridlock. The core team should be small enough to move quickly, yet versatile enough to engage stakeholders across disciplines and avoid tunnel vision. For example, understanding the interconnection between information technology and operational technology can be a critical step in preventing or containing the catastrophic ripple effects of a failure in one of these systems.
Think about crises systematically.
Dr. Ian Mitroff says every serious crisis is a “wicked mess,” or a system of highly interactive problems. In a “wicked mess,” seemingly unrelated or improbable events collide to create an even bigger mess. The Rogers outage should not be viewed as just a technology failure. It was a major communication failure for thousands of Canadians who could not connect with their organizations, families and emergency workers. According to the Globe and Mail, many were unable to call 911. Hospitals, public transit and countless other public and private services were disrupted.
Assess the company’s crisis risks.
Complex systems call for a comprehensive analysis of all risks. Many of them are “predictable surprises” that still catch organizations off guard. Risks can be classified based on their likelihood of occurring and the level of threat they pose to a company and its stakeholders. In a statement released on Saturday, Rogers’ chief executive Tony Staffieri said the system failure led Rogers’ routers to malfunction. However, remembering another Canada-wide service interruption in April 2021, customers may wonder: Are these really isolated events?
Learn from mistakes.
A common mistake is … not learning from mistakes. Has the company done its homework to understand what can go wrong and how to get it right? If there’s no comprehensive investigation after each crisis, or if the results of that investigation are not shared among key decision-makers, the causes of an incident will not be understood. “Predictable surprises” will continue to blindside the organization. Lessons learned after each incident are the basis for updated crisis plans: pre-established frameworks that are indispensable in guiding fast and effective decisions in an emergency.
Conduct a robust assumptional analysis.
To uncover vulnerabilities, crisis teams should surface and challenge the critical assumptions that underlie their plans. Assigning “red teams” to poke holes in the information and operational technology architecture is one way to understand what can go off the rails, how this may affect the business, and what the implications could be for stakeholders, including customers who rely on the company’s phone service to call emergency services. The core principle of security is understanding the dependencies between different elements of a “wicked mess.”
Stress-test your crisis capabilities.
All these steps provide only a false sense of security if crisis plans are not tested ahead of time. When Apollo 13 astronauts were heading home in a damaged capsule, mission control put their colleague on the ground in a flight simulator to work out how to bring the crew back to Earth. Crisis managers should take a page from these spacemen. Before making their moves in a high-stakes event, they should practice them in the safe environment of full scenario simulations or their lighter version, desktop exercises.
Work through your crisis communication strategy.
Business leaders are trained to talk only when they have hard evidence. But investigating the scale, impacts, and root causes of service failures may take days, if not months. Waiting to confirm the facts before communicating is a common mistake. Stakeholders want answers now. They favour organizations that take responsibility, communicate early, and demonstrate empathy. So far, Rogers is not getting good marks for accountability, transparency or compassion.
The best defense is offense. Organizations that practice proactive crisis management act, not react, before, during and after emergencies. Rogers should think long and hard about what can go wrong, how to get it right, and how to rebuild trust with its stakeholders.