It's been quite a ride for organizations worldwide as they try to bounce back from the recent CrowdStrike outage. July 19 may have started as an ordinary day, but it swiftly spiraled into global mayhem. Thankfully (and surprisingly), this wasn't caused by a cyberattack, but rather by a routine internal update gone awry. So, what happened?
This blog will cover the impacts of the CrowdStrike outage and lessons learned for third-party risk managers.
A minor tweak in CrowdStrike's sensor configuration update for Windows systems contained a logic error, resulting in the notorious Blue Screen of Death (BSOD) for computers worldwide. The resulting impacts were significant; major airline carriers faced flight cancellations and delays due to disrupted systems. Almost every major health system using Microsoft products experienced an outage. Hospitals faced technology issues, leading to paused procedures and patient disruptions. Banks and financial services providers also encountered operational disruptions, affecting transactions and customer services. Many other industries were also impacted, including services, wholesale, freight, and broadcast media.
The sheer scale of this outage is almost too hard to imagine. Thousands of organizations use both Microsoft and CrowdStrike. Beyond that, many organizations have customers and third parties that use Microsoft and CrowdStrike. It's still too soon to truly understand the impacts of this global event, including the financial losses and reputational damage resulting from it.
The big question is, what can we all learn from this? For those familiar with third-party risk management (TPRM), the situation perfectly illustrated the importance of business continuity planning and disaster recovery (BC/DR) management. But first, we need to talk about the dangers of concentration risk.
Concentration risk occurs when an organization heavily relies on a single vendor or software provider, putting all its proverbial eggs in the same basket. While the CrowdStrike outage drove that point home for organizations, it also reminded us that industry consolidation could have far-reaching implications, especially when a few big players hold all the cards. The fact that a single security update caused such widespread disruption highlights the risks of global interconnectivity, particularly for organizations central to public safety, economic stability, and national security.
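To make concentration risk a bit more concrete, here is a minimal Python sketch of one way to spot it in a vendor inventory. It assumes you can list which vendors each business function depends on. The inventory, the vendor names, the `concentration_report` function, and the 50% threshold are all hypothetical and only for illustration; a real TPRM program would weigh criticality, not just count dependencies.

```python
from collections import defaultdict

# Hypothetical inventory: business function -> vendors it depends on.
DEPENDENCIES = {
    "payments": ["VendorA", "VendorB"],
    "endpoint_security": ["VendorA"],
    "email": ["VendorA"],
    "payroll": ["VendorC"],
}


def concentration_report(dependencies, threshold=0.5):
    """Return {vendor: share} for vendors that at least `threshold`
    of all business functions depend on (threshold is an assumption)."""
    total = len(dependencies)
    if total == 0:
        return {}

    # Invert the mapping: vendor -> set of functions relying on it.
    functions_by_vendor = defaultdict(set)
    for function, vendors in dependencies.items():
        for vendor in vendors:
            functions_by_vendor[vendor].add(function)

    # Keep only vendors whose share of functions meets the threshold.
    report = {}
    for vendor, functions in functions_by_vendor.items():
        share = len(functions) / total
        if share >= threshold:
            report[vendor] = share
    return report


flagged = concentration_report(DEPENDENCIES)
for vendor, share in flagged.items():
    print(f"{vendor}: {share:.0%} of business functions depend on this vendor")
```

In this made-up inventory, three of the four functions rely on the same vendor, so that vendor is the one flagged for a closer look at backup options and BC/DR coverage.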
Pro Tip: For vendors who provide products that automatically update or can have their configurations adjusted by the vendor, ensure you understand how all software changes are pushed out, including the types, cadence, and configuration options available. This lesson applies not only to the recent event but also to the SolarWinds incident in 2020.
Business continuity and disaster recovery planning are essential components of risk management, and they can serve as valuable tools for combatting concentration risk. Business continuity focuses on ensuring vital business functions can continue during and after a disaster. Disaster recovery is the process of restoring and recovering IT infrastructure and operations following a disaster or other disruptive scenario. Of course, an organization needs BC/DR for itself. It must also be a requirement for critical third parties and for any third party on which the organization has a significant operational, transactional, compliance, or financial dependency.
Having well-developed plans is only part of the requirement. Those plans should be tested regularly, and your vendors should share the results. Without testing, it's impossible to determine how effective a plan is.
TPRM, in collaboration with internal BC/DR teams, should consider whether joint testing between the organization and the third party is necessary. This may involve:
Third-party BC/DR plans and testing results should be reviewed at least annually. They should always be reviewed by qualified subject matter experts (SMEs) to identify any gaps or weaknesses. Critical vendors should also have processes in place to collect and review the BC/DR plans of their own critical vendors as part of an effective TPRM program.
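As a small illustration of the "at least annually" cadence, the sketch below flags vendors whose BC/DR review is overdue or has never been done. The vendor names, the dates, the 365-day interval, and the `overdue_reviews` helper are all assumptions made for the example, not part of any particular TPRM tool.

```python
from datetime import date, timedelta

# Assumed review cadence: at least once every 365 days.
REVIEW_INTERVAL = timedelta(days=365)


def overdue_reviews(last_reviewed, today):
    """Return a sorted list of vendors whose last BC/DR review is older
    than REVIEW_INTERVAL, or who have no recorded review (None)."""
    return sorted(
        vendor
        for vendor, reviewed_on in last_reviewed.items()
        if reviewed_on is None or today - reviewed_on > REVIEW_INTERVAL
    )


# Hypothetical review log: vendor -> date of last BC/DR plan review.
REVIEW_LOG = {
    "VendorA": date(2023, 1, 10),
    "VendorB": date(2024, 6, 1),
    "VendorC": None,  # never reviewed
}

for vendor in overdue_reviews(REVIEW_LOG, today=date(2024, 7, 19)):
    print(f"{vendor}: BC/DR plan and test results are due for SME review")
```

Passing `today` in explicitly, rather than reading the system clock inside the function, keeps the check easy to rerun against any reporting date.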
If a vendor has an ineffective BC/DR plan or a material issue in its plans, or isn't effectively managing and monitoring the BC/DR risks in its own vendor inventory, your organization must implement additional (most likely internal) controls to bridge the gap. That might include diversifying the product or service across multiple vendors, increasing your organization's insurance, securing professional risk intelligence and monitoring, implementing a secondary vendor as a warm backup, finding another vendor, or combining solutions to minimize the potential impacts.
Remember that complex problems often require creative solutions, so loop in your internal BC/DR team and relevant SMEs, such as your operations, cybersecurity, legal, compliance, or even finance teams. A variety of expertise and perspectives can help your organization enhance its overall BC/DR approach, establish BC/DR standards for third parties and their subcontractors, and hopefully reduce the impacts of a business-interrupting event.
At the end of the day, despite the chaos caused by the outage, there might be a silver lining. Most organizations will bounce back, though it may take some time, and they should remember the lessons learned.
If nothing else, this global outage highlighted that concentration risk is no joke. It's a wake-up call for organizations to take proactive steps to identify concentration risk in their supply chain and to ensure they have well-developed and tested BC/DR plans to limit the impacts of the next big disruption.