Can We Break it? 9 Business Continuity Plan Testing Scenarios

May 2nd, 2023

Nobody wants to get those dreaded 3 a.m. phone calls. “The servers are down.” “The backups failed.” “Ed from logistics opened a phishing email again!” These calls are an IT professional’s worst nightmare. But the good news is: by exploring the right business continuity plan testing scenarios, you may never have to get such a call.

Creating a business continuity plan (BCP) is only the first step toward implementing a rock-solid continuity strategy. The systems and protocols outlined in your plan might sound good in theory, but how do they hold up in a real-world disaster?

  • Can your backup systems survive a real data meltdown?
  • Will you be able to meet your recovery time objective (RTO) for restoring data?
  • How well will employees follow emergency procedures?
  • Will your emergency communication strategy work out as planned, or will it implode?
  • What will really happen when things go bad?

There’s no way to know for sure without testing. This is a critical component of continuity planning. Without putting your BCP to the test, you’ll never know if your company is truly prepared for a disaster—until it’s too late.

Today, we look at 9 business continuity plan testing scenarios that can ensure your technologies and teams are ready for anything.

Get your hammer ready (metaphorically speaking). Once your plan is finalized, it’s time to try to break it.

Don’t worry: you’re not actually shredding the document that you spent so many weeks writing, editing and getting approved by higher-ups. And you’re not actually breaking anything at all.

However, you do need to prove the soundness of everything you put in the plan. By that, we mean using strategic tests that will help you to:

  • Identify weaknesses in your BC systems
  • Confirm that infrastructure investments meet your continuity objectives
  • Evaluate the company’s response to different types of disruptive events
  • Make improvements to systems and procedures based on test findings
  • Update your BCP accordingly

Don’t make the mistake of creating a comprehensive plan but never putting it to the test. That’s more than just laziness. It’s dangerous. Without testing your plan, you’re putting both the business and its people at risk.

Keep this in mind: only 6 percent of companies without a disaster recovery plan survive a disaster, according to Datto.  Having an inadequate plan is just as risky as having no plan at all.

Business continuity plan testing scenarios

As you prepare for your tests, you’ll also need to determine just how “real” you want the test to be.

Testing is often a challenge for companies. The tests require time and resources for planning and executing them. For that reason, you may find it easier to conduct certain tests sitting around a conference table, rather than involving the entire organization in a full-scale drill. In business continuity, these varying types of tests are typically defined as follows:

  • Plan review: the most basic test, in which the recovery teams go over the BCP, line by line, to make sure everything is accurate and shipshape.
  • Tabletop test: a more involved version of the plan review, in which employees participate in actual exercises (usually in a conference-room setting) to confirm that everyone knows their responsibilities in various types of emergencies. These tests may also be used for testing technology components so that multiple people can evaluate how the systems behave and how that behavior affects their roles.
  • Simulation test: this is the most realistic test, requiring team members to perform their BC/DR duties within their actual work environments. For certain types of disasters, this may even mean going off-site (for example, to resolve issues at a local data center or mock-prepare a backup office location).

Full-scale simulation tests are ideal because they allow you to evaluate your teams’ and technologies’ response to disasters in a way that’s as close to the “real thing” as possible. But if time and resources don’t allow for repeated simulations, then fall back on the tabletop tests (rather than not testing at all).

Okay, let’s dive into the tests …

1) Data loss

Let’s start with one of the most common workplace disasters today: a loss of data. This loss could be caused by a number of culprits:

  • Ransomware and other cyberattacks
  • Accidentally deleted files or folders
  • Server / drive failure
  • Datacenter outage

Assume that the lost data is mission-critical. Perhaps it’s your CRM information or the data that runs your sales and logistics applications.

The obvious goal is to get that data back as quickly as possible, ideally by restoring a backup. But whose job is it to do that? How should they communicate the problem to other personnel (and at what point in the crisis)? What are the priorities? Do outside vendors, such as managed service providers (MSPs), need to be contacted?

If your primary IT person isn’t available to start the recovery, do other team members know how to do it? These are all questions that should be answered by your test.

2) Data recovery

You need to make sure your BC/DR systems work like they’re supposed to. Conduct a test that involves losing a massive amount of data, and then try to recover it.

Here’s what you’ll need to evaluate:

  • How long does the recovery take?
  • Were any files corrupted during the recovery?
  • Did you meet your RTO?
  • If you virtualized a backup in the cloud, were there any issues? Did internal applications run without connectivity issues or lag?

Make sure that the teams who rely on this business-critical data participate in the test. For example, if they’ll be expected to work with a virtualized environment, watch them do this – see what questions they have or what issues they run into.
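The first three questions above lend themselves to a small script. Here is a minimal sketch in Python, assuming you recorded SHA-256 checksums of the test files before the drill and that your backup tool can be triggered from a callable. The names `find_corrupted` and `timed_restore` are ours, purely for illustration, and are not part of any backup product.

```python
import hashlib
import time
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_corrupted(expected: dict[str, str], restored_dir: Path) -> list[str]:
    """List files that are missing or whose checksum changed after the restore.

    `expected` maps a relative file path to the checksum recorded before the test
    (an assumption: you captured these ahead of time).
    """
    bad = []
    for rel_path, checksum in expected.items():
        restored = restored_dir / rel_path
        if not restored.is_file() or sha256_of(restored) != checksum:
            bad.append(rel_path)
    return sorted(bad)


def timed_restore(restore, rto_seconds: float) -> dict:
    """Run the hypothetical `restore` callable and compare elapsed time with the RTO."""
    started = time.monotonic()
    restore()
    elapsed = time.monotonic() - started
    return {"elapsed_seconds": elapsed, "met_rto": elapsed <= rto_seconds}
```

In a real test, `restore` would wrap whatever command or API call starts your recovery. The point is simply to capture a timestamped, repeatable answer to "how long did it take?" and "did every file come back intact?"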

3) Power outage

Scenario: Last night, power was knocked out by a storm. The utility company says it won’t be back up for days.

So, what now? What does your BCP say should happen in an event like this?

As part of the test, you’ll want to make sure that your DR team knows their responsibilities and how to communicate with the rest of the organization.

  • How will personnel be notified? Are they expected to come to work?
  • If a prolonged work stoppage occurs, do HR and Accounting know how it impacts payroll?
  • Are there backup generators that need to be manually started?
  • Is there a backup office location?

These answers should already be in your BCP. But with the test, you’ll be able to confirm that everyone follows the protocols as outlined.

4) Network and/or Internet outages

The concerns here are very similar. Chances are that if there’s no electricity, there’s no network either. But there are also numerous situations in which you could have electricity while the network is down.

For situations like this (if the outage is prolonged), it’s increasingly common for organizations to provide personnel with the means to work remotely from home (more on that in the next continuity plan testing scenario, below). So as part of this test, you’ll want to make sure that this plan works as designed:

  • Do employees know how to use/access the remote desktop systems?
  • Does the technology work as designed? Are speeds/connectivity strong enough to maintain productivity levels?
  • How is the network being restored? Do recovery teams know what to do?

What about network tests?

In addition to testing your preparedness for a network outage, you’ll want to test the network itself. This will enable you to verify the resilience of the network in various scenarios, such as cyberattacks, heavy bandwidth usage, changes in network configurations and so on.

There are numerous types of network stress tests that allow you to simulate congested network conditions. Sometimes referred to as “torture testing,” these tests give you insight into how your network performs when stressed to the max. Most network testing tools will allow you to measure bandwidth utilization and latency, and see how spikes in packet levels affect the performance of your network devices.
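Whatever tool you use to generate the load, you will end up with a pile of latency measurements to interpret. The sketch below shows one way to boil them down, assuming you have exported round-trip times in milliseconds from your testing tool. The function name and the threshold parameter are illustrative assumptions, not part of any particular product.

```python
import math
import statistics


def summarize_latency(samples_ms: list[float], threshold_ms: float) -> dict:
    """Summarize round-trip latency samples collected during a stress test."""
    if not samples_ms:
        raise ValueError("no latency samples collected")
    ordered = sorted(samples_ms)
    # Nearest-rank 95th percentile: the smallest sample with at least
    # 95% of all samples at or below it.
    rank = max(1, math.ceil(95 * len(ordered) / 100))
    return {
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[rank - 1],
        "max_ms": ordered[-1],
        # How many samples exceeded the acceptable latency (your own target).
        "over_threshold": sum(1 for sample in ordered if sample > threshold_ms),
    }
```

Comparing the median with the 95th percentile is a quick way to see whether the network is slow across the board or only during spikes.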

In addition to routine testing, these tests should also be conducted prior to the rollout of new applications or other significant changes to the network.

Remember: a critical aspect of these testing scenarios is to test your response to the simulated incident. So in a simulated network outage, for example, you’ll want to run through the steps needed to resolve the problem. Then, conduct a post-incident analysis to measure the speed and effectiveness of that response.

5) Application failure

What happens when an application that is most critical to your operations suddenly stops working? Aside from bringing your operations to a halt, your employees will likely be idled with nothing to do. This is an extremely costly scenario for most businesses, because it means that revenue is halted while expenses continue (and are wasted).

Routinely testing your applications can help to prevent these costly outages from happening and ensure that teams know how to rapidly respond when failure does occur.

Here’s what to consider as part of this testing scenario:

  • What events or conditions are most likely to cause the application to fail? (e.g., heavy network usage, large-scale changes, etc.)
  • When failure occurs, what steps are needed for recovery?
  • What can be done to mitigate or eliminate these outages in the future?

Stress tests and performance tests are especially valuable, as they can help to identify how the application performs under different workloads. If the applications are externally developed and there are bugs or other issues inherent in the software (as opposed to adverse internal conditions, such as network issues), then organizations should work with their software vendor to identify a fix.
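To make "different workloads" concrete, here is a rough sketch of a single load step in Python. It assumes you can wrap one request to the application in a callable (the hypothetical `handler` below). Dedicated load-testing tools do far more than this, so treat it as an illustration of the idea rather than a replacement for them.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_step(handler, request_count: int, workers: int) -> dict:
    """Send `request_count` requests through `handler` using `workers` threads.

    `handler` is a hypothetical callable that takes a request number and
    raises an exception when the application fails to respond correctly.
    """

    def attempt(request_id: int) -> bool:
        try:
            handler(request_id)
            return True
        except Exception:
            return False  # count any error as a failed request

    started = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outcomes = list(pool.map(attempt, range(request_count)))
    elapsed = time.monotonic() - started

    succeeded = sum(outcomes)
    return {
        "requests": request_count,
        "succeeded": succeeded,
        "failed": request_count - succeeded,
        "elapsed_seconds": elapsed,
    }
```

Running the same step with a growing `workers` value (for example 5, then 20, then 50) shows roughly where failures start to appear, which is the number to bring to your post-test analysis.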

6) Public health crisis

This is a larger-scope continuity testing scenario that businesses became well-acquainted with during the COVID-19 pandemic.

As the coronavirus spread, organizations raced to adhere to critical health guidelines that ushered in a new era of remote work, virtually overnight. Not all businesses were able to quickly adapt to this sudden shift. However, some organizations had been testing such a scenario as part of their continuity planning long before the pandemic started.

Businesses of all sizes need to be sure they can continue to operate during a public health crisis that threatens the health, wellness and availability of workers. This means testing the ability to shift operations, as it relates to both logistical feasibility and IT infrastructure:

  • Can employees perform their jobs remotely?
  • Do they already have devices that make remote work possible? Or would new devices need to be acquired?
  • Are IT systems already in place that would enable workers to securely connect to the network?
  • In the event of prolonged staffing issues, can critical operations be carried out by limited personnel?

As we discovered, a global health crisis can occur at any time. Businesses need to continually test their ability to adapt to such an event to ensure their operations can continue without interruption.

7) On-site danger

This is a very important office-wide drill that you must conduct at least once a year. Chances are that your local fire codes already require you to hold periodic fire drills. If not, it’s critical that you conduct one anyway.

In addition to fire, these drills can be used for testing response to other dangerous situations, such as:

  • Earthquakes
  • Tornadoes
  • Bomb threats
  • Terrorist attacks
  • Gas leaks
  • Structural instability

As part of your test, make sure people know their emergency procedures, whether it’s evacuation, duck and cover, retreating to a safe area or even staying at their desks.

Additionally, you should be testing your procedures for maintaining operations in case such an event is prolonged.

8) Communication protocols

Communication is critical in a disaster. And in the most disruptive events (such as a severe natural disaster), you’ll probably lose most of your traditional communication means.

Your BCP should already outline how communication should occur in these situations: who should call whom and how. Some companies use calling trees. Some have an emergency email alert system, a call-in number for updates or special company websites used exclusively for communicating during these events.

Your tests should check that these systems and steps actually work: that personnel know they exist, that they know how to use them and that they work as designed.
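A calling tree is easy to check on paper before you test it with real phones. The sketch below models a tree as a simple mapping of who calls whom and walks it to find anyone the tree never reaches. The role names in the usage note are made up for illustration, and your own BCP defines the real structure.

```python
from collections import deque


def call_order(tree: dict[str, list[str]], root: str) -> list[tuple[str, str]]:
    """Walk the calling tree breadth-first and return (caller, callee) pairs."""
    calls = []
    reached = {root}
    queue = deque([root])
    while queue:
        caller = queue.popleft()
        for callee in tree.get(caller, []):
            if callee in reached:
                continue  # already notified, so skip the duplicate call
            reached.add(callee)
            calls.append((caller, callee))
            queue.append(callee)
    return calls


def unreached(tree: dict[str, list[str]], root: str, everyone: list[str]) -> list[str]:
    """Return the people on the roster whom the calling tree never notifies."""
    reached = {root} | {callee for _, callee in call_order(tree, root)}
    return sorted(set(everyone) - reached)
```

For example, with a hypothetical tree in which `incident_lead` calls `it_manager` and `hr_manager`, passing the full staff roster to `unreached` flags anyone who was hired after the tree was last updated.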

9) Crisis of any kind

Let’s face it—there are so many different disasters that threaten your operations. Hopefully they’re already thoroughly defined in your business continuity plan.

Your job is to make sure you’re creating realistic tests that prepare the business for each of these crises. We’ve included some of the most destructive (and common) disasters in the recommended tests above, but there are numerous others to consider as part of your testing, including:

  • Loss of personnel (transportation blockage, strike, illness, etc.)
  • Additional utility outages (gas, telecommunications)
  • Application outages
  • On-site flooding
  • City/area-wide evacuation
  • IT infrastructure failure or damage

As with each of the tests outlined above, your drills for these scenarios should be designed to ensure that personnel know how to respond, that they’ll be safe and that the business can continue running.

Documenting your testing scenarios

All tests should be thoroughly documented. This enables organizations to identify how the test was conducted, what went right and what needs to be improved. Each test provides a baseline for conducting future tests and also for making changes to continuity planning.

Each testing scenario should be individually documented, but can also be summarized to provide a high-level overview. Here is a very basic example of what that might look like, just for template purposes:

 

Testing Scenario | Outcome | Action Steps
Local data backup recovery | Failure to restore; corrupted data | Further evaluation of the cause of failure/corrupted data; consideration for new BC/DR investment
Network stress test | Application failure at peak bandwidth utilization | Reconfigure network settings to balance network load

This summary should be followed by a more detailed description of each test: when it was conducted, what occurred, and recommendations for further testing and/or improvements.
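If you would rather track results in a script than in a document, the same template can be sketched as a small data structure. Everything here (the `ScenarioResult` name, its fields and the `passed` flag) is an assumption for illustration, not a standard format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ScenarioResult:
    """One documented test, mirroring the columns of the summary table."""
    scenario: str
    conducted_on: date  # when the test was run
    outcome: str
    action_steps: str
    passed: bool  # hypothetical flag: did the test meet its objective?


def summary_rows(results: list[ScenarioResult]) -> list[tuple[str, str, str]]:
    """Return (scenario, outcome, action steps) rows, oldest test first."""
    ordered = sorted(results, key=lambda result: result.conducted_on)
    return [(r.scenario, r.outcome, r.action_steps) for r in ordered]


def needs_follow_up(results: list[ScenarioResult]) -> list[str]:
    """Return the scenarios that failed and should be retested."""
    return [r.scenario for r in results if not r.passed]
```

Keeping the date and a pass/fail flag alongside each row makes it easier to see which scenarios are overdue for a retest.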

This article has been excerpted with permission and was originally published at Invenioit.com. Visit the site to learn more about business continuity plans and testing.

Published in Enterprise Resilience


About the Author:

Dale Shulmistra is the co-founder of Invenio IT, an award-winning managed service provider that specializes in data protection services. With over 20 years of experience in information technology, Mr. Shulmistra is an established thought leader in the data protection space, co-authoring books and contributing to articles for Forbes, Bloomberg, Fox Business and numerous trade publications. Dale is passionate about technology and using it to solve complex and evolving business problems for his clients. Reach out to Dale on LinkedIn: https://www.linkedin.com/in/daleshulmistra/
