Simplifying Testing

By Jon Toigo | January 1st, 2009

Business continuity planning efforts are made or broken by the testing program that is implemented to validate strategies, rehearse those who will play a role in recovery, and provide essential input to change management processes.

However, numerous surveys conducted over the past several months continue to demonstrate that plans are not being tested in a rigorous or meaningful way.

Most recently, AT&T’s survey of 100 firms in the Chicago area with revenues over $10M found that the number of companies undertaking a continuity planning project had increased by 15 percent from the previous year, to 75 of the firms surveyed. However, of those companies, only 43 percent had fully tested their plans within the last 12 months (an improvement over the 37 percent that did so in 2007), and almost one-fifth admitted they have never tested their business continuity plans (up from 10 percent in 2007).

A couple of months earlier, Forrester Research and the Disaster Recovery Journal surveyed 250 planners and found that 50 percent tested their plans only annually and another 18 percent even less often. Of the same “DR-savvy” respondent pool, 48 percent copped to updating their plans once per year or even less frequently.

Given the fast pace of business and technology change, it is inconceivable that companies can keep their plan strategies valid with such infrequent testing and updates. In a quest to explain our collective failure to test and update, interviews with planners have yielded the following insights.

First, traditional testing is costly, resource intensive and time consuming. These are exactly the attributes that run afoul of the cost-containment and do-more-with-less mantra that has taken hold in contemporary business in the face of a recessionary economic climate. This partly explains why testing and updating are not happening.

Second, new technologies, especially server virtualization and storage thin provisioning, have erected new barriers to effective testing. Both technologies, in their quest to simplify resource and capacity management, hamper our visibility into servers and storage. This may or may not be good for efficiency in production environments, but these technologies and others obscure the view of burgeoning error conditions and confound troubleshooting efforts when problems arise.

Third, valid questions are being raised about the efficacy of carefully planned, non-linear test regimes. On the one hand, the case can be made for a Heisenberg effect that results from the meticulous pre-test planning that is usually part of a traditional test. Critics argue that pre-planning skews test results and causes companies to miss the small issues that will likely become big problems in an actual recovery effort. Less theoretical is the problem of non-linearity. To get maximum value out of limited test time, firms try to test a set of tasks in each exercise that have no interdependencies. That way, a setback in one test task will not interfere with the execution of other test tasks or otherwise impair the productive use of limited testing time. Non-linear testing, however, would seem to limit the efficacy of testing as a rehearsal for team members. It may be asking a lot of your teams to figure out the interdependencies of the work they will be called upon to perform in an emergency from a set of non-sequential, piecemeal, task-oriented exercises.

We could argue the merits of these observations, which were collected “scientifically” (aka over lunch) from practitioners attending a recent conference in Chicago. However, the better course of action might be to accept them at face value and do something about them. In other words, this is a good time to consider “aggregators” and “wrappers” to augment the traditional testing process.

An aggregator is a software product that collects information about all of the data replication processes associated with an organization’s applications or business processes and presents it through a dashboard-like graphical interface. The replication data is correlated with the time-to-data objectives (recovery time objectives, or RTOs) associated with each application or process, and with the known capabilities and parameters of the replication processes themselves.
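
As a rough sketch of the idea, and not any vendor’s actual interface, the following Python fragment shows what the collection-and-presentation half of an aggregator might look like; the class, its fields and the sample applications are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ReplicationStatus:
    """Last reported state of one data replication process (illustrative fields)."""
    application: str          # business application or process being protected
    method: str               # e.g. "tape backup", "on-array mirror", "host-based replication"
    last_good_copy: datetime  # when the most recent successful copy completed
    healthy: bool             # did the most recent replication cycle complete cleanly?


def render_dashboard(statuses: list[ReplicationStatus]) -> str:
    """Correlate every replication feed into one dashboard-style summary."""
    lines = [f"{'APPLICATION':<16} {'METHOD':<22} {'LAST GOOD COPY':<18} STATE"]
    for s in sorted(statuses, key=lambda s: s.application):
        state = "OK" if s.healthy else "ATTENTION"
        lines.append(
            f"{s.application:<16} {s.method:<22} "
            f"{s.last_good_copy:%Y-%m-%d %H:%M}    {state}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_dashboard([
        ReplicationStatus("order entry", "on-array mirror", datetime(2009, 1, 1, 11, 55), True),
        ReplicationStatus("payroll", "tape backup", datetime(2008, 12, 30, 23, 10), False),
    ]))
```

A real product would pull these statuses from backup catalogs, replication logs and array interfaces rather than from hand-built objects, but the correlation job is the same.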

In the case of traditional tape backup and restore, the aggregator monitors to ensure that backups are happening and that the time required to restore data, given the technology in use and the volume of data being backed up, does not exceed the time-to-data requirements embedded in the recovery strategy. If the volume of data grows in a way that compromises restore timeframes, an alarm is generated.
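
The restore-time check can be pictured as simple arithmetic: data volume divided by sustained restore throughput, plus media-handling overhead, compared against the RTO. The sketch below is an illustrative approximation with assumed figures, not a vendor formula.

```python
from datetime import timedelta


def estimated_restore_time(backup_gb: float, restore_mb_per_sec: float,
                           media_handling_overhead: timedelta) -> timedelta:
    """Rough restore-time estimate: volume divided by sustained restore throughput,
    plus a fixed allowance for locating and mounting media."""
    transfer_seconds = (backup_gb * 1024) / restore_mb_per_sec
    return timedelta(seconds=transfer_seconds) + media_handling_overhead


def check_restore_window(backup_gb: float, restore_mb_per_sec: float,
                         overhead: timedelta, rto: timedelta) -> None:
    """Raise a dashboard-style alarm when the estimate exceeds the recovery time objective."""
    estimate = estimated_restore_time(backup_gb, restore_mb_per_sec, overhead)
    if estimate > rto:
        print(f"ALARM: estimated restore of {estimate} exceeds the {rto} RTO")
    else:
        print(f"OK: estimated restore of {estimate} fits within the {rto} RTO")


# Assumed figures: the protected data set has grown to 900 GB, the tape drive sustains
# about 60 MB/s on restore, and the recovery strategy embeds a four-hour RTO.
check_restore_window(backup_gb=900, restore_mb_per_sec=60,
                     overhead=timedelta(minutes=30), rto=timedelta(hours=4))
```

In this example the data set has grown enough that the four-hour objective can no longer be met, which is exactly the condition the aggregator is there to flag.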

In the case of data mirroring processes, the aggregator performs ongoing checks to see whether the data you think you are copying is in fact being copied to your shadow infrastructure. This is very important given the lack of visibility most vendors provide into their proprietary on-array mirroring schemes.
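
One way to picture such a check, purely as an illustration since real mirroring products expose very different interfaces, is to sample regions of the production copy and compare digests against the shadow copy:

```python
import hashlib
import os


def digest_region(path: str, offset: int, length: int) -> str:
    """Hash one region of a volume image so the same region can be compared elsewhere."""
    with open(path, "rb") as f:
        f.seek(offset)
        return hashlib.sha256(f.read(length)).hexdigest()


def verify_mirror(primary: str, shadow: str,
                  sample_points: int = 8, sample_len: int = 1 << 20) -> list[int]:
    """Spot-check that data on the primary copy is actually present on the shadow copy.
    Returns the offsets at which the two copies disagree."""
    size = min(os.path.getsize(primary), os.path.getsize(shadow))
    step = max(size // sample_points, 1)
    mismatches = []
    for offset in range(0, size, step):
        if digest_region(primary, offset, sample_len) != digest_region(shadow, offset, sample_len):
            mismatches.append(offset)
    return mismatches


if __name__ == "__main__":
    # Self-contained demo: build a "primary" image and a shadow copy whose last megabyte
    # silently diverged, then let the spot-check find the discrepancy.
    import tempfile
    with tempfile.TemporaryDirectory() as workdir:
        primary = os.path.join(workdir, "primary.img")
        shadow = os.path.join(workdir, "shadow.img")
        data = os.urandom(4 << 20)
        with open(primary, "wb") as f:
            f.write(data)
        with open(shadow, "wb") as f:
            f.write(data[: 3 << 20] + os.urandom(1 << 20))
        bad = verify_mirror(primary, shadow)
        print("Shadow copy verified" if not bad else f"Divergence detected at offsets {bad}")
```

In practice an aggregator would have to account for replication lag and consistency points before declaring a divergence, but the principle of independently verifying the shadow copy is the same.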

Aggregators, such as Continuity Software’s RecoverGuard, provide passive monitoring that may be viewed as continuous testing of data protection. Wrappers, on the other hand, take matters to the next level.

A wrapper not only monitors, but also manages, data replication processes, whether these are spawned by application or operating system software, third-party replication software, or hardware-based replication processes. Some wrappers, like CA XOsoft, provide their own data replication engine that can be used in place of third-party services, and also integrate tape backup into their monitoring capability. These products are called wrappers because they purport to enable failover between the production environment and the recovery environment at a price point that even smaller companies can afford. By keeping scrupulous track of data replication and providing additional capabilities to facilitate load shifting to the shadow infrastructure, products like CA XOsoft, Neverfail, DoubleTake and a few others provide a geographically dispersed “failover cluster” capability that works with most enterprise software products and typical equipment configurations right out of the box.

Most wrappers provide the capability to create “scenarios” to guide failover (think continuity strategies for application hosts and storage infrastructure), which can be tested at any time in both simulation and actual failover modes. Failover itself can be performed manually, interactively or automatically. Companies have also begun using wrappers for other purposes during non-disaster periods, such as eliminating planned downtime by failing over to shadow infrastructure while production systems are taken down for maintenance. The technology offers a compelling dual-use value proposition.
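
A scenario can be thought of as a small, ordered runbook that the wrapper can either simulate or actually execute. The sketch below is a generic illustration of that idea in Python; the step names, the simulate flag and the scenario structure are assumptions, not any product’s configuration format.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FailoverScenario:
    """An ordered set of steps for shifting an application to the shadow infrastructure."""
    name: str
    steps: list[tuple[str, Callable[[], None]]] = field(default_factory=list)

    def step(self, description: str):
        """Register the next step; the callable performs the real work when not simulating."""
        def register(action: Callable[[], None]):
            self.steps.append((description, action))
            return action
        return register

    def run(self, simulate: bool = True) -> None:
        mode = "SIMULATION" if simulate else "ACTUAL FAILOVER"
        print(f"--- {self.name}: {mode} ---")
        for description, action in self.steps:
            print(f"step: {description}")
            if not simulate:
                action()


scenario = FailoverScenario("order entry to shadow site")


@scenario.step("quiesce replication and take a final consistent copy")
def _final_copy() -> None:
    ...  # call the replication engine here


@scenario.step("start the standby application host")
def _start_standby() -> None:
    ...  # bring up services on the shadow infrastructure


@scenario.step("redirect users to the shadow site")
def _redirect_users() -> None:
    ...  # update DNS or load-balancer configuration


# Exercised routinely in simulation mode; a real failover (manual, interactive or triggered
# automatically by monitoring) would call scenario.run(simulate=False).
scenario.run(simulate=True)
```

Running the same scenario in simulation mode for routine testing and in actual mode for maintenance windows is what gives the technology its dual-use appeal.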

Conclusion
Products such as aggregators and wrappers can cut test tasks down to size, revitalizing a continuity plan testing program and enabling planners to refocus their attention on tasks that do not involve business application re-hosting. However, it is very important that aggregator and wrapper products be tested thoroughly and that the nuances of their capabilities (and limitations) be understood. One product may seem very much like another, but there can be significant differences that could limit the efficacy of a particular product within the context of the specific processes and applications that a planner is seeking to protect.


About the Author
Jon Toigo is CEO of Toigo Partners International and a consultant who has aided over 100 companies in the development of their continuity programs. Feel free to contact him at [email protected].
