Disaster Recovery: To Test or Not to Test?
How do I test? Let me count the ways…
When most customers think of a DR test, their first thought seems to be, “We must test as if we are experiencing a disaster.” Although this approach is the most common, it is not necessarily the best. While testing certainly should be a time to prepare for the worst-case scenario, it is also a good time to understand the Who, What, When, and Where of the actual Disaster Recovery strategy. Understanding these basics will help you test more efficiently and, more importantly, help you recover in the event of an actual disaster.
Why do we care about the “who” in testing for DR? Many companies make the mistake of assuming that every team member will be available to fill his or her role during a disaster, when in reality, resources will most likely be limited. I can think of two examples that should serve as lessons to all.
The first is the most recent—Hurricane Sandy. While I was working for a different DR Service Provider, a customer declared a disaster, only to discover upon arrival at the Recovery Center that their System Admin for a particular platform was trapped at home by flood waters. The customer was left scrambling, trying to figure out how to recover the systems usually taken care of by that particular System Admin.
The second was a result of 9/11. When I was working for that same DR Service Provider out of Scottsdale, AZ, I received a call the morning of 9/12, and the customer on the other end was in absolute hysterics. They were the West Coast office for a small company based in the World Trade Center (WTC). The IT department was in the WTC, along with the entire East Coast office. No one in their East Coast office survived. They lost everyone when the towers collapsed. They had absolutely no idea what to do to start their recovery and get their business back up and running. No matter how important your information is to keeping your business moving forward, people are your greatest asset.
You might be wondering, “If I’m not testing as if an actual disaster is happening, then how do I test?” With over 16 years of experience in the DR industry (and over 18 in IT), I have seen testing done in many different ways. The good news is that none of these ways is wrong! Some customers want to test one platform from their data center at a time. Others want to test subsets of their overall hardware configuration. Both are great approaches, as practice always makes us better. Other customers use test time to pre-test an upgrade or fix they are getting ready to implement at home, which gives them an opportunity to do a “live” run of the installation before they take it to production. Another kind of test does not involve the IT hardware at all: the “paper” test. This is an exercise to walk through all the recovery steps and documentation that would be required at the time of a disaster, so that the customer can be confident they have everything they need to recover.
When is the best time to test?
The answer is going to be different for everyone. Testing is recommended whenever major changes affect the IT organization or company, such as hardware and software upgrades or platform changes. Some companies are required to undergo audits every year, so they need to test before the audit. In a perfect world, the test would take place a couple of months before the audit so that, if there is an issue, the organization has time to correct it.
Where should you test?
In the past, there was no option but to go to the Recovery Center for DR testing. Advances in technology over the last 20 years have created numerous new possibilities. Today, with the network options available, customers can test from almost anywhere. One of the most common new models is for non-replication customers to send their tapes to the Recovery Center, where a recovery specialist loads the tapes and sets up a basic configuration on the hardware while the customer tests remotely from their own center. Many customers still like to be present at the Recovery Center during a test, as this is how they would operate during an actual disaster. Other customers use a hybrid of these two options: they send their IT DR team to the Recovery Center to handle the recovery while the end users remain at home and test remotely.
As you can see, as technology has evolved, so have the means for DR testing. Whether you take advantage of the latest methodology and technology or simply do a paper walk-through and documentation check, testing is an important factor in being able to recover at the time of a disaster. The time invested in DR testing will pay off when you are able to continue business in the event of an actual disaster.