Disaster recovery preparedness in virtual environments

Disaster recovery (DR) can be a complex undertaking. When the hardware at the production and recovery sites differs, recovering critical systems is often fraught with roadblocks, most of which stem from the different firmware and device driver requirements that come with different hardware.

But beware of dismissing DR as a time sink. DR is a serious issue, and any organization that lacks confidence in its disaster recovery plan is taking a big gamble. That's why virtualization is so important: the hardware abstraction of virtualization platforms makes it relatively easy to overcome hardware differences between production and recovery sites.

Virtualization changes how we plan for and execute a disaster recovery strategy. With virtualization, you can reduce the amount of hardware the disaster recovery site requires, assuming that you're preparing for the loss of only a single site. Because virtualization abstracts the hardware, the recovery site no longer needs to duplicate the production site's hardware.

When it comes to disaster recovery, data center managers have plenty to worry about. Organizations that virtualize production resources will likely need to adjust their disaster recovery plans after a server virtualization migration. How does virtualization change things? What should you do differently? The six items below are considered requirements for disaster recovery in virtual environments:

Hypervisors such as VMware ESX Server and Citrix Systems Inc.’s XenServer store configuration settings locally on their physical host system. Configuration settings determine how the hypervisor accesses compute, storage and network resources and also dictate how shared resources are presented to VMs.

In virtual environments, two of the most critical configuration settings are the virtual network and storage configurations. Many organizations have scripts in place that can recreate critical hypervisor configuration settings; however, in the absence of configuration scripts, you need to ensure that all configuration settings are regularly backed up. There are two common techniques for doing so:

Many organizations back up VMs by installing backup agents inside each virtual machine's guest operating system. While these agents protect the VM's data quite well, they won't back up configuration data that is stored outside the guest OS. That configuration data lives in each VM's associated configuration file (such as a .vmx file with VMware or a .vmc file with Microsoft Virtual Server).
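Because guest-level agents miss these files, they need to be collected separately at the host or storage level. The following is a minimal sketch, assuming the VM folders are reachable as an ordinary directory tree (for example, a mounted datastore); the function name `backup_vm_configs` and the directory layout are hypothetical, not part of any vendor's tooling.

```python
import shutil
from datetime import datetime
from pathlib import Path

# Configuration file extensions named in the text: .vmx (VMware) and
# .vmc (Microsoft). This set is an assumption; extend it for your platform.
CONFIG_EXTENSIONS = {".vmx", ".vmc"}


def backup_vm_configs(vm_root: Path, backup_root: Path) -> list[Path]:
    """Copy every VM configuration file under vm_root into a timestamped
    folder under backup_root, preserving the relative directory layout.
    Returns the destination paths of the copied files."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dest_root = backup_root / stamp
    copied = []
    for src in sorted(vm_root.rglob("*")):
        # Match extensions case-insensitively; skip disks and other files.
        if src.is_file() and src.suffix.lower() in CONFIG_EXTENSIONS:
            dest = dest_root / src.relative_to(vm_root)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)  # copy2 also preserves file timestamps
            copied.append(dest)
    return copied
```

Running this on a schedule (or from the same change-control step that takes snapshots) keeps a dated history of VM configurations that can be replicated to the recovery site.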

VM snapshots should be considered a requirement for DR preparedness and should be part of your change control processes. Each time a VM changes, as with a configuration change, patch installation or software installation, you should create a new snapshot of the VM and immediately replicate it to your DR site. VM snapshots should serve as the baseline for all disaster recovery operations. Since the snapshot includes, at a minimum, the VM's most recent OS and application configuration, you need to restore only the most recent data files from backup to fully recover the VM. Of course, if you're using asynchronous replication to keep data synchronized between the production and DR sites, you may only need to power on each VM at your recovery site, accepting the loss of any data written since the last replication cycle. (For additional information on using a storage architecture to capture VM snapshots and using replication for DR preparedness, see my SearchStorage.com article on using storage replication for virtual machine disaster recovery preparedness.)
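The snapshot-as-baseline idea reduces the restore job to a simple rule: restore only the files whose latest backed-up version changed after the snapshot was taken. Here is a minimal sketch of that rule, assuming a hypothetical backup catalog where each entry records a file path, when it was backed up and when it was last modified; real backup products expose their catalogs differently.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class BackupEntry:
    path: str               # file path inside the guest OS (hypothetical catalog)
    backed_up_at: datetime  # when this version of the file was backed up
    modified_at: datetime   # the file's last-modified time at backup


def files_to_restore(snapshot_time: datetime, catalog: list[BackupEntry]) -> list[str]:
    """Return the paths whose most recent backed-up version was modified after
    the snapshot; older versions are already captured in the snapshot."""
    latest: dict[str, BackupEntry] = {}
    for entry in catalog:
        current = latest.get(entry.path)
        # Keep only the newest backup of each file.
        if current is None or entry.backed_up_at > current.backed_up_at:
            latest[entry.path] = entry
    return sorted(p for p, e in latest.items() if e.modified_at > snapshot_time)
```

A file that has not changed since the snapshot is skipped, which is what keeps the restore window short.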

In traditional physical server disaster recovery, detailed system configuration information (hardware requirements, storage requirements, partition configurations and so on) is crucial for fully rebuilding production systems. In VM recovery, these configuration details are secured by backing up each VM's configuration files. As a result, recovering VMs at the DR facility is much easier than recovering a physical system, especially if the DR facility's hardware differs from the production site's. (For more on VM staging at a DR facility, read my SearchServerVirtualization.com article.)

Firmware documentation is an often-overlooked element of DR preparedness. Any firmware updates at a production site should also be applied to devices at the recovery site. Otherwise, differences in device drivers and firmware revisions between production and DR facilities may prevent physical host systems from successfully starting in the event of a disaster.
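Keeping firmware in step is easier to enforce when both sites' inventories are documented and compared routinely. The sketch below assumes each site's inventory is a simple mapping of device (or model) to firmware or driver version; how you gather that inventory, and the device names used, are assumptions rather than output from any particular vendor tool.

```python
def firmware_mismatches(production: dict[str, str], recovery: dict[str, str]) -> dict:
    """Compare firmware/driver inventories keyed by device.
    Returns {device: (production_version, recovery_version)} for every device
    whose versions differ or that exists at only one site (missing side is None)."""
    mismatches = {}
    for device in sorted(set(production) | set(recovery)):
        prod_ver = production.get(device)
        dr_ver = recovery.get(device)
        if prod_ver != dr_ver:
            mismatches[device] = (prod_ver, dr_ver)
    return mismatches
```

For example, a production inventory of `{"RAID controller": "2.1.4", "BIOS": "1.7"}` against a recovery inventory of `{"RAID controller": "2.0.9"}` flags both entries, so the outdated controller firmware and the undocumented BIOS can be addressed before a disaster rather than during one.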

You'll want to ensure that backup media, all necessary software, detailed recovery procedures and detailed network diagrams are available at the recovery site so that the DR facility's local staff can more easily troubleshoot any problems that occur as VMs are brought online.

One of this year’s major themes is data center automation, and automation tools have extended to disaster recovery procedures. If you manage VMware-based VMs, keep an eye on VMware Site Recovery Manager (SRM). With SRM, you can automate your disaster recovery plan with software, initiate that plan with a mouse click, and pre-program the sequence in which VMs are brought online at a disaster recovery site. During the course of this year, I expect other vendors to offer similar technologies as well.
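Pre-programming a start-up sequence amounts to ordering VMs by their dependencies, for instance bringing up a directory server before the database that authenticates against it. The sketch below illustrates that ordering with Python's standard `graphlib` module; it is not SRM's API or configuration format, and the VM names and dependency map are hypothetical.

```python
from graphlib import TopologicalSorter


def power_on_waves(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group VMs into start-up waves. Each VM in a wave depends only on VMs
    in earlier waves, so all VMs in one wave can be powered on in parallel."""
    sorter = TopologicalSorter(depends_on)  # maps each VM to its prerequisites
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # mark this wave as started
    return waves
```

With `{"web01": {"app01"}, "app01": {"db01", "dc01"}, "db01": {"dc01"}, "dc01": set()}`, the waves come out as `dc01`, then `db01`, then `app01`, then `web01`. Grouping into waves rather than a single list lets independent VMs start together, which shortens recovery time.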

Virtualization really transforms how we look at disaster recovery, and reaping its benefits requires substantial modifications to a DR plan. With tools such as PlateSpin Ltd.'s PowerConvert, you can even use virtualization to stage recovery VMs for physical production systems that have yet to be virtualized.

As a long-term strategy, use virtualization to provide a fully automated and easily testable disaster recovery plan. Hopefully, organizations that abandoned DR testing because of its demoralizing effect on IT will soon return to the fold and fully test their DR plans. If DR falls into your area of responsibility, knowing that your DR plan actually works should help you sleep better at night.

Chris Wolf is a senior analyst at Midvale, Utah-based Burton Group and the author of several IT books. Check out a chapter on backup from Wolf’s book, Virtualization: From the Desktop to the Enterprise. In a recent SearchServerVirtualization.com podcast, Wolf shared additional tips on best practices and tools for virtualization-based disaster recovery.