In this post, I will explain how a new preview feature, Azure Site Recovery for Azure Virtual Machines, can provide disaster recovery services for virtual machines that are running in Azure.
Disaster recovery is a hot topic. British Airways has had days of flight issues because of data center and power issues. AWS had a massive outage in one of its regions in February. WannaCrypt/WannaCry has made everyone think about data availability, service availability, and security.
I wrote an article back in March that discusses how to avoid AWS-style outages when deploying virtual machine-based services in Azure. In that article, I explained that despite the myths and assumptions, neither AWS nor Azure replicates virtual machines to another region for you. If that region goes offline, as has happened to AWS US East in Virginia many times now, then anything you have there will stay offline. No magic fairies are sprinkling dust to wormhole your machines and data to another online region.
This means that you have to deploy duplicate virtual machine builds, mirror the application installations and maintenance, enable inter-region connectivity, and replicate data from one region to another. To ensure business continuity through an outage in a cloud region, you need more than double the effort and cost. Until now…
Azure-to-Azure Site Recovery
Microsoft has launched a preview service that allows you to optionally replicate virtual machines from one region to another close-by region. For example, virtual machines running in North Europe (Dublin) can be replicated to one of the following: UK West (Wales), UK South (London), or West Europe (Amsterdam).
This cluster of regions is referred to as a geographic cluster. The following clusters are available for replicating virtual machines in Azure:
The solution is based on mature technologies. The Azure Recovery Services vault, placed in a different region from the production virtual machines, provides management and orchestration. The Mobility Service from InMage Scout, delivered as a virtual machine extension, provides the replication functionality. Think of it as an Azure-managed virtual machine integration service. No, Azure is not using Hyper-V Replica. There are a few differences:
- Replication is continuous instead of interval-based. It is still asynchronous for range and performance reasons.
- Replication is based on a filter driver in the guest OS, installed by the Mobility Service. It is not based on a change log stored with the virtual hard disk.
As a result of the latter, there is a smaller compatibility list than you will find for private site-to-Azure replication when using Hyper-V hosts. The Mobility Service only supports a subset of Azure-compatible operating systems today:
Some notable operating systems are missing:
- Windows Server 2016
However, this is a preview release, so things can change as we move toward general availability.
This new DR feature is an as-a-service option, which means that little engineering is required to get things going. A simple Recovery Services vault and resource group in another region in the geographic cluster will do the trick. You can then enable replication for each required virtual machine. Azure Site Recovery will create any required dependencies such as storage accounts, networks, and subnets. Names are based on the original deployment with a suffix added. You can customize the naming.
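To make the naming behavior concrete, here is a minimal sketch of deriving target names from source names. The function name, the resource names, and the "-asr" suffix are all hypothetical and chosen for illustration; this is not the Azure Site Recovery API, and the real suffix and per-resource naming rules may differ.

```python
# Illustrative only: models "original name plus a suffix" for target resources.
# The "-asr" default is an assumed suffix, not a documented Azure value.

def target_name(source_name: str, suffix: str = "-asr") -> str:
    """Derive a target-region resource name from the source name."""
    return f"{source_name}{suffix}"


# Hypothetical source deployment.
source_resources = {
    "resource_group": "rg-web-prod",
    "virtual_network": "vnet-web-prod",
    "subnet": "frontend",
}

# Default naming: every dependency gets the same suffix.
target_resources = {
    kind: target_name(name) for kind, name in source_resources.items()
}

# Customized naming: override the suffix for one resource.
target_resources["resource_group"] = target_name("rg-web-prod", suffix="-dr")

for kind, name in target_resources.items():
    print(f"{kind}: {name}")
```

Overriding the suffix for a single resource mirrors the point that the default names are only a starting position.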
A default replication policy is offered, but you can customize it. For example, you can configure how long recovery points are kept and how many of those points are application consistent.
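The two policy settings mentioned above interact, and a small model shows how. The class, field names, and numbers below are assumptions for illustration only; they are not the actual Azure Site Recovery policy schema or its default values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationPolicy:
    """Hypothetical model of a replication policy (not the Azure schema)."""

    recovery_point_retention_hours: int    # how long recovery points are kept
    app_consistent_frequency_minutes: int  # gap between app-consistent points

    def app_consistent_points_retained(self) -> int:
        """Number of app-consistent points that fit in the retention window."""
        retention_minutes = self.recovery_point_retention_hours * 60
        return retention_minutes // self.app_consistent_frequency_minutes


# Illustrative values only.
default_policy = ReplicationPolicy(
    recovery_point_retention_hours=24,
    app_consistent_frequency_minutes=60,
)
custom_policy = ReplicationPolicy(
    recovery_point_retention_hours=72,
    app_consistent_frequency_minutes=240,
)

print(default_policy.app_consistent_points_retained())
print(custom_policy.app_consistent_points_retained())
```

Longer retention with less frequent application-consistent points can leave you with fewer such points to choose from, which is worth checking before you change the defaults.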
Some of the usual features of Azure Site Recovery are there to use. An important one is the ability to run a non-disruptive test failover. With this ability, you can test your failover and ensure that if a region fails, your business can survive in another region.
Azure veterans should already be aware that Azure Site Recovery offers customers the ability to “lift and shift” virtual machines from VMware or Hyper-V to Azure. Migration is free of charge for each virtual machine if the process is completed within 31 days. You simply enable replication, perform a test failover, do a planned failover, commit the action, and remove replication.
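The migration sequence can be sketched as an ordered checklist with a check on the 31-day window. The step identifiers and function names are hypothetical, and treating day 31 as still inside the free period is an assumption about how "within 31 days" is counted.

```python
from datetime import date

# Steps in the order described above; the identifiers are illustrative.
MIGRATION_STEPS = [
    "enable_replication",
    "test_failover",
    "planned_failover",
    "commit",
    "disable_replication",
]

FREE_PERIOD_DAYS = 31  # per virtual machine, as stated in the text


def next_step(completed):
    """Return the next step, or None when every step is done.

    Raises ValueError if the completed steps are not in the expected order.
    """
    completed = list(completed)
    if completed != MIGRATION_STEPS[: len(completed)]:
        raise ValueError(f"steps out of order: {completed}")
    if len(completed) == len(MIGRATION_STEPS):
        return None
    return MIGRATION_STEPS[len(completed)]


def within_free_period(enabled_on: date, finished_on: date) -> bool:
    """True if the migration finished inside the assumed free window."""
    return (finished_on - enabled_on).days <= FREE_PERIOD_DAYS


print(next_step(["enable_replication", "test_failover"]))
print(within_free_period(date(2017, 6, 1), date(2017, 6, 30)))
```

Tracking the date replication was enabled for each virtual machine makes it easy to see which migrations are at risk of running past the free window.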
Azure-to-Azure Site Recovery will also offer region mobility for customers that wish to relocate their services. For example, customers in the UK that are running in North Europe or West Europe might choose to relocate to one of the UK regions.
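When planning a relocation like this, a quick check that the target sits in the same geographic cluster as the source can be sketched as follows. Only the four European regions named earlier in this post are modeled, the function name is hypothetical, and the assumption that any region in the cluster can target any other is mine rather than a documented rule.

```python
# Only the European regions named in this post; other clusters are omitted.
EUROPE_CLUSTER = frozenset(
    {"North Europe", "West Europe", "UK South", "UK West"}
)


def valid_targets(source_region, cluster=EUROPE_CLUSTER):
    """List the regions in the cluster that a source region could target.

    Assumes replication is allowed between any two regions in the cluster.
    """
    if source_region not in cluster:
        raise ValueError(f"{source_region!r} is not in the modeled cluster")
    return sorted(cluster - {source_region})


print(valid_targets("North Europe"))
print("UK South" in valid_targets("West Europe"))
```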