This post will explain a new high-availability and service level agreement (SLA) feature of Microsoft Azure called availability zones.
Availability Before Availability Zones
Today, if you want an Azure virtual machine with an SLA, then you must deploy the virtual machine in one of two ways:
- Single virtual machine: The machine must be deployed only with Premium (SSD) storage – no Standard (HDD) storage – to qualify for a 99.9 percent SLA. There is something about the nature of Premium Storage clusters in Azure that enables Microsoft to offer better availability for IaaS workloads.
- Availability Sets: Similar to the concept of anti-affinity in on-premises virtualization clusters, this tool spreads machines that share responsibility for performing a particular task across multiple fault domains and update domains inside a compute (Hyper-V) cluster. If you deploy machines in valid availability sets, the service that the machines perform qualifies for a 99.95 percent SLA.
We need to remind ourselves a little about Azure theory before we go forward. Let’s say that you deploy a pair of virtual machines, in an availability set, into the East US Azure region. This region is made up of numerous data centers (Microsoft never states how many data centers are in each region). Your virtual machines are deployed into a single data center. That data center can give you only so much high availability, because every storage cluster and compute cluster within that building shares common points of failure such as fire suppression, power, and networking. You can deploy as many availability sets, guest clusters, and service availability architectures as you want, but your service will only be as highly available as that single building.
For almost everyone, I suspect, 99.95 percent is going to be enough for service availability. In the event of a data center being lost, we can always fail virtual machines/services over to another region, assuming that we have replicated databases, virtual machines, and so forth.
Myth-busting: Neither Microsoft nor AWS automatically replicates your stuff to another region. If you want regional fault tolerance, then you have to deploy that system and pay for it.
What if you want more? What if 99.95 percent is not enough and you want four nines (99.99 percent) in a single region? That wasn’t possible in Azure, but Microsoft is starting to roll out a new offering called availability zones.
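To put those percentages in perspective, here is a rough sketch that converts an availability figure into the downtime it permits. It is plain arithmetic and assumes a 365-day year for illustration. The function name is my own, and real SLAs are usually measured over a billing month with their own terms.

```python
# Downtime permitted by an availability percentage.
# Assumption for illustration: a 365-day year (actual SLA terms may differ).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(sla_percent: float) -> float:
    """Return the minutes per year a service may be down under the given SLA."""
    return MINUTES_PER_YEAR * (1 - sla_percent / 100)


if __name__ == "__main__":
    for sla in (99.9, 99.95, 99.99):
        minutes = allowed_downtime_minutes(sla)
        print(f"{sla}% allows about {minutes:.1f} minutes of downtime per year")
```

Under that assumption, three nines allows roughly 8.8 hours of downtime a year, 99.95 percent allows about 4.4 hours, and four nines allows under an hour.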
As I stated before, an Azure region is made up of several data centers. Each data center, or group of data centers, has its own resources such as power and networking. Each independent data center, or group of data centers, that does not share those resources with others is called an availability zone.
You will be able to deploy services into a region and select which availability zone those services go into. This will ensure that if one availability zone goes dark, your service(s) remain online in another availability zone, qualifying you for a 99.99 percent SLA for the service.
It sounds simple, but this will impact your architecture and spend. You cannot just throw out two machines and spread them across two availability zones. These machines are in different buildings, so assuming that you can keep things that simple will lead to access, performance, and architecture problems.
Instead, you’ll deploy a copy of each service into each availability zone, probably with availability sets in each availability zone, and then use Traffic Manager to aggregate the copies into a single load-balanced or failover set. It is a bit more involved than you might have thought, but getting an extra 0.04 percent of availability (from 99.95 to 99.99 percent) costs a lot of money once you go over three nines (99.9 percent), even if you try to do it on-premises.
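The failover idea can be sketched in a few lines. This is a toy model of priority-based routing, not the Traffic Manager API: the `Endpoint` class, the `pick_endpoint` function, and the zone names are all hypothetical, and in a real deployment health would come from probes rather than a hand-set flag.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Endpoint:
    """One copy of the service, deployed in one availability zone (hypothetical model)."""
    name: str
    zone: str
    priority: int  # lower number = preferred endpoint
    healthy: bool = True


def pick_endpoint(endpoints: List[Endpoint]) -> Optional[Endpoint]:
    """Return the healthy endpoint with the lowest priority number, or None if all are down."""
    healthy = [e for e in endpoints if e.healthy]
    return min(healthy, key=lambda e: e.priority, default=None)


if __name__ == "__main__":
    copies = [
        Endpoint("web-a", zone="zone-1", priority=1),
        Endpoint("web-b", zone="zone-2", priority=2),
    ]
    print(pick_endpoint(copies).name)  # the zone-1 copy serves traffic

    copies[0].healthy = False          # simulate zone-1 going dark
    print(pick_endpoint(copies).name)  # traffic fails over to the zone-2 copy
```

The point of the sketch is that each zone holds a complete copy of the service, so losing one zone only changes which copy receives traffic.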
A preview of availability zones has started in the West Europe and East US 2 Azure regions. Microsoft has promised that availability zones should appear in other US, Asian, and European regions before the end of the year, including France Central (not publicly available at the time of writing).