Anytime I do a presentation on Azure Site Recovery or Azure Online Backup I am always asked the same question: “How much bandwidth is required for replication?” The answer is always the same: “My crystal ball is broken.” I have no idea how much bandwidth is required for any customer. There is no easy answer for small businesses, medium-sized businesses, or large enterprises. Every company is different; some generate lots of new data every day and some create very little.
So if you are looking at Azure Site Recovery or any other disaster recovery replication solution, then you’re going to have to do some work to calculate your bandwidth requirements.
Asynchronous Replication Benefit: Latency Isn’t an Issue
A benefit of asynchronous replication is that distance, or latency, is not an issue. Azure Site Recovery for Hyper-V is based on Hyper-V Replica (HVR). A VM writes data to an on-premises virtual hard disk; each modification is logged and sent to Azure based on your preconfigured protection group replication interval. This means that you can replicate over very long distances, something that synchronous replication cannot do: with synchronous replication, latency degrades production service performance, to the point of breaking systems with timeouts.
Every disaster recovery (DR) expert I’ve listened to recommends using dedicated links for DR replication. That certainly makes sense for traditional private DR sites, where customer-owned site A replicates to customer-owned site B. But when you are replicating to Azure, perhaps a single Internet connection would do? I’m not sure about this. In theory, you could apply QoS to your Internet connection so that replication gets sufficient bandwidth but is capped so that other services do not suffer. Try creating rules for HTTPS traffic sourced from your hosts. This QoS will probably have to be applied before any edge proxy.
Estimating Bandwidth Requirements
I came up with a method for estimating the bandwidth requirements of Hyper-V Replica a few years ago, and it should also work for Azure Site Recovery. The issue is that we need to understand how much change there is within a replication interval and ensure that there is enough upload bandwidth to complete a replication within the allotted time frame.
In my example, a small-to-medium enterprise (SME) is going to replicate to Azure. They have decided that they want an RPO of 15 minutes, which gives them a generous replication interval of 15 minutes.
My method needs to identify how much data change there is per day. If the company runs a traditional backup, check the backup history to see how much incremental backup data is stored each day. Go back as far as you can and use the highest number you find. My SME creates 2 GB (2048 MB) of data during a working day.
How long is that working day? The SME works from 9 am until 6 pm, a nine-hour day. Now we have enough information to do some math.
There are four replication intervals per hour (60 / 15). There are nine hours in a day, so there are 36 (9 * 4) intervals in a day. We need to allow for 2048 MB of data change in nine hours. In a replication interval, there will be an average of 57 MB (2048 / 36, rounded up) of change. Not all intervals are equal, so I’m going to add 50 percent to that to get a busy period, giving me 86 MB rounded up.
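The arithmetic above is easy to sketch as a short script. This is just a minimal illustration of the method; the 2048 MB/day, nine-hour day, and 15-minute interval are the example’s inputs, so substitute your own figures:

```python
import math

# Example inputs from the article -- replace with your own measurements
daily_change_mb = 2048   # data change per working day (from backup history)
working_hours = 9        # 9 am to 6 pm
interval_minutes = 15    # replication interval, matching the 15-minute RPO

intervals_per_hour = 60 // interval_minutes              # 4
intervals_per_day = working_hours * intervals_per_hour   # 36

# Average change per interval, rounded up
avg_change_mb = math.ceil(daily_change_mb / intervals_per_day)   # 57 MB

# Not all intervals are equal: add 50 percent for a busy period
busy_change_mb = math.ceil(avg_change_mb * 1.5)                  # 86 MB

print(intervals_per_day, avg_change_mb, busy_change_mb)  # 36 57 86
```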
So we need enough bandwidth to replicate 86 MB within a 15-minute window. Find a bandwidth calculator and use it to determine the requirements. A quick search brought me to numion.com, where I found that I will need 1.024 Mbps of reserved upload bandwidth to replicate this data in less than 15 minutes (11 minutes, 11 seconds), something that should be achievable for most (though admittedly not all) SMEs.
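You can also check the figure without an online calculator. Note that the result shifts slightly depending on whether a megabyte is counted as 1,000,000 or 1,048,576 bytes; this sketch uses decimal units, which is what most bandwidth calculators assume:

```python
busy_change_mb = 86       # data to replicate in a busy interval
window_seconds = 15 * 60  # the 15-minute replication window

# Decimal units: 1 MB = 8,000,000 bits
bits_to_send = busy_change_mb * 8_000_000

# Minimum upload rate to finish within the window
min_bps = bits_to_send / window_seconds
print(round(min_bps / 1_000_000, 2), "Mbps minimum")  # 0.76 Mbps minimum

# Transfer time on a 1.024 Mbps reserved upload link
line_bps = 1.024 * 1_000_000
seconds = bits_to_send / line_bps
print(int(seconds // 60), "min", round(seconds % 60), "s")  # 11 min 12 s
```

The slight difference from the calculator’s 11 minutes, 11 seconds comes down to rounding and unit conventions; either way, 1.024 Mbps comfortably clears the 15-minute window.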