In this post, I will explain how you can orchestrate and order the failover of virtual machines in a Windows Server 2016 (WS2016) Hyper-V cluster using Cluster Group Sets.
The Need For Ordered Failover
Let’s assume that you have a line-of-business application, called LobA, that runs as a set of virtual machines on a Hyper-V cluster. LobA is made up of several tiers:
- A web virtual machine that depends on an application virtual machine
- The application virtual machine that depends on a database virtual machine
- A database virtual machine
Let’s assume that all three virtual machines are running on the same Hyper-V host. This host has a genuine fault and the cluster initiates a failover. Without any form of orchestrated failover, the three virtual machines will start at the same time. The application server will come online before the database server and fail. The web server might be online before the application server, and end users will start reporting failed attempts to use the service.
WS2012 gave us the ability to prioritize virtual machines, which offered a hack of sorts for the above problem. However, there are only three priorities (low, medium, and high), and ordering the tiers of an application is not what the feature was intended for. We really use priorities for:
- Prioritizing resources for more important virtual machines when there is RAM contention after a failover
- Optionally using Quick Migration instead of Live Migration for lower-priority virtual machines to protect bandwidth
The truth is that we needed something better.
Cluster Group Sets
Before we get started, keep in mind that Failover Clustering treats virtual machines as cluster groups. Each group is a set of linked resources that make up a virtual machine.
We can create a set of groups in Failover Clustering, with each group containing a virtual machine. Let’s expand the concept using LobA for the example:
- 8x web servers
- 4x application servers
- 2x database servers
We can create a set that contains the database servers. We can also create a set that contains the application servers and make this set dependent on the database server set. Then, we can create a set for the web servers and make this dependent on the application server set. If a failover occurs, the cluster will automatically ensure that the required sets are up and running before it starts the sets that depend on them.
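The ordering that these dependencies imply can be sketched in a few lines of Python. This is only a conceptual model, not how Failover Clustering is implemented or configured; the set names and the `start_order` helper are hypothetical, and the sketch assumes Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical model of the LobA sets. Each key is a set and each value
# lists the sets that it depends on. These names are illustrative only.
dependencies = {
    "LobA-Database": [],
    "LobA-Application": ["LobA-Database"],
    "LobA-Web": ["LobA-Application"],
}


def start_order(deps):
    """Return set names so that every required set precedes its dependents."""
    return list(TopologicalSorter(deps).static_order())


order = start_order(dependencies)
print(order)
```

In this model the database set always comes first and the web set last, which is the behavior we want from an orchestrated failover.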
What Counts As Started?
What does the cluster count as “started” in a set? The database server set from LobA is probably a SQL cluster. If one of the virtual machines is started, then the database is running. That means we have enough to get the application servers going.
We can customize a dependency between sets:
- All the virtual machines in a set must be started.
- A certain number of the virtual machines in a set must be started.
When do we start the application servers? Do we wait 30 seconds? Do we assume that the database is responsive? A set can be configured to delay start-up for a configurable amount of time or to wait until all or some of the virtual machines in the required set are running. Note that “running” means that the heartbeat integration services in those virtual machines are reporting a healthy state to the hypervisor. This means that the guest operating system is running.
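To make the two choices concrete, here is a minimal Python sketch of the decision, assuming a set is either gated on a fixed delay or on how many virtual machines in the required set report a healthy heartbeat. The `GroupSet` class, its fields, and `can_start_dependent` are hypothetical names for illustration and do not correspond to real cluster properties.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class GroupSet:
    name: str
    # VM name -> True if its heartbeat reports a healthy state
    vm_running: Dict[str, bool]
    # None means that every VM in the set must be running
    required_count: Optional[int] = None

    def is_started(self) -> bool:
        running = sum(1 for healthy in self.vm_running.values() if healthy)
        if self.required_count is None:
            needed = len(self.vm_running)
        else:
            needed = self.required_count
        return running >= needed


def can_start_dependent(required_set, seconds_elapsed, delay_seconds=None):
    """Decide whether a dependent set may start.

    If delay_seconds is given, only the elapsed time is checked.
    Otherwise the required set must count as started.
    """
    if delay_seconds is not None:
        return seconds_elapsed >= delay_seconds
    return required_set.is_started()


# One of two SQL virtual machines is up, and one is enough for LobA.
database = GroupSet("LobA-Database", {"SQL1": True, "SQL2": False}, required_count=1)
print(can_start_dependent(database, seconds_elapsed=5))
```

With `required_count=1`, the application servers in this model can start as soon as a single database virtual machine is healthy, which matches the SQL cluster example above.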
Cluster group sets support the concept of an infrastructure set. This is a set of virtual machines that is required by more than one line-of-business application. You could model a dependency from each application's sets individually, or you could simply declare that these machines are required by everything else.
For example, we might have LobA, LobB, and LobC. All of these could require Active Directory to be running. We can put our virtual domain controllers into a cluster group set and mark this set as being globally required.
Now, all other cluster group sets will require the domain controller set to be running before they start. We do not have to create a dependency for each of the other sets.
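The effect of a globally required set can be modeled as an implicit dependency that is added to every other set. The following Python sketch is an assumption-laden illustration of that idea; `effective_dependencies` and the set names are hypothetical and are not part of any cluster API.

```python
def effective_dependencies(set_name, explicit_deps, global_sets):
    """Return the global sets plus the explicit dependencies of set_name.

    A global set is not made dependent on itself or on other global sets.
    """
    explicit = list(explicit_deps.get(set_name, []))
    if set_name in global_sets:
        return explicit
    implied = list(global_sets)
    return implied + [dep for dep in explicit if dep not in implied]


# Hypothetical sets: only LobA-Web has an explicit dependency.
explicit_deps = {
    "LobA-Web": ["LobA-Application"],
    "LobB-Web": [],
}
global_sets = ["DomainControllers"]

for name in ("LobA-Web", "LobB-Web", "DomainControllers"):
    print(name, effective_dependencies(name, explicit_deps, global_sets))
```

Every set in this model waits for the domain controllers, even though no dependency on them was declared for LobA or LobB.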