What Are Hyper-V VM Failover Cluster Group Sets?

In this post, I will explain how you can orchestrate and order the failover of virtual machines in a Windows Server 2016 (WS2016) Hyper-V cluster using Cluster Group Sets.

The Need For Ordered Failover

Let’s assume that you have a line-of-business application, called LobA, that runs as a set of virtual machines on a Hyper-V cluster. LobA is made up of several tiers:

A web virtual machine that depends on an application virtual machine
The application virtual machine that depends on a database virtual machine
A database virtual machine

Let’s assume that all three virtual machines are running on the same Hyper-V host. This host suffers a genuine fault and the cluster initiates a failover. Without any form of orchestrated failover, the three virtual machines will start at the same time. The application server might come online before the database server and fail. The web server might come online before the application server, and end users will start reporting failed attempts to use the service.

A Workaround

WS2012 gave us the ability to prioritize virtual machines, which offered a crude workaround for the problem above. However, there are only three priorities (low, medium, and high), and ordering application tiers is not what the feature was intended for. Priorities are really meant for:

Prioritizing resources for more important virtual machines when there is RAM contention after a failover
Optionally using Quick Migration instead of Live Migration for lower-priority virtual machines to protect bandwidth

The truth is that we needed something better.

Cluster Group Sets

Before we get started, keep in mind that Failover Clustering treats each virtual machine as a cluster group, that is, a set of linked resources that together make up the virtual machine.
We can create a set of groups in Failover Clustering, with each group containing a virtual machine. Let’s expand the concept using LobA as the example:

8x web servers
4x application servers
2x database servers

A Failover Clustering set containing virtual machines or cluster groups [Image Credit: Microsoft]

We can create a set that contains the database servers. We can also create a set that contains the application servers and make this set dependent on the database server set. Then, we can create a set for the web servers and make this set dependent on the application server set. If a failover occurs, the cluster will automatically ensure that the required sets are up and running before it starts the sets that depend on them.
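The ordering that results from these dependencies can be illustrated with a small conceptual model. This is only a sketch in Python, not the Failover Clustering API: the set names (`LobA-DB`, `LobA-App`, `LobA-Web`) and the `start_order` helper are hypothetical, and the chain follows the tiers listed earlier (web depends on application, application depends on database). It assumes Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical model: each set name maps to the sets it depends on.
# These names are illustrative and do not come from a real cluster.
set_dependencies = {
    "LobA-DB": set(),          # database set depends on nothing
    "LobA-App": {"LobA-DB"},   # application set needs the database set
    "LobA-Web": {"LobA-App"},  # web set needs the application set
}


def start_order(dependencies):
    """Return set names ordered so that every set follows the sets it requires."""
    return list(TopologicalSorter(dependencies).static_order())


print(start_order(set_dependencies))
# ['LobA-DB', 'LobA-App', 'LobA-Web']
```

A topological sort is used here simply because it is the standard way to turn "A requires B" relationships into a valid start sequence; how the cluster computes its order internally is not described in this post.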

What Counts As Started?

What does the cluster count as “started” in a set? The database server set from LobA is probably a SQL Server cluster. If one of its virtual machines is started, then the database is running. That means we have enough to get the application servers going.
We can customize a dependency between sets:

All the virtual machines in a set must be started.
A certain number of the virtual machines in a set must be started.

When do we start the application servers? Do we wait 30 seconds? Do we assume that the database is responsive? A set can be configured to delay start-up for a configurable amount of time or to wait until all or some of the virtual machines in the required set are running. Note that “running” means that the heartbeat integration services in those virtual machines are reporting a healthy state to the hypervisor. This means that the guest operating system is running; it does not necessarily mean that the services inside the guest are ready to accept connections.
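The two kinds of trigger described above (a fixed delay, or a count of healthy virtual machines) can be sketched as a simple readiness check. This is a conceptual Python model under stated assumptions, not the cluster's implementation: `ProviderPolicy`, its fields, the trigger labels `"delay"` and `"online"`, and `provider_ready` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProviderPolicy:
    """Hypothetical description of when a required set counts as started."""
    trigger: str                          # "delay" or "online" (illustrative labels)
    delay_seconds: int = 0                # used only when trigger == "delay"
    required_count: Optional[int] = None  # None means all members must be healthy


def provider_ready(policy: ProviderPolicy,
                   members_healthy: List[bool],
                   seconds_since_start: int) -> bool:
    """Decide whether dependent sets may start.

    members_healthy holds one flag per virtual machine in the required set,
    True when its heartbeat reports a healthy state.
    """
    if policy.trigger == "delay":
        return seconds_since_start >= policy.delay_seconds
    if policy.trigger == "online":
        if policy.required_count is None:
            needed = len(members_healthy)
        else:
            needed = policy.required_count
        return sum(members_healthy) >= needed
    raise ValueError(f"unknown trigger: {policy.trigger}")


# Two database servers, one healthy: enough if only one is required.
one_of_two = ProviderPolicy(trigger="online", required_count=1)
print(provider_ready(one_of_two, [True, False], seconds_since_start=0))
```

With `required_count=None` the same call would return `False`, because both database servers would have to report a healthy heartbeat first.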

Global Dependency

Cluster group sets support the concept of an infrastructure group. This is a set of virtual machines that are required by more than one line-of-business application. You could model each of those dependencies explicitly. Alternatively, you can simply declare that these machines are required by everything else.
For example, we might have LobA, LobB, and LobC. All of these could require Active Directory to be running. We can put our virtual domain controllers into a cluster group set and mark this set as being globally required.

Now, all other cluster group sets will require the domain controller set to be running before they start. We do not have to create a dependency for each of the other sets.
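The effect of a globally required set can be pictured as an implicit dependency added to every other set. The following Python sketch models that idea only; `effective_dependencies` and the set names are hypothetical, and the post does not describe how the cluster stores or evaluates this internally.

```python
def effective_dependencies(explicit, global_sets):
    """Return a copy of the dependency map in which every non-global set
    also depends on each globally required set."""
    global_sets = set(global_sets)
    result = {}
    for name, providers in explicit.items():
        if name in global_sets:
            # A global set does not depend on itself or on other global sets here.
            result[name] = set(providers)
        else:
            result[name] = set(providers) | global_sets
    # Make sure global sets appear in the map even if not listed explicitly.
    for name in global_sets:
        result.setdefault(name, set())
    return result


# Illustrative sets: three applications and one domain controller set.
explicit = {"LobA": set(), "LobB": set(), "LobC": set()}
deps = effective_dependencies(explicit, {"DomainControllers"})
print(deps["LobA"])
# {'DomainControllers'}
```

No per-application entry for the domain controllers is needed in `explicit`; marking the set as global is what produces the dependency for LobA, LobB, and LobC.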