Redirected I/O in Windows Server 2012/R2 Cluster Shared Volumes
Simply mention “redirected I/O” if you want to make a Windows Server 2008 R2 (W2008 R2) Hyper-V administrator shiver in dread. A necessary feature in W2008 R2, redirected I/O was used during backup operations on Cluster Shared Volumes (CSV) and had all sorts of nasty effects in more complex environments. Windows Server 2012 (WS2012) introduced CSV 2.0 and dispensed with the need for redirected I/O during backup. In this article, I will explain how redirected I/O has changed and how you can control which networks it is used on.
What Is Redirected I/O?
Windows has a “shared nothing” approach to volumes: any one volume can be owned by just a single machine. This could be a problem when trying to create a cluster file system. CSV was introduced in W2008 R2 to allow the nodes in a Hyper-V cluster to share a volume and to simplify storage design. Shared nothing still applied – one node in the cluster is the owner (or CSV coordinator) of that volume. This is a fault-tolerant role that is managed for you by the cluster.
Normally each node in a cluster has direct I/O to a CSV (delegated by the owner) on the SAN. But there are scenarios in which the traditional shared nothing approach must be restored. This causes something known as redirected I/O to occur. Each node must continue to use the storage (virtual machines must continue to operate) but the I/O must be controlled by the CSV’s owner. To make this possible, each node that is not the CSV owner temporarily starts sending/receiving all storage traffic to the SAN via the CSV owner. This traffic traverses the private networking of the cluster.
Non-owner nodes redirecting I/O to the Cluster Shared Volumes via a CSV owner.
Redirected I/O is slower than direct I/O because every read and write takes an extra hop across the cluster network and through the CSV owner. It had particularly severe effects on storage performance in multi-site or stretch clusters.
What Causes Redirected I/O?
There were three causes of redirected I/O in W2008 R2.
- Metadata operations: These are brief operations such as a VM start, file creation, and so on. These operations happen infrequently because CSVs contain relatively few files (VM files) when compared to normal volumes.
- Storage path fault tolerance: If a node has an HBA failure (breaking both MPIO-managed channels) then it could cause all of its services (VMs) to fail. Instead, redirected I/O provides continuous storage access, albeit with lesser performance.
- Cluster Shared Volumes backup: W2008 R2 did not have the ability to create a single synchronized VSS snapshot of a CSV. This led to lots of redirected I/O activity for extended periods during backup operations. Some storage systems could be stressed to the point of crashing the W2008 R2 Hyper-V cluster.
Thankfully, CSV in WS2012 (known as CSV 2.0) simplified backup so that redirected I/O is no longer required for it. That leaves us with the very infrequent and brief metadata operations, and the (very helpful) storage path fault tolerance solution.
Redirected I/O in Windows Server 2012
Just to be clear, redirected I/O has not been required to back up a Cluster Shared Volume since WS2012. There are, however, some other significant changes that are not well known.
Redirected I/O Uses SMB 3.0
In W2008 R2, redirected I/O traffic passed over the cluster network with the lowest metric. That changed in WS2012, which uses SMB 3.0 for redirected I/O. This gives redirected I/O the best possible performance thanks to SMB Direct (if RDMA-capable NICs are used in the cluster) and, importantly, SMB Multichannel.
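If you want to see what the cluster has to work with, the following is a minimal sketch that lists the cluster networks with their roles and metrics. It assumes you run it on a cluster node where the FailoverClusters PowerShell module is installed; the output will depend entirely on your own network design.

```powershell
# Assumption: run in an elevated session on a cluster node.
Import-Module FailoverClusters

# List the cluster networks, lowest metric first, with the role each one plays.
Get-ClusterNetwork |
    Sort-Object Metric |
    Format-Table Name, Role, Metric, AutoMetric -AutoSize
```

Comparing this list with your intended design is a reasonable first step before deciding which NICs redirected I/O should be allowed to use.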
SMB Multichannel can potentially use any NIC it finds between the non-owner nodes and the CSV owner, which can flood networks with unmanaged redirected I/O traffic. You can control which NICs SMB Multichannel uses with the New-SmbMultichannelConstraint PowerShell cmdlet. Typically (though this depends on your network design) you will limit SMB Multichannel to the cluster’s private networks.
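As a rough sketch of what that restriction might look like, the example below creates a constraint on one node and then reviews the result. The node name HV-NODE1 and the interface aliases Cluster1 and Cluster2 are hypothetical placeholders, and using the node name as the server name is also an assumption. Because a constraint is defined on the SMB client for a named server, and CSV ownership can move between nodes, you would probably need to repeat this on each node for each of the other nodes.

```powershell
# Hypothetical values: replace with names from your own cluster.
$csvOwner    = 'HV-NODE1'               # assumed name of a node that can own the CSV
$clusterNics = 'Cluster1', 'Cluster2'   # assumed aliases of the private network NICs

# Restrict SMB Multichannel connections to $csvOwner to the listed interfaces.
New-SmbMultichannelConstraint -ServerName $csvOwner -InterfaceAlias $clusterNics

# Review the constraints defined on this node.
Get-SmbMultichannelConstraint

# Check which interfaces SMB Multichannel is actually using right now.
Get-SmbMultichannelConnection
```

Test a change like this in a lab first, because a constraint that names the wrong interfaces could push redirected I/O onto a network you did not intend.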
Block-Level Redirected I/O
Another change to redirected I/O improves the performance of storage path fault tolerance. Block-level redirected I/O avoids sending storage traffic through the storage stack twice: on the non-owner node and on the CSV owner node. This requires no configuration.
Block-level redirected I/O.
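To check whether a CSV is currently in direct or redirected mode, a minimal sketch follows. It assumes Windows Server 2012 R2 or later, where the Get-ClusterSharedVolumeState cmdlet is available, and a cluster node with the FailoverClusters module installed.

```powershell
# Assumption: WS2012 R2 or later, run on a cluster node.
Import-Module FailoverClusters

# Show, per node and per CSV, the I/O mode and any reason given for redirection.
Get-ClusterSharedVolumeState |
    Format-Table Name, Node, StateInfo, FileSystemRedirectedIOReason, BlockRedirectedIOReason -AutoSize
```

The StateInfo column reports the I/O mode for each node and volume, which helps you tell file system redirection apart from block-level redirection.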
Things to Remember
There are two essential things to remember when deploying CSV on WS2012, not just for Hyper-V, but also for other clusters such as the Scale-Out File Server.
- Redirected I/O is not used for backup.
- SMB Multichannel is used by redirected I/O and you should manage this by restricting the possible NICs.