How to Deal with VMware VCB Multi-Path Issues
Recently, while using VMware Consolidated Backup (VCB) in my production environment, I came across an issue with VCB and Windows. The issue is really twofold. First, VCB does not support multiple paths across a Fibre Channel (FC) SAN. Second, Windows dynamically maps disk numbering and partitions when it boots. What this means to you is that if your VCB server has a “primary” and a “backup” path across the SAN to the VMware ESX VMFS volumes that you want to back up, VCB cannot deal with them and, even worse, after a VCB server reboot, VCB may no longer work. Read on to understand more about this issue and how to resolve it.
Multiple SAN paths in Windows
VCB runs on a Windows server that has a SAN connection to your VMware ESX Server VMFS volumes. These volumes hold the virtual machines that you want to back up to your VCB server as images (assuming you are doing image-level backups). With most SANs you will have two independent paths through the SAN to get to that data: two FC controllers in the VCB server, going to two FC switches, connecting to two independent FC controllers on the FC storage system. Thus, you have two fully redundant paths through the SAN in case one of those paths goes down.
VMware ESX Server understands this and deals with it without any trouble. Windows Server can also deal with this fine IF there are Windows NTFS file systems on the SAN storage system and you have multi-path drivers for your FC controller. However, in the case of VMware VCB, there are no NTFS file systems on the SAN LUNs. Instead, there are VMFS volumes, which Windows does not recognize. These can only be recognized by VCB.
Once VCB is functioning, and if nothing changes, there are no issues. However, if the SAN paths change, or if the VCB server is rebooted and VCB ends up pointing to the wrong paths, full virtual machine (image) backups (and other VCB backups) will fail with the following error message:
Error: Failed to export the disk: The device is not ready.
File-level virtual machine backups for Windows guests will fail to mount the file systems in the virtual machine’s disk images.
(For the official VMware KB article on this, see Limitations on VMware Consolidated Backup (VCB) and Multipathing.)
If you rarely reboot your VCB machine, perhaps you will rarely have this issue. However, if you frequently reboot your VCB server, you may frequently be troubled by it. VCB does not get pointed to the wrong SAN path every time the VCB server reboots, just some of the time. Of course, if your VCB server sees, say, 100 LUNs on the SAN, you could be forced to manually configure the SAN paths every time you reboot your VCB server.
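Because the problem only shows up some of the time, it can help to check backup output for the error quoted above after each reboot. The following is a minimal sketch of that check in Python. Only the error string comes from this article; the function name and the sample log lines are hypothetical, and you would need to feed it whatever log or console output your own backup jobs actually produce.

```python
# Error text quoted from the article; everything else here is illustrative.
VCB_PATH_ERROR = "Failed to export the disk: The device is not ready."


def find_path_errors(log_lines):
    """Return (line_number, text) pairs for lines containing the VCB error."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        if VCB_PATH_ERROR in line:
            hits.append((number, line.strip()))
    return hits


# Hypothetical sample input; real VCB output will look different.
sample_log = [
    "Starting backup job (hypothetical log line)",
    "Error: Failed to export the disk: The device is not ready.",
]

for number, text in find_path_errors(sample_log):
    print(f"line {number}: {text}")
```

A hit does not prove that the SAN path is the cause, but in my experience it is the first thing to check after a reboot.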
As an alternative to the solution below, you may be able to filter out the “backup” SAN paths that VCB sees using an FC switch or FC controller masking feature.
One Solution to VCB multipathing issues
So what do you do about this? Well, let me share the solution that my fellow VMware admins and I have used to “fix” this. I am not saying that this is the issue you will always have, nor am I saying that this is always the “fix” that you will need to use. These issues and their resolutions depend on your SAN configuration, and that configuration will vary based on your SAN hardware and design.
In my case, when VCB is “not working” after a reboot of the VCB server, the issue is that VCB is attempting to use the alternate SAN path to the VMware ESX VMFS volumes (where the guest operating systems are stored).
If you go into Windows Computer Management, open Device Manager, and look under Disk drives, you will see the SAN disks that are available. Notice in the graphic below how the two middle disk drives have a red X over the disk icon. This tells you that these disks are disabled in Windows Device Manager. This is actually how things should be configured in my case, as there are two other identical IBM disks that are enabled. Those enabled disks are the primary SAN paths to the VMware VMFS data that we want to back up with VCB.
*Note: As you can see from the graphic, I have an IBM SAN. Your SAN disks will look different based on the type of SAN that you are using.
But how do I know that these are the RIGHT disks – the primary disks? If I go into Disk Management, I do see two 300GB disk partitions. These are the primary partitions.
But how do I know which is which? If you right-click on these disk volumes and select Properties, you can see the type of disk (IBM 1815 FastT SCSI Disk Drive in my case) as well as the Bus, Target ID, and LUN for each of the disks. You should know from your SAN design which Target ID and LUN numbers are the primary paths and which are the backup paths.
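If you have many LUNs, comparing each disk's Bus, Target ID, and LUN against your SAN documentation by hand gets tedious. Here is a small Python sketch of that comparison. It assumes you have already written down the address of each disk from its Properties dialog. The addresses shown are made-up placeholders, not values from my SAN, and the names `ScsiAddress` and `label_paths` are my own inventions for illustration.

```python
from typing import NamedTuple


class ScsiAddress(NamedTuple):
    """Bus, Target ID, and LUN as shown in a disk's Properties dialog."""
    bus: int
    target: int
    lun: int


def label_paths(observed, primary_addresses):
    """Label each observed address as 'primary' or 'not primary'."""
    return {
        address: "primary" if address in primary_addresses else "not primary"
        for address in observed
    }


# Placeholder values only; take the real ones from your own SAN design.
primary = {ScsiAddress(0, 0, 0), ScsiAddress(0, 0, 1)}
observed = [
    ScsiAddress(0, 0, 0),
    ScsiAddress(0, 0, 1),
    ScsiAddress(0, 1, 0),
    ScsiAddress(0, 1, 1),
]

for address, label in label_paths(observed, primary).items():
    print(address, "->", label)
```

Anything labeled "not primary" still needs a human check before you touch it, since it could be a backup path or a disk you simply forgot to list.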
If you reboot the VCB server, VCB may try to use different paths to the SAN. If it is using the backup paths, your VCB backups aren’t going to work. If I were to enable one of these backup paths and deliberately create the issue, you would see one or more disk partitions in Disk Management marked as uninitialized and Unallocated.
You know that the Unallocated disk partition must be the backup path. Even worse, when you open Disk Management and it discovers one of these unallocated disk partitions, it will ask whether you want to initialize it by writing the Windows disk initialization information to it.
IF YOU DO THIS YOU WILL WIPE OUT YOUR VMWARE ESX SERVER GUEST OPERATING SYSTEMS!
My issues with VCB have always been that, after a reboot, the VCB server sees different SAN paths from the ones it was using successfully before. To “fix” the issue, you just need to have the proper disk devices (SAN paths) enabled and the improper disk devices DISABLED. You can very quickly resolve this by going into Device Manager in Windows.
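Before clicking through Device Manager, it can be useful to write down exactly which devices need to change state. The Python sketch below only builds that to-do list; it does not touch Windows at all. The device labels are hypothetical (in Device Manager the SAN disks all carry the same name, so you would identify them by Bus, Target ID, and LUN), and the input must contain only SAN disk devices, never local disks.

```python
def plan_device_changes(san_disks, primary_labels):
    """Work out which SAN disk devices to enable and which to disable.

    san_disks: dict mapping a device label to True if currently enabled.
    primary_labels: set of labels that are primary SAN paths.
    Returns (to_enable, to_disable) as sorted lists of labels.
    """
    to_enable = sorted(
        label for label, enabled in san_disks.items()
        if label in primary_labels and not enabled
    )
    to_disable = sorted(
        label for label, enabled in san_disks.items()
        if label not in primary_labels and enabled
    )
    return to_enable, to_disable


# Hypothetical state after a reboot: one backup path has been enabled
# and one primary path is disabled.
san_disks = {
    "primary-lun0": False,
    "primary-lun1": True,
    "backup-lun0": True,
    "backup-lun1": False,
}

to_enable, to_disable = plan_device_changes(
    san_disks, {"primary-lun0", "primary-lun1"}
)
print("Enable in Device Manager: ", to_enable)
print("Disable in Device Manager:", to_disable)
```

With the sample state above, the plan is to enable `primary-lun0` and disable `backup-lun0`. The other two devices are already in the right state.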
This issue is both a VMware VCB issue AND a Windows issue. However, because VCB is the real app in question, I would have to say that, overall, it is a VCB issue that VMware needs to deal with in one way or another. Because VCB does not support multiple SAN paths to the VMware ESX Server VMFS file systems, your VCB backups can fail. To resolve this, you need to check your disk devices and disable the backup/alternate paths to your VMware file systems.