Should a vMotion occur in this scenario?

  • Should a vMotion occur in this scenario?

    I'm running vSphere 5 in an HA cluster across two hosts (vsphereA and vsphereB). The HA cluster is configured for host monitoring and datastore heartbeat monitoring, with admission control disabled. (I hope I'm right in understanding that datastore heartbeat monitoring prevents inadvertent and unwanted HA failovers caused by management network isolation.) Each host has a single connection to a dedicated iSCSI network and iSCSI target (no MPIO). All VMDKs for all VMs are on the iSCSI datastore.

    As a test of HA, I disconnected the iSCSI connection on vsphereB and was surprised to see that the running VMs on vsphereB continued to run there. The powered-off VMs showed as inaccessible, which I expected because they weren't running and the connection from vsphereB to the iSCSI target was severed. But the running VMs continued to run and continued to be "owned" by vsphereB. I expected an HA failover for those VMs, after which they would be "owned" by vsphereA, but no failover occurred. I'm at a loss to understand why. Am I misunderstanding the cases in which an HA failover should occur?
    Last edited by joeqwerty; 29th September 2012, 16:32.

  • #2
    Re: Should a vMotion occur in this scenario?

    Well, you are definitely misunderstanding what vMotion is: it has no relationship whatsoever with HA. The events that are triggered when HA "kicks in" are failovers. Live migration of a VM is a vMotion, and it occurs manually as an administrative action or automatically if you use DRS.

    Regardless of datastore heartbeats I would implement redundant management network connections. I would also feel very uncomfortable having only a single network connection to my iSCSI storage from each host. I would also leave admission control enabled as it is a guarantee of failover resources being available - you can only power on a VM if it can be protected by HA in terms of the minimum resources necessary (reservations and overhead).

    As to why your VMs didn't fail over: the hosts were still able to send one another heartbeats over the management network, so your master host still saw your slave host as alive regardless of the lost storage connection.
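
The reasoning in this reply can be sketched as a toy model in Python. This is not VMware's implementation: the `Host` class and `master_sees_host_alive` function are hypothetical names, and the only input the check looks at is the management-network heartbeat, as described above.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    mgmt_heartbeat_ok: bool   # heartbeats reach the master over the management network
    storage_connected: bool   # iSCSI link state; deliberately ignored by the check below


def master_sees_host_alive(host: Host) -> bool:
    # Toy rule taken from the reply: if network heartbeats still arrive,
    # the master treats the host as alive, whatever its storage state is.
    return host.mgmt_heartbeat_ok


# The test from the first post: iSCSI pulled on vsphereB, management network intact.
vsphere_b = Host("vsphereB", mgmt_heartbeat_ok=True, storage_connected=False)
failover_triggered = not master_sees_host_alive(vsphere_b)
print(failover_triggered)  # False
```

In this simplified view, losing storage alone never changes the result, which matches what was observed in the test.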
    VCP2 / VCP3 / VCP4 / VCP 5 / VCAP-DCA4 / VCI / vExpert 2010-2012


    • #3
      Re: Should a vMotion occur in this scenario?

      Thanks for the clarification. My understanding was that vMotion was the "engine" that facilitates the migration of a VM from one host to another during an HA failover as well as during a DRS operation. Now I understand that an HA failover doesn't migrate the VMs from one host to another but instead powers them on on another cluster host in the event of a host failure. So it seems to me that you need to be very careful about choosing what happens to the VMs in the event of host isolation (leave them running on the original host or power them off).
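
The isolation choice mentioned here can be illustrated with a small Python sketch. It models only the two options named in this post, and `IsolationResponse` and `apply_isolation_response` are hypothetical names for illustration, not part of any VMware API.

```python
from enum import Enum


class IsolationResponse(Enum):
    LEAVE_RUNNING = "leave running"
    POWER_OFF = "power off"


def apply_isolation_response(response: IsolationResponse, running_vms: list[str]) -> dict[str, str]:
    # Returns the resulting state of each VM on the isolated host.
    # Assumption: the response is applied uniformly to every running VM.
    if response is IsolationResponse.LEAVE_RUNNING:
        state = "running on original host"
    else:
        state = "powered off on original host"
    return {vm: state for vm in running_vms}


result = apply_isolation_response(IsolationResponse.POWER_OFF, ["vm1", "vm2"])
```

Only a powered-off VM can then be powered on elsewhere, which is why the choice matters.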


      • #4
        Re: Should a vMotion occur in this scenario?

        Glad to help, but rather than worry about the isolation response the best thing to do is mitigate the risk of isolation by using multiple network connections.

        A well-designed vSphere/HA implementation should have multiple connections from host to storage, risk of host isolation mitigated, and admission control left enabled with the appropriate policy.
        Last edited by scott28tt; 29th September 2012, 18:00.
        VCP2 / VCP3 / VCP4 / VCP 5 / VCAP-DCA4 / VCI / vExpert 2010-2012


        • #5
          Re: Should a vMotion occur in this scenario?

          Thanks much, Scott. This is my home lab so I'm working my way through various infrastructure designs and the advantages/disadvantages of each as well as learning how each one behaves. I've now got a better understanding of vMotion and HA.


          • #6
            Re: Should a vMotion occur in this scenario?

            One final comment - datastore heartbeats are used to help distinguish between a failed host and an isolated host when management network heartbeats stop working.
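
As a rough sketch of that distinction in Python: the function below is a simplification with hypothetical names (`HostState`, `classify_host`), and it assumes the two heartbeat signals are the only inputs. The real HA agent may consult additional checks.

```python
from enum import Enum


class HostState(Enum):
    LIVE = "live"
    ISOLATED = "isolated"   # network heartbeats lost, datastore heartbeats still seen
    FAILED = "failed"       # neither kind of heartbeat seen


def classify_host(network_heartbeat: bool, datastore_heartbeat: bool) -> HostState:
    # Datastore heartbeats are only consulted once network heartbeats stop.
    if network_heartbeat:
        return HostState.LIVE
    if datastore_heartbeat:
        return HostState.ISOLATED
    return HostState.FAILED
```

For example, `classify_host(False, True)` gives `HostState.ISOLATED`, while `classify_host(False, False)` gives `HostState.FAILED`, the case in which the VMs would be restarted on another host.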

            Sounds like you're having fun learning as you go.
            VCP2 / VCP3 / VCP4 / VCP 5 / VCAP-DCA4 / VCI / vExpert 2010-2012


            • #7
              Re: Should a vMotion occur in this scenario?

              Yep, I've got a grasp on datastore heartbeats.

              I am enjoying the learning.


              • #8
                Re: Should an HA failover occur in this scenario?

                Originally posted by joeqwerty
                ...As a test of HA I disconnected the iSCSI connection on vsphereB and was surprised to see that the running VM's on vsphereB continued to run on vsphereB...
                This is known as an APD (All-Paths-Down) state. You pulled the iSCSI connection, and the ESXi host cannot determine whether the device loss is permanent or transient, so it stays in that state indefinitely.

                I suggest you read this VMware KB for more info about this topic:
                http://kb.vmware.com/selfservice/mic...rnalId=2004684

                As Scott28tt said, make sure you have redundancy at every single layer of your design to mitigate the risks.

                Freelance, Business Owner, Virtualisation & Cloud Computing Fan Boy, VCP4/5/Cloud, VCAP4/5-DCA/DCD, vEXPERT '10-'14, EMCCA-Specialist, VMware Alumni.


                • #9
                  Re: Should a vMotion occur in this scenario?

                  Thanks much, Piro.
