
Random VM reboots with Veeam B&R


  • Random VM reboots with Veeam B&R

    I’m wondering if anyone can help us with the issue below.
    We are currently running around 12 HA VMs on a 2-node Windows Server 2012 R2 Hyper-V cluster. VM storage is housed on SMB 3.0 shares, which are running from a 2-node Windows Server 2012 Scale-out File Server cluster.
    Two SMB 3.0 shares have been provisioned from the two CSVs presented by the SOFS cluster, and each SOFS cluster node owns one CSV. Storage hardware is a Lenovo ThinkServer JBOD.
    For VM backup, we are utilising Veeam Backup and Replication 8, with update 2b applied. The backup run begins at 10pm each evening, by means of a scheduled PowerShell script.
    The backup completes successfully, but over the past week or so we have found, on arriving at the office the next day, multiple 1069 cluster events for VMs that had rebooted at random.
    Which VMs reboot varies from one night to the next.
    In an effort to find the root cause of the problem, we disabled all Veeam VM backups one evening. The following morning, our Hyper-V cluster reported that it had gone the entire period without having any issues.
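    To line the 1069 events up against the nightly backup window in the same way, the events could be filtered by time. The sketch below is a minimal illustration, not a Veeam or Microsoft tool: it assumes the cluster events have already been exported into (timestamp, event ID, resource name) tuples, and the 10-hour window length and the sample timestamps are assumptions made up for the example. Only the 10pm start time and event ID 1069 come from the post.

```python
from datetime import datetime, time, timedelta

# Event ID 1069 is the cluster resource failure event mentioned in the post;
# any other event ID is ignored by this filter.
RESOURCE_FAILED_EVENT_ID = 1069


def events_in_backup_window(events, start=time(22, 0), duration=timedelta(hours=10)):
    """Return (timestamp, resource) pairs for 1069 events inside the nightly window.

    `events` is an iterable of (datetime, event_id, resource_name) tuples, a
    hypothetical shape for rows exported from the cluster event log. The window
    opens every day at `start` and stays open for `duration`, so it may cross
    midnight (22:00 to 08:00 with the defaults; the length is an assumption).
    """
    hits = []
    for ts, event_id, resource in events:
        if event_id != RESOURCE_FAILED_EVENT_ID:
            continue
        # An event after midnight belongs to the window opened the previous
        # evening, so check both today's and yesterday's window.
        for day_offset in (0, -1):
            opened = datetime.combine(ts.date() + timedelta(days=day_offset), start)
            if opened <= ts < opened + duration:
                hits.append((ts, resource))
                break
    return hits


# Made-up timestamps for illustration only; "HV-NODE-2" is a hypothetical name.
sample = [
    (datetime(2015, 8, 18, 23, 30), 1069, "EMAIL"),
    (datetime(2015, 8, 19, 2, 15), 1069, "TS4"),
    (datetime(2015, 8, 19, 14, 0), 1069, "PRINTSRV"),
    (datetime(2015, 8, 19, 3, 0), 1135, "HV-NODE-2"),
]
for ts, resource in events_in_backup_window(sample):
    print(ts, resource)
```

    With this sample, only the EMAIL and TS4 events fall inside the window; the afternoon PRINTSRV event and the non-1069 event are dropped. Checking the previous day's window as well is what lets a 2.15am event count towards the backup that started at 10pm.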
    We then ran the backup script manually during office hours and waited for any issues. The backup ran without issue until it reached one of the last few VMs.
    At that point three VMs restarted: EMAIL, PRINTSRV and TS4. They rebooted while TS4 was being backed up.
    They restarted between 12.47pm and 12.48pm, and all three came back up. There doesn't appear to be any link between the three (apart from the fact that all three were running on the same HV node). What's odder is that the reboots occurred long after the backups of two of those VMs had completed.
    I should add that there were other VMs running on that same HV node.
    Backup completion times:
    EMAIL – 10.49am
    PRINTSRV – 12.08pm
    TS4 – 12.55pm
    The backup then proceeded until it reached the penultimate VM. Suddenly I noticed that all VMs on all HV nodes had lost connection to their storage and were either turning off or starting up on another node of the HV cluster. A few seconds later I checked the logs for our SOFS cluster and saw that RHS (the Resource Hosting Subsystem) had stopped unexpectedly, which caused the file cluster to restart and the VMs to crash.

    Amazingly, once both clusters had returned to a normal running state, the Veeam backup went on to back up the last VM.
    Does anyone have any ideas what is causing this problem? I keep reading about disabling ODX (Offloaded Data Transfer) in Server 2012 for storage hardware that doesn't support it.

    I've been pointed towards the update below, following comments from someone else who experienced similar issues. That in turn led me here:
    Is there anything I could check in vssadmin to see whether the issues I'm experiencing are due to VSS?
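    One common check is `vssadmin list writers`, run from an elevated prompt on the hosts after a bad night, looking for writers that are not in a Stable state or that report a last error. The helper below is a hedged sketch that scans that text output for such writers; it is not part of vssadmin or Veeam, and the sample output is trimmed and illustrative rather than taken from these servers.

```python
import re

WRITER_NAME = re.compile(r"Writer name: '(.+)'")


def unhealthy_writers(vssadmin_output):
    """Return VSS writers that are not Stable or that report an error.

    `vssadmin_output` is the text printed by `vssadmin list writers`; each
    writer block has "Writer name:", "State:" and "Last error:" lines.
    """
    writers = []
    current = None
    for raw in vssadmin_output.splitlines():
        line = raw.strip()
        match = WRITER_NAME.match(line)
        if match:
            # Start a new record for each writer block.
            current = {"name": match.group(1), "state": "", "last_error": ""}
            writers.append(current)
        elif current is not None and line.startswith("State:"):
            current["state"] = line.split(":", 1)[1].strip()
        elif current is not None and line.startswith("Last error:"):
            current["last_error"] = line.split(":", 1)[1].strip()
    # Keep anything not Stable, or Stable but still carrying an error.
    return [
        w for w in writers
        if not w["state"].endswith("Stable") or w["last_error"] != "No error"
    ]


# Trimmed, illustrative output; real output also lists writer and instance IDs.
SAMPLE = """\
Writer name: 'Task Scheduler Writer'
   State: [1] Stable
   Last error: No error

Writer name: 'Microsoft Hyper-V VSS Writer'
   State: [8] Failed
   Last error: Retryable error
"""

for writer in unhealthy_writers(SAMPLE):
    print(writer["name"], "|", writer["state"], "|", writer["last_error"])
```

    On a Windows host, the live text could be captured with `subprocess.run(["vssadmin", "list", "writers"], capture_output=True, text=True).stdout` and passed to the same function. A Hyper-V VSS writer left in a failed state after the backup would at least suggest that VSS is involved, though it would not prove it.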

    All I know is that running the backup, causes problems.
    Many thanks.
    Last edited by MrJH; 19th August 2015, 10:09. Reason: addition

  • #2
    Have you also raised this with Veeam support, as it appears to be their product causing the problems?
    Anything in event logs on both the guests and hosts?
    (Sorry, cannot provide any other help at this stage but will look further if I can)
    Tom Jones
    MCT, MCSE (2000:Security & 2003), MCSA:Security & Messaging, MCDBA, MCDST, MCITP(EA, EMA, SA, EDA, ES, CS), MCTS, MCP, Sec+
    IT Trainer / Consultant
    Ossian Ltd



    • #3
      Originally posted by Ossian
      Have you also raised this with Veeam support, as it appears to be their product causing the problems?
      Anything in event logs on both the guests and hosts?
      (Sorry, cannot provide any other help at this stage but will look further if I can)
      Hi Ossian,

      Yes, the issue has been raised with Veeam. points to a list of recommended updates for Hyper-V servers. I'm wondering whether to apply these, as the backup is initiated from one of our SOFS nodes, where the Veeam software is installed. Do you think I should install the updates on the HV nodes regardless? I don't want to give myself a bigger problem by installing updates here, there and everywhere!


      • #4
        I would say installing those patches is probably fine. It really couldn't get much worse, and you can always uninstall them if it does.

        I would install all available updates through Microsoft Update, then go through these two links and install any further updates that apply (I think there's one in there that addresses Veeam):

        Network Consultant/Engineer
        Baltimore - Washington area and beyond


        • #5
          I agree with the other points raised, and would add the following, based on our own past experience of what turned out to be the cause of random reboots:
          • The Live Migration network was being saturated, which meant that the RHS service reported failed heartbeats when talking to specific VMs, resulting in them restarting. RHS (which you are possibly already aware of) determines whether a VM needs to fail over
          • One sign of this saturation is multiple paused CSV volumes. This does not always cause a problem, but intermittently it can, causing server restarts (as storage temporarily disappears), with some VMs coming back up too quickly and hitting a boot error until they are restarted again
          • As you already know, the backups have triggered this
          • As you are already doing, Microsoft's recommended approach is usually to make sure drivers are up to date (and are known, supported drivers and versions) and that hotfixes are applied, especially those related to Hyper-V, Cluster Services and storage


          • #6
            MrJH - were you ever able to get to a resolution for the issue with Veeam causing multiple VMs on your cluster to reboot?

            We've been dealing with the same situation on the same hardware/software configuration, and have had tickets open with Microsoft, Veeam and the hardware OEM since January, but the problem still persists for us.

            Thanks in advance,