Working with Domain Member Virtual Machines and Snapshots

One of the benefits of using a virtualization product that supports snapshots is the ability to capture a “point in time” to which you can always revert your virtual machines. Reverting to a snapshot returns the VM to the state in which it was saved, which is useful for tasks such as testing software, doing QA, creating labs and so on.

However, snapshots bring a nasty issue when one or more of your virtual machines are members of an Active Directory domain. When you create snapshots of such machines and later restore them, you might occasionally find that all authentication involving the VM seems to fail: you cannot log on to the virtual machine, or cannot access files and shares across the network. You might even get errors like this one:

Windows cannot connect to the domain, either because the domain controller is down or otherwise unavailable, or because your computer account was not found. Please try again later. If this message continues to appear, contact your system administrator for assistance.

If you log on locally (not using a domain account) to the computer (in this example it’s a Windows XP Pro client), you’ll see the following events in the Event Viewer.

NETLOGON 3210
This computer could not authenticate with \\WIN2003-SRV1.petrilabs.local, a Windows domain controller for domain PETRILABS, and therefore this computer might deny logon requests. This inability to authenticate might be caused by another computer on the same network using the same name or the password for this computer account is not recognized. If this message appears again, contact your system administrator.

LSASRV 40961
The Security System could not establish a secured connection with the server cifs/WIN2003-SRV1.petrilabs.local.  No authentication protocol was available.

W32Time 18
The time provider NtpClient failed to establish a trust relationship between this computer and the petrilabs.local domain in order to securely synchronize time. NtpClient will try again in 15 minutes. The error was: The trust relationship between this workstation and the primary domain failed. (0x800706FD)

And possibly others.
This is nasty. However, if you remember the days when Ghost was the only way to image a computer, you might also remember that it was considered good practice not to “ghost” a machine that was a member of a domain. If you ignored that advice, you ended up with a computer restored from an image that, in some cases, could not log on to the domain it was a member of. So this is not a new situation; only the “ghosting” tools have changed.
The reason for this is a computer account password mismatch. The Windows-based domain member VM thinks that its machine account password is X, while the domain controller believes it to be Y. Because of this, the VM cannot authenticate itself to the domain controller(s).
So how does this work? Just like a user account password, a computer account password is a “secret” that belongs to the computer account, and that a Windows-based domain member computer uses when it authenticates itself to the domain controller and establishes a secure channel.
When the computer is started, a service called NetLogon uses the machine account password and tries to establish a secure session with the domain controller. The usual CTRL+ALT+DEL Winlogon process also relies on this authenticated secure channel to send user credentials to the domain controller for verification and log them into the computer. Other services running on this machine that work with the LocalSystem or NetworkService credentials also require this authenticated secure channel to get access to domain resources.
Without the correct password there cannot be a secure channel, and that is why the logons, share access and services described above fail.
The password is first created when the computer is joined to a domain. It is shared by the domain controller and the computer.
So what happens during regular operations? Well, to explain this, we need to think of 3 scenarios:
1. Regular operation, client computer works “regularly”, never offline for extended periods of time. Each Windows-based computer maintains a machine account password history containing the current and previous passwords used for the account. The computer account password change is initiated by the Netlogon service on the client computer, every 30 days by default. This default has been the same in all versions of Windows since Windows 2000. After the change, both the domain controller and the computer use the new password for authentication.
When a client determines that the machine account password needs to be changed, it tries to contact a domain controller for the domain of which it is a member and change the password there. If this operation succeeds, it then updates the machine account password locally.
When two computers attempt to authenticate with each other and a change to the current password has not yet been received, Windows relies on the previous password. If the sequence of password changes exceeds two changes, the computers involved may be unable to communicate, and you may receive error messages.
2. Not-so-regular operation, but still possible: a client computer is taken offline for an extended period of time, 30, 60, 90 days or more. In this scenario nothing expires, even if the computer is turned off for three months. When the computer starts up, it notices that its password is older than 30 days, and the Netlogon service on the client computer initiates a password change. This only applies when the machine was turned off for that long.
3. Snapshots: either a “Ghost”-type image or (related to this article) a VM snapshot is taken, and then the computer resumes regular operation (as in scenario #1). Then suddenly, after working for 30, 60, 90 days or more, the snapshot is brought back. When the domain member is restored to the older snapshot, it loses track of any password changes made later and tries to use an older password. Hence it fails to authenticate itself.
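The three scenarios can be played through in a small toy simulation. This is only a sketch in Python, not how Netlogon or Active Directory are actually implemented: the class names (DomainController, Member), the lab machine name XP-CLIENT, and the rule that the domain controller accepts exactly the current and the previous password are assumptions made for illustration, loosely following the description above.

```python
import copy
import secrets
from dataclasses import dataclass, field


@dataclass
class DomainController:
    """Toy DC (assumption): keeps the current and previous password per account."""
    accounts: dict = field(default_factory=dict)  # name -> (current, previous)

    def join(self, name, password):
        self.accounts[name] = (password, None)

    def authenticate(self, name, password):
        known = [p for p in self.accounts.get(name, ()) if p is not None]
        return password in known

    def change_password(self, name, old, new):
        # The member must prove itself with a password the DC still knows.
        if not self.authenticate(name, old):
            return False
        current, _ = self.accounts[name]
        self.accounts[name] = (new, current)
        return True


@dataclass
class Member:
    """Toy domain member (assumption): stores only its own current password."""
    name: str
    password: str
    password_set_day: int = 0

    def tick(self, dc, today, max_age_days=30):
        """Change the password if it is due, then report whether authentication works."""
        if today - self.password_set_day >= max_age_days:
            new = secrets.token_hex(8)
            # Update the local copy only after the DC accepted the change.
            if dc.change_password(self.name, self.password, new):
                self.password, self.password_set_day = new, today
        return dc.authenticate(self.name, self.password)


def new_lab():
    dc = DomainController()
    vm = Member("XP-CLIENT", secrets.token_hex(8))  # hypothetical VM name
    dc.join(vm.name, vm.password)
    return dc, vm


def run_days(dc, vm, first_day, last_day):
    """Run the member once per day; True only if authentication never failed."""
    ok = True
    for day in range(first_day, last_day + 1):
        ok = vm.tick(dc, day) and ok
    return ok


def scenario_regular():
    dc, vm = new_lab()
    return run_days(dc, vm, 0, 120)


def scenario_offline(days_off=90):
    dc, vm = new_lab()
    return vm.tick(dc, days_off)  # first boot after a long time offline


def scenario_snapshot(days_before_revert):
    dc, vm = new_lab()
    snapshot = copy.deepcopy(vm)             # snapshot taken on day 0
    run_days(dc, vm, 1, days_before_revert)  # the live VM keeps changing its password
    reverted = copy.deepcopy(snapshot)       # revert: the later changes are forgotten
    return reverted.tick(dc, days_before_revert)


if __name__ == "__main__":
    print("regular operation:", scenario_regular())
    print("offline for 90 days:", scenario_offline(90))
    print("snapshot reverted after 45 days:", scenario_snapshot(45))
    print("snapshot reverted after 95 days:", scenario_snapshot(95))
```

In this model the first three cases authenticate and only the last one fails: after 45 days the snapshot's password is still the "previous" one on the DC, while after 95 days the live VM has changed its password three times and the snapshot's password is no longer known to the DC.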
So how do you fix this? Well, first of all, if you’ve already gotten to the point where the error occurred and you cannot log in, you will need to read my Fixing “Windows cannot connect to the domain, either because the domain controller is down or otherwise unavailable, or because your computer account was not found” Errors article for a solution.
However, if you wish to prevent this from happening AND you’re using virtualization software and snapshots, you may want to do one of the following:

Option #1

Increase the computer account password age, or disable password changes altogether. Both reduce the likelihood of the problem, but may also reduce the level of security in the domain. On the other hand, since this is probably a test, QA or demo environment, you may consider it a valid option. These settings are configured on the domain member (not on the domain controller), so you can change them on your computer before you create a snapshot of it.

Warning!

This document contains instructions for editing the registry. If you make any error while editing the registry, you can potentially cause Windows to fail or be unable to boot, requiring you to reinstall Windows. Edit the registry at your own risk. Always back up the registry before making any changes. If you do not feel comfortable editing the registry, do not attempt these instructions. Instead, seek the help of a trained computer specialist.
As noted above, these settings are configured on the domain member, and are controlled by the Netlogon service. Settings are found in the following Registry key:
HKLM\SYSTEM\CurrentControlSet\Services\NetLogon\Parameters
DisablePasswordChange (default off) prevents the client computer from changing its computer account password. To turn password changes off, give it a value of 1.
MaximumPasswordAge (default 30 days) determines when the computer password needs to be changed. Change it to whatever number of days you think may be enough. For example, if you use snapshots that are less than 100 days old, then you can set this value to 100 or similar.
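As a rough sketch, the two values above could be turned into reg add command lines with a small Python helper and then run from an elevated command prompt on the domain member. The helper name netlogon_reg_commands and its parameters are made up for this example, and treating both values as REG_DWORD is an assumption you should verify on your own system before running anything.

```python
# Key path as given above (registry key names are not case sensitive).
NETLOGON_KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\NetLogon\Parameters"


def netlogon_reg_commands(max_password_age_days=None, disable_password_change=None):
    """Build 'reg add' command lines for the two Netlogon values (hypothetical helper).

    Nothing is executed here; the function only returns strings to review and run by hand.
    """
    def reg_add(name, value):
        # /t REG_DWORD is an assumption about the value type; /f overwrites without asking.
        return f'reg add "{NETLOGON_KEY}" /v {name} /t REG_DWORD /d {value} /f'

    commands = []
    if max_password_age_days is not None:
        if max_password_age_days < 1:
            raise ValueError("password age must be at least 1 day")
        commands.append(reg_add("MaximumPasswordAge", int(max_password_age_days)))
    if disable_password_change is not None:
        commands.append(reg_add("DisablePasswordChange", 1 if disable_password_change else 0))
    return commands


if __name__ == "__main__":
    # Example: snapshots are never older than 100 days.
    for line in netlogon_reg_commands(max_password_age_days=100):
        print(line)
```

Printing the commands instead of executing them keeps the registry warning above in force: you still review each line and back up the registry before applying it.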


Settings can also be configured by using Group Policy (either domain-based GPO or local):
Computer Configuration\Windows Settings\Security Settings\Local Policies\Security Options
Domain member: Disable machine account password changes
Domain member: Maximum machine account password age
After making the changes, reboot the client computer(s), and then create a snapshot, if you need one.

Option #2

Live with it, know it’s an issue, fix it every time. It’s time-consuming, sure, but it’s probably more secure than option #1. Read my Fixing “Windows cannot connect to the domain” Errors article.

Option #3

If these VMs are used for testing, QA, demos etc., you could consider creating a “closed” environment, where not only the client computer has a snapshot, but the domain controller(s) have one too. When you revert to a snapshot, you also revert to the same snapshot level on the DCs, all of them, at the same time. For some environments this may actually be a nice setup. However, if you cannot create such a setup, you’ll probably have to go with either option #1 or #2.
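Before reverting a closed environment, it can help to check that every VM, DCs included, actually has the snapshot you are about to revert to. The sketch below is plain Python and deliberately does not talk to any virtualization product: the inventory dictionary, the snapshot labels and the XP-CLIENT name are hypothetical, and you would have to fill the inventory from your own tooling.

```python
def vms_missing_snapshot(inventory, label):
    """Return the sorted names of VMs that do not have a snapshot called `label`.

    `inventory` maps a VM name to the set of snapshot labels it has (hypothetical format).
    """
    return sorted(name for name, labels in inventory.items() if label not in labels)


def can_revert_together(inventory, label):
    """True only if every VM in the closed environment has the snapshot."""
    return not vms_missing_snapshot(inventory, label)


if __name__ == "__main__":
    lab = {
        "WIN2003-SRV1": {"baseline"},             # the domain controller
        "XP-CLIENT": {"baseline", "after-test"},  # hypothetical domain member
    }
    print(can_revert_together(lab, "baseline"))
    print(vms_missing_snapshot(lab, "after-test"))
```

If the second call lists a DC, reverting only the members to that snapshot puts you back in scenario #3.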