On September 20, 2016, Microsoft released Exchange 2016 CU3 as part of its quarterly release cycle of cumulative updates. Apart from the “read from passive” feature (very useful in large DAGs), Microsoft didn’t deliver much new functionality in Exchange 2016 CU3. However, the necessary engineering was done to make CU3 deployable on Windows 2016 servers. The normal caveats were given that customers should test the new cumulative update before deploying it into production. Six weeks later, that advice has proven to be both sage and accurate (again).
A problem has been discovered in Windows 2016 that surfaces as crashes of the IIS host process (W3WP.EXE). The issue has been reported in several online forums (an example here). Windows engineering has updated their release notes to say:
“If you attempt to run Microsoft Exchange 2016 CU3 on Windows Server 2016, you will experience errors in the IIS host process W3WP.exe. There is no workaround at this time. You should postpone deployment of Exchange 2016 CU3 on Windows Server 2016 until a supported fix is available.”
Based on all the reports that I have seen, the problem does not appear on standalone servers. It does appear once you deploy Exchange 2016 CU3 on Windows 2016 and form a database availability group (DAG). Remember, all member nodes in a DAG must run the same version of the operating system and as close as possible to the same version of Exchange.
The problem is seen as repeated W3WP crashes reported in 4999, 5011, and 1003 events in the application event log. It’s important to understand that the issue is not in IIS. In fact, it lurks much deeper in the bowels of Windows. The IIS problems seen in early customer deployments are just symptoms of an underlying problem, much like a high temperature might indicate that flu is on the way. Because the problem only shows up on a DAG, the bug is likely to be related to Windows clustering or some other low-level component.
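If you want a quick way to see whether a server is logging these events, one option is to export the application event log to CSV and count the relevant entries. The following is only a minimal sketch in Python. The function name, the `Event ID` column header, and the sample rows are all assumptions made for illustration, so adjust them to match whatever your export actually contains.

```python
import csv
import io
from collections import Counter

# Event IDs that the reports associate with the W3WP crashes.
CRASH_EVENT_IDS = {4999, 5011, 1003}


def count_crash_events(csv_text, id_column="Event ID"):
    """Count exported log rows whose event ID is one of the crash-related IDs.

    The column name is an assumption about the export format.
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            event_id = int(row[id_column])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows without a usable event ID
        if event_id in CRASH_EVENT_IDS:
            counts[event_id] += 1
    return counts


# Made-up sample rows, only to show the shape of the input.
sample = """Level,Event ID
Error,4999
Warning,5011
Information,326
Error,4999
"""

print(count_crash_events(sample))
```

A steadily growing count for these IDs on a DAG member running Windows 2016 is consistent with the symptoms described above, but it does not prove the cause by itself.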
On a practical level, because IIS is affected, many parts of Exchange experience problems – the administration console, OWA, any HTTP-based protocols (like ActiveSync), remote PowerShell, and so on. In short, it’s a horrible bug to surface.
At this point, all that you can do is wait for Microsoft to fix the problem in Windows 2016 before you can consider deploying Exchange 2016 on that platform. Or decide that Windows 2012 R2 is the better platform for now.
There’s no word on when the fix might be available, but it’s obviously in Microsoft’s best interest to push something (that works) out ASAP.
The inevitable question raised by such a problem is how it managed to get through Microsoft’s test process for Exchange 2016 CU3. After all, you’d assume that the testers actually attempted to exercise basic Exchange functionality on Windows 2016 and weren’t satisfied merely because CU3 installed successfully on a Windows 2016 server. Perhaps the bug sneaked through because no one ever tested Exchange 2016 CU3 on a Windows 2016 DAG?
Microsoft has done a good job of improving the quality of Exchange cumulative updates since the bad old days of the first updates released for Exchange 2013. Overlooking such a fundamental bug is a setback. Let’s hope it’s not the start of a new trend.
Update: The Exchange engineering group has posted a little more information on their blog.
Follow Tony on Twitter @12Knocksinna.