I’m tired of bugs in software shutting us down. I am convinced that Microsoft needs an “XP SP2” fix to everything that they are doing today. I explain why in this post.
I have the blog open on one of my monitors as I write this post:
MFA had a 14-hour outage that affected customers worldwide who were attempting to securely sign in to Microsoft (and potentially 3,000+ other) cloud services using their Azure AD licensing. Imagine not being able to do any work for 14 hours! How much money did that cost people? A week later (yesterday), Microsoft released a root cause analysis (RCA) explaining how an update, combined with two other issues, caused the outage.
I’m all for these “warts and all” reports: transparency inspires confidence that things will improve and that professionals are dealing with the issues. But when the service fails again one day later, that’s not inspiring at all.
Microsoft is all-in on “fail fast”: get new code out there as quickly as possible. Quality? Nah, that’s secondary. Testers? Who needs them when every dev can review their own code?
I studied computer science in college and earned my BSc, learning to code in ancient languages like COBOL as well as C and C++. My education included lots of commercial stuff, software engineering, and project management. I learned a lot in those years, and the biggest lesson from all my practical and project work was that the worst person to review code is the person who wrote it. I spent many hours hunting for errors that stopped my code from compiling or logic bugs that made no sense. I’d ask a friend to look; they’d take 10 seconds and then point out the problem that was obvious (to everyone else).
This is why I was worried about Satya Nadella’s rise to CEO. Back in 2014, I wrote:
Nadella was the leader of a division where quality has slipped.
I was concerned that what had happened in his division would happen across all of Microsoft. I saw software quality slip badly in the lead-up to his group’s 2012 software releases, particularly in System Center. Bug reports were ignored, and bad issues made it into RTM releases and well beyond. Then software testers became irrelevant, and that wasn’t good.
Free labor is the way, apparently. Microsoft made a big push to get the community to test its software. Windows Insiders, Windows Server Insiders, Office Insiders … free testers, the lot! Microsoft pushes frequent builds out to millions of volunteers and asks for bug reports and feedback. Windows 10 has been the perfect example of how this has worked. I couldn’t use the first two releases of Windows 10 on my Ultrabooks because Windows Update kept replacing the Intel HD graphics drivers with incorrect ones, and the usual block mechanism only lasted a month. I reported it and produced debug logs, but nothing worked.
The Anniversary Update was a mess of “exploding Kindles” (to paraphrase Paul Thurrott) and misbehaving webcams. Apparently, the issues had been reported during the Insider pre-release process. 1803 had lots of issues, according to the anecdotal reports of Mary Jo Foley and Paul Thurrott. Everyone who checked for updates in Windows Update got the build; that’s what you get for “seeking”. 1809 has been a travesty: the build was pulled for a month and then re-released with a bug that broke mapped network drives. Once again, the issues had already been reported by Insiders.
Fail fast might be a great way for devs to ship features quickly, but there’s a problem when the failures are frequent.
Back in the Windows XP days, Microsoft had one security issue after another. Barely a month went by without some wide-open vulnerability being found in SQL Server, Windows, Internet Explorer, and so on. Eventually, Microsoft’s leadership decided to stop everything. A major code review followed, resulting in what some call Windows XP R2 but was really Windows XP Service Pack 2. This service pack changed the security model of Windows so much that it was effectively a whole new version of Windows.
This wasn’t the first of these resets, and it certainly wasn’t the last. Before this, Bill Gates ordered a corporate reset to shift from a “PC in every isolated home” strategy to an Internet-connected one. Steve Ballmer ordered a shift away from client/server computing, remaking Microsoft as a cloud-first service provider and effectively creating the mold for the Microsoft that is sailing high in the stock markets today.
I believe that it is time for another of these resets. Microsoft needs to re-focus on two things:
- Quality
- Finishing the job
It’s no longer a secret that Microsoft develops software on a 6-month schedule, with each cycle named after an atomic element. I would like Microsoft to halt one of those cycles and focus entirely on code quality. Once the bugs are fixed, unfinished features should be completed. All too often, an engineer creates something flashy and moves on to the next thing without finishing the job. You cannot upset these rock stars! This needs to stop too.
Then the sprints can continue, but with a change. Sprint A should be about creating new features, followed by Sprint B to stabilize them and finish the job. In the world of Semi-Annual Channel releases, such as Windows 10, it should be the Sprint B release that gets the longer support life, because that’s what businesses would choose to deploy.
I love new features. I have a lot of articles to write every month and I need content! Seriously, though, I enjoy learning about and figuring out new whiz-bangs as much as any nerd. But the constant outages, bugs, and poor quality are hurting businesses. Eventually, poor quality will hurt consumption, purchases, and share valuations.
What do you think? Do you agree or disagree with me? How would you improve things? Share what you think down below.