That’s the question I asked myself earlier today when I was down to my very last option for saving my own web server. In this article, I’ll tell you all about what went wrong with my Azure virtual machine and how Azure Backup for virtual machines saved my bacon.
My Personal Website
I’ve been running a blog since 2006. At first it was something that I used to generate attention for my new career as an IT contractor. Over time, the site evolved, and I’ve been stunned by how many people come to it every year to search for things on Windows, Hyper-V, and other subjects that I’ve littered the Internet with. My site has become an important notebook for me, but it’s also a source of income with several advertisers gracing me with their business. That means that the website is mission critical, and I cannot afford to lose it.
Backup, Backup, Backup
There’s a saying in IT: if you don’t have three copies of your backup, then you don’t have a backup. I’ve never trusted backup tools, maybe because I’ve been forced to work with too many big-name, bad products over the years, and I’ve become quite wary. So when I migrated my WordPress website from a local hosting company to Microsoft Azure, I implemented the following:
- MySQL backup. WordPress is powered by MySQL. Rather than use the limited free or pricey paid MySQL offerings from ClearDB, the people who power MySQL in Azure, I decided to run my own MySQL installation. A scheduled task runs MySQLDump every night to export all of the databases running in MySQL to a file on the file system of the virtual machine.
- Azure Backup agent. I installed an Azure Backup (MARS) agent into the guest OS of the Windows Server 2012 R2 virtual machine. A backup job runs every night to back up the website files and the MySQLDump export files to an Azure storage account… in a different Azure subscription. I told you that I was being cautious! The storage account in the other subscription is a locally redundant storage (LRS) account, maintaining three copies in the Azure data center.
- Azure Backup for virtual machines. Microsoft recently launched a preview of a new service that allows you to back up Azure virtual machines as atomic units. Although this service is still in preview and backups can be slow, I felt that this new offering was a critical requirement for running stateful virtual machines in Azure and something that I might need one day. The storage account used as the backup target is of the LRS type, also maintaining three copies in the Azure data center.
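The nightly export in the first item could be sketched in Python along these lines. This is not the scheduled task that runs on this server, only an illustration: the function name, the file naming, and the idea of keeping credentials in a separate options file are all assumptions. The mysqldump options themselves are standard ones.

```python
from datetime import date
from pathlib import Path


def build_dump_command(export_dir, defaults_file, today=None):
    """Return the mysqldump argument list for a dated, all-databases export.

    export_dir and defaults_file are hypothetical paths supplied by the caller.
    """
    today = today or date.today()
    target = Path(export_dir) / f"all-databases-{today:%Y-%m-%d}.sql"
    return [
        "mysqldump",
        # Must be the first option; keeps the password off the command line.
        f"--defaults-extra-file={defaults_file}",
        "--all-databases",
        # Consistent snapshot for InnoDB tables; does not help MyISAM tables.
        "--single-transaction",
        "--routines",
        "--events",
        f"--result-file={target}",
    ]
```

A scheduled task could pass the returned list to `subprocess.run(cmd, check=True)`, which raises an exception if mysqldump exits with an error, so a failed export does not go unnoticed.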
I had three kinds of backup in operation and two of those were keeping three copies of the data, with one of those even being in a different storage account. I think I had covered all of the bases… or had I?
Do as I Say, Not as I Do
All software has security vulnerabilities. This is why we encourage people to apply updates to their machines on a frequent basis. This is easy to configure in Windows, so my web server was configured to apply Windows updates. However, I wasn’t doing this very regularly with WordPress, and I had never updated MySQL (5.5.x). And let’s not forget about the third-party plugins in the WordPress site. And did I mention that I was using the same, not-updated theme on my WordPress site since… oh I forget but it wasn’t in this decade! I was asking for trouble, right?
But at least I had installed the Microsoft Anti-Malware extension and Windows Update was deploying security fixes.
Crash and Burn
Yesterday was a bit of a crazy day, so I wasn’t paying any attention to email or social media. I came to work today, fired up TweetDeck as I normally do to start a catchup, and saw several mentions with people warning me about my site being down. That’s normally not a huge deal because I can usually catch it early, run IISRESET, and things are back to normal for a few more months. It was different this time.
I logged into the guest OS of the virtual machine and ran IISRESET. The IIS service and the application pool were started anew, and this normally resolves any issues. This time, however, the problem persisted. I rebooted the virtual machine, and the problem changed. Instead of an HTTP 404, I was getting a “Database connection timed out” error. This indicated to me that WordPress was running, but MySQL was not responding. Ouch! This is my worst fear with WordPress. Although I can muddle my way around SQL Server and have rescued databases like many of my fellow accidental DBAs, MySQL is a nasty piece of open source <insert adjective of choice here>.
I went to Services, and I found MySQL was stuck in a starting state. It was completely hung, and there was nothing I could do with it. I searched online for solutions, but found nothing relevant. That led me to a new idea: upgrading MySQL to repair the service. I tried that, but the process got only so far until the upgrade wanted to stop MySQL, which I could not do because it was hung. I set the MySQL service to manual, rebooted, and retried the installer, but it required MySQL to be running to upgrade it.
I gave up on this install of MySQL; I would have to resort to using one of my backups. I uninstalled MySQL 5.5.x and installed the latest version. I tried to import my “all databases” export, and that worked… but there was no sign of my WordPress database. I guess “all databases” means something different in penguin-speak.
I mucked around with that for a while. I even restored a backup of the export file from Azure Backup and imported that without success. I had reached the point where there was only one remaining option to save my nine years of blogging content. I would have to restore my Azure virtual machine from the Azure Backup preview service for virtual machines.
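One way to catch an export that is missing a database, before the day you need it, is to inspect the export file itself. The sketch below assumes the file still carries the `-- Current Database:` comment lines that mysqldump normally writes ahead of each database in a multi-database export (they disappear if comments are switched off). The function names and the database name in the usage note are hypothetical.

```python
import re

# mysqldump writes a comment such as "-- Current Database: `wordpress`"
# ahead of each database when several databases are exported.
_DB_MARKER = re.compile(r"^-- Current Database: `(.+)`\s*$")


def databases_in_dump(lines):
    """Return the database names announced in a dump, in order of appearance."""
    names = []
    for line in lines:
        match = _DB_MARKER.match(line)
        if match:
            names.append(match.group(1))
    return names


def missing_databases(lines, expected):
    """Return the expected database names that the dump never announces."""
    found = set(databases_in_dump(lines))
    return [name for name in expected if name not in found]
```

Opening the export with `open(path, encoding="utf-8", errors="replace")` and calling `missing_databases(handle, ["wordpress"])` returns an empty list when the database is present, so a nightly job could raise an alert on anything else.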
Azure Backup for Virtual Machines
I take a backup of the virtual machine every day and retain four weeks of recovery points. This gives me some scope to go quite a bit back in time if the need arises. I decided that using yesterday’s backup was useless because the failure happened sometime before the backup. I went, instead, with a recovery point from Friday.
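The reasoning above, which is to skip any recovery point taken after the failure and use the newest one before it, can be written down as a small helper. This is a sketch only: the function name is hypothetical, and it works on plain timestamps rather than on anything returned by Azure Backup.

```python
from datetime import datetime, timedelta


def pick_recovery_point(recovery_points, last_known_good):
    """Return the newest recovery point taken at or before last_known_good."""
    candidates = [point for point in recovery_points if point <= last_known_good]
    if not candidates:
        raise ValueError("no recovery point predates the failure")
    return max(candidates)


# Four weeks of daily recovery points, taken at 02:00 (illustrative values).
first = datetime(2015, 5, 4, 2, 0)
points = [first + timedelta(days=offset) for offset in range(28)]

# If the site was last known to be healthy at noon on day 25,
# the backup from 02:00 that same day is the one to restore.
chosen = pick_recovery_point(points, first + timedelta(days=25, hours=10))
```

With a daily schedule and four weeks of retention there are 28 candidates, so being unsure exactly when the failure happened is not a problem: pick a time when the site was definitely healthy and go back from there.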
The restore process for virtual machines is still quite ropey. You:
- Cannot restore the virtual machine to the original place: I was not planning to do that anyway. I created a new cloud service and powered down the original machine to keep it around, just in case.
- Have a limited choice of storage accounts: I was forced into using one storage account that I really didn’t want to use, even though I had created a new one just for the new cloud service.
I also created a new virtual network. I started the restore and waited … and waited … and … yes, I was stressed. Anyone who has ever restored a production machine knows exactly how I felt. This is when you find out if:
- Your faith in the backup tool was worth it.
- You still have a job or, in my case, I lose nine years of blog content and have to refund my advertisers.
The VM was eventually restored, and I panicked a bit when the Virtual Machines view was slow to refresh. The machine booted, and then I tested the website using the cloud service domain name… and I got an HTTP 404. Panic was rising before I checked the endpoints of the new virtual machine. HTTP was there with an internal port of 80 (good), but with a random external port of 50000-something (not good). I changed the external port to 80, retested, and I was greeted by “Database connection timed out.” I was just about to cry into my keyboard when it dawned on me: MySQL was replaying the transaction logs. I waited and tested, feeling my mouth get ever drier and the deafening thump in my chest. Then the MySQL service refreshed from “Starting” to “Running.” I refreshed the browser … it was so slow … and the familiar very old WordPress theme appeared on my screen.
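Refreshing a browser by hand while MySQL recovers is hard on the nerves. A small polling loop could do the waiting instead. The sketch below uses only the Python standard library, and it assumes that an HTTP 200 from the front page means the site is healthy, which may not hold for every error page. The function names, attempt count, and delay are hypothetical.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, for example with 404 or 500
    except (urllib.error.URLError, OSError):
        return None  # connection refused, timed out, or DNS failure


def wait_until_healthy(url, attempts=30, delay=20,
                       probe=http_status, sleep=time.sleep):
    """Poll url until it returns 200; return True on success, else False."""
    for attempt in range(attempts):
        if probe(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)  # give the database time to finish recovering
    return False
```

The `probe` and `sleep` parameters exist so that the loop can be exercised without a network or real waiting. In normal use, only the URL of the restored cloud service would be passed in.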
I use a CNAME for my website that points at the Microsoft-managed domain name for the cloud service. This CNAME has a TTL of 300 seconds (5 minutes). Once I was sure that I was back online, I modified the CNAME record with my registrar and waited. Five minutes later, www.aidanfinn.com was back online.
Updates All Around
I noticed the Anti-Malware extension was eating up a bit of CPU, so I opened the admin console and checked out what was going on. A full scan was starting. I checked the history, and I was shocked. Lots of .PHP malware had been found and quarantined over the previous couple of months. I was happy that the malware was caught before it could do anything, but I was upset that it was there in the first place. My machine has a minimal TCP footprint on the Internet, but I knew roughly where the culprit was:
- WordPress was quite out of date
- Many of my plugins were old
- The theme was ancient
I updated the definition file in Microsoft Anti-Malware and ran a full scan. I then:
- Updated WordPress. This prompted me to finally configure FTP for easy remote updates.
- Cleaned up plugins. I deactivated and deleted unwanted plugins, and with a new WordPress version in place, I could upgrade the remaining ones.
- Replaced the theme. It took me a while to find something half-decent, but I replaced my site’s theme.
As I write this, the full scan is still running. When that’s complete, I plan to upgrade MySQL and run an online scan against the guest OS of the virtual machine just to be sure.
Maybe Three Backups Isn’t Enough
This was too close for comfort, so I’ve added another backup option. A WordPress plugin has been installed to perform a daily backup of the site and the database, just in case things go wrong again.
A few lessons were re-learned today:
- There is no such thing as too many backups
- No matter how busy you think you are, there’s always enough time to update mission-critical software
- Do as I say, and I should do what I tell you to do, too
And finally, Azure Backup of virtual machines really does work, even if the preview release is a little ropey and slow to back up virtual machines.