If you believe the marketing, the cloud is the magical, be-all and end-all solution to every IT problem that has ever existed or ever will. Amazon, Microsoft, Google, and everyone else in the cloud segment make big promises about their infrastructure, but they still face the same challenge as on-premises environments: you are only as strong as your weakest link.
Last week, Amazon had a major AWS outage that affected thousands of companies and likely millions of end users. An employee trying to fix a billing issue accidentally entered the wrong input to a command, which removed far more S3 servers in the US-EAST-1 region than intended. That mistake created a ripple effect that left S3, and the many AWS services that depend on it, unusable for hours; you can read the triage report here.
While this outage was significant and a serious problem for Amazon, it should also serve as a warning to anyone who relies on cloud services for all of their compute, storage, and other IT tasks. By relying on only one of Azure, AWS, or Google, you are held hostage to that vendor's infrastructure, and even though each service typically offers better uptime than an on-premises environment, none of them is immune to outages.
Even the AWS status dashboard was caught up in the downtime: its administration console depended on the same S3 infrastructure that had failed, which prevented Amazon from updating the site to let its users know about the disruption.
As with on-premises data centers, we all know that redundancy is key to maintaining a high level of service, and the same philosophy should be applied to cloud usage. Yes, all the major players have geographically independent data centers, but if the fabric connecting those services fails, it doesn't matter how diverse they are: you will experience downtime.
When possible, splitting your cloud services across two vendors is the ideal solution. It is not the cheapest option, but if you had been running your primary service on AWS with a backup on Azure last week, your users would not have experienced an outage. In the new world of cloud-based IT, two clouds are better than one.
Outages are inevitable; on-premises and cloud downtime will always occur, and there is no way to avoid it entirely. What separates best-in-class companies from the rest is how they react, how they recover, and how they keep downtime from impacting their operations.