Let’s face it, building and testing Disaster Recovery (DR) plans isn’t anyone’s favorite activity. However, a DR plan is like a lifeboat on a cruise ship: you hope you never need it, but if anything happens it can be a lifesaver. Although your DR plan doesn’t help you much in day-to-day operations, the lack of one can be disastrous. This is especially true for small and medium-sized businesses, which typically have fewer resources and are often tapped out just covering their daily operations. Yet these are the businesses that probably need a DR plan the most, because a prolonged failure in their IT infrastructure has the potential to put them out of business permanently. A solid DR plan is essential to businesses of all sizes, as it makes your company more resilient to serious IT outages and enables you to restore your critical services with as little impact on business operations as possible.
DR isn’t without its challenges. Business processes are complex and protecting them is equally complex. Just as important, DR can be expensive. For instance, if your DR plan requires a physical hot site, then you need to buy the computing, networking, storage, and software required to support your mission-critical workloads. These types of costs have driven many businesses to look to the cloud as a replacement for a physical DR hot site.
So how do you go about building a DR plan? It’s not just making regular backups. Although there isn’t room to cover all the aspects of a DR plan in this post, let’s look at the high points that can help you get started building a new DR plan or revamping an existing one.
Identify Your Critical Processes and Their Dependencies
The first step to building an effective DR plan is to identify your organization’s needs as well as your critical IT components and their dependencies. What these are depends entirely on the type of business. For instance, an online retailer such as Amazon needs to make sure the web frontend and backend services that provide its online presence are all protected. The needs of a small manufacturing firm are quite different. There it’s typically more important to keep the local ERP system online. During this identification phase, it’s important to work with management to recognize the most serious vulnerabilities in the data center, and to review past outages and identify their causes and resolutions.
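One way to make the dependencies concrete is to write them down as a graph and derive a recovery order from it. The following is a minimal sketch using Python’s standard graphlib module. The service names and their relationships are hypothetical, chosen only to resemble the online retailer example; your own inventory will look different.

```python
from graphlib import TopologicalSorter

# Hypothetical inventory: each service maps to the services it depends on.
dependencies = {
    "web frontend": {"order service"},
    "order service": {"database", "authentication"},
    "authentication": {"database"},
    "database": {"storage"},
    "storage": set(),
}

# static_order() yields every service after all of its dependencies,
# which is the order in which they would need to be restored.
recovery_order = list(TopologicalSorter(dependencies).static_order())

for step, service in enumerate(recovery_order, start=1):
    print(f"{step}. {service}")
```

If two services depend on each other, TopologicalSorter raises a CycleError, which is itself a useful finding during the identification phase.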
Select the Appropriate Technologies Based on Your RTOs and RPOs
The next step is to determine your Recovery Time Objectives (RTOs) and your Recovery Point Objectives (RPOs). Your RTOs and RPOs are the primary drivers for the DR technologies that you choose to implement. To understand your RTOs, ask: how long can you go without this service? To understand your RPOs, ask: how much data can you stand to lose? While different businesses have different requirements, if your organization requires very short recovery times and can tolerate very little data loss, then you could consider implementing a replication-based DR strategy where data from your primary site is replicated to the cloud or to another DR site. The data at the DR site is then only as current as your replication interval, so in the worst case you lose roughly one interval’s worth of changes.
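To see how RTOs and RPOs drive the technology choice, it can help to compare candidate strategies against the objectives for a single service. The sketch below does that under simple assumptions: worst-case data loss is taken to be one full copy interval, and the restore time is an estimate you supply. The class names, the service, and all of the numbers are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Objectives:
    rto: timedelta  # longest tolerable outage for the service
    rpo: timedelta  # largest tolerable window of lost data


@dataclass(frozen=True)
class Strategy:
    name: str
    copy_interval: timedelta  # time between backups or replication cycles
    restore_time: timedelta   # estimated time to bring the service back


def meets_objectives(strategy: Strategy, objectives: Objectives) -> bool:
    # Worst case, a failure strikes just before the next copy, so the
    # data lost is roughly one full copy interval.
    return (strategy.copy_interval <= objectives.rpo
            and strategy.restore_time <= objectives.rto)


# Hypothetical objectives for an order-processing system.
orders = Objectives(rto=timedelta(hours=1), rpo=timedelta(minutes=15))

# Hypothetical candidate strategies with estimated figures.
nightly_backup = Strategy("nightly backup", timedelta(hours=24), timedelta(hours=6))
replication = Strategy("replication to DR site", timedelta(minutes=5), timedelta(minutes=30))

for candidate in (nightly_backup, replication):
    verdict = "meets" if meets_objectives(candidate, orders) else "misses"
    print(f"{candidate.name}: {verdict} the objectives")
```

With these made-up figures, only the replication strategy satisfies both objectives. A service with looser objectives might be served perfectly well by backups alone.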
Set Up DR Operational Procedures and Document Them
After determining your DR requirements, you need to create operational procedures for responding to an outage of your critical business services. You need to identify your emergency response teams and assign them their DR contingency tasks, and you need to understand your vendors’ emergency response requirements and capabilities. In addition, you need to document your DR procedures so that all personnel have a clear understanding of their roles and responsibilities.
Test Your DR Plan
Finally, DR can’t be viewed as a set-it-and-forget-it activity. No DR plan is complete without regular testing. If you don’t test your DR plan, you don’t have a real DR plan. The last place you want to find out your DR plan has a problem is in the middle of a critical recovery. Operational procedures and requirements are in continual flux, so regular testing is a necessity if you want the plan to work when you need it.