How to build a disaster-recovery survival kit for SQL Server


This topic contains 3 replies, has 3 voices, and was last updated by Mary Jo Foley 3 days, 10 hours ago.

Viewing 4 posts - 1 through 4 (of 4 total)
    Mary Jo Foley
    Moderator
    #620817

    Our next MJFChat, scheduled for Tuesday, September 3, is between me and Dave Bermingham, Technical Evangelist with SIOS Technology and a Microsoft Cloud and Datacenter Most Valuable Professional (MVP). The general topic of our chat is how to build a disaster-recovery survival kit for SQL Server.

    What questions do you have for Dave about how your organization can and should try to head off downtime caused by IT failures, especially when it comes to your SQL Server database? Any issues you’ve hit in trying to do this inside your own organization? No question is too big or too trivial. I’ll be chatting with Dave on September 3, and will ask some of your best questions directly to him. Just add your questions below and maybe you’ll be mentioned during our next audio chat.

    Brad Sams
    Keymaster
    #623497

    Mary Jo Foley:                 00:00                    Hi, you’re listening to Petri.com’s MJFChat show. I am Mary Jo Foley, aka your Petri.com community magnate. I’m here to interview tech industry experts about various topics that you, our readers and listeners, want to know about. Today’s MJFChat is going to be all about how to build a disaster recovery survival kit for SQL Server. My guest today is Dave Bermingham, Technical Evangelist with SIOS Technology and a Microsoft Cloud and Datacenter Most Valuable Professional. Wow, that’s a mouthful. Thanks for joining us, Dave.

    Dave Bermingham:         00:43                    Alright, thanks for having me, Mary Jo.

    Mary Jo Foley:                 00:46                    So when you suggested this chat topic, you emailed me a good opening statement where you said failure per se is not really the problem. Downtime is the problem, but there are ways to prevent failures. So I wanted to kind of delve into that on this chat, because you also mentioned that there are different things that IT pros should be thinking about in terms of their needs and their budgets. So where do you like to kick off with people when you say, I have an idea for you: how to build a disaster recovery survival kit?

    Dave Bermingham:         01:22                    Yeah. So you generally start with what are your requirements in terms of recovery time objective and recovery point objective, and then we also look at the environment. Is this an on-premises solution, physical servers, virtual machines, or is this a hybrid cloud situation, or is it a pure cloud situation? And so once we kind of get the lay of the land, we review some of the standard options for high availability and disaster recovery to see where they currently stand and where their liabilities might fall. And on premises, when you’re talking about disaster recovery, then you’re talking about, you know, where is my DR site? Is that something where I’m going to build another data center and manage that? Or am I going to leverage maybe the cloud and build some hybrid cloud solution and have the cloud as the DR site? Well, more and more these days I’m talking with customers that have bought into the cloud.

    Dave Bermingham:         02:34                    They’re all in and they’re deploying, you know, either migrating their existing infrastructure to the cloud or building new applications in the cloud. The cloud is a new paradigm to them. You know, the traditional HA/DR scenarios, while still relevant, have a completely new wrinkle. So where normally you might be building failover cluster instances or, you know, maybe you’re doing VMware HA or whatever it might be on premises, in the cloud a lot of those traditional solutions are not available. So they’re really kind of reaching out and starting to understand what options they have once they move all into the cloud.

    Mary Jo Foley:                 03:28                    I mean, this is kind of an open-ended opening question on my part, but is it actually saving them money if they think more about doing this in the cloud versus on premises?

    Dave Bermingham:         03:39                    Well, absolutely. The cloud infrastructure is so much more advanced than what the vast majority of your typical customers are, you know, able to deploy in their own data centers. If you think about it, you know, we look at Azure or AWS: they have data centers across the world, so they have regional data centers, but then even within each region they are going to have multiple of what they call availability zones.

    So in the East region you might have three different data centers that are on different flood plains and power grids to give you that type of resiliency, where normally, you know, the typical customer is not going to have access to that type of infrastructure. I mean, even locally within the same data center, the resiliency of the physical server that’s hosting your virtual machine or your cloud image, the resiliency built into that is about the best you can get.

    There’s triple redundancy at the storage layer. So right out of the gate, by deploying even a single instance in the cloud, you’re going to have a pretty robust solution. The typical SLA for a single instance is three nines of availability, which is not bad, but for the most business-critical solutions you’re going to want to get to four nines of availability, which is typically the entry level for what people will call highly available.
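Editor's note: to make the "three nines" versus "four nines" comparison concrete, the following is a minimal sketch that converts an availability percentage into a yearly downtime budget. The function name and the 365-day year are assumptions made for illustration; they do not come from the chat or from any cloud provider's SLA wording.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: float = 365.0) -> float:
    """Return the downtime budget, in minutes, implied by an availability percentage.

    period_days defaults to a 365-day year (an assumption; real SLAs are
    usually measured per billing month).
    """
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1.0 - availability_pct / 100.0)


if __name__ == "__main__":
    for label, pct in (("three nines", 99.9), ("four nines", 99.99)):
        minutes = allowed_downtime_minutes(pct)
        print(f"{label} ({pct}%): about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, three nines allows roughly 8.8 hours of downtime per year, while four nines allows a little under 53 minutes.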

    Mary Jo Foley:                 05:25                    Right. And since we’re talking about SQL Server here, what if you’re using SQL Server running on Windows Server versus Linux versus a hybrid configuration of the two? Are there different considerations that people have to take into account when they’re building for that?

    Dave Bermingham:         05:45                    You know, SQL Server itself has two high availability and disaster recovery options. The first is SQL Server failover cluster instances, which have been around for quite some time, since SQL Server version 7. And that model traditionally requires some sort of shared storage device, which becomes problematic once you move to the cloud, because the major cloud providers don’t have a cluster-aware shared storage solution that lets you build failover cluster instances.

    So in those situations you’re going to have to use third-party options to enable that: some sort of data replication solution that integrates with clustering. Those solutions are available in the Azure Marketplace and the AWS Marketplace. But then the other option would be SQL Server Always On availability groups. Availability groups were introduced in SQL Server 2012. Think of them as kind of an iteration, or the next generation, of database mirroring. They give you additional advantages, like being able to group multiple databases in the same availability group so you can fail them over together. They’re much more robust in terms of leveraging the failover clustering quorum models. So it’s a more robust solution than database mirroring, and it is certainly available if you’re deploying in the cloud. But one of the problems is that the full version requires SQL Server Enterprise edition, which, you know, can be an expensive proposition if you’re coming from on-prem and using SQL Server Standard. So you have to look at the cost associated with that and weigh your options.

    Mary Jo Foley:                 07:48                    That’s good. Speaking of options, you know, we’ve talked quite a bit about Azure as being the destination for some of these high availability and disaster recovery solutions, but there are also, as you mentioned, solutions available for AWS and Google. So when you’re talking to your customers, how do you go about comparing what’s out there? I mean, I would assume it’s not the case that just because you’re using SQL Server, you automatically have to use Azure.

    Dave Bermingham:         08:22                    Yeah, correct. There are a lot of customers that are on all three platforms, and we talk to customers every day about leveraging these different platforms. All three of them have some similarities, and that is the ability to leverage multiple data centers for availability. In Google you have zones. In AWS you have availability zones and regions, and same with Azure: availability zones and regions. And all the cloud providers have similar service level agreements in that they will guarantee, you know, for a single instance in a single region, 99.9% availability of that single server. But if there’s a disaster and that region’s offline, you’re offline. If you don’t have any sort of resiliency or redundancy plan, you’re just going to, you know, get a little refund because the SLA was missed.

    Dave Bermingham:         09:24                    But that’s probably not very important in the grand scheme of things. So you need to plan for some sort of redundancy and resiliency that leverages the different availability zones. So for high availability, leveraging availability zones, which again are other data centers within the same region, enables you to do things like synchronous replication. So whether you’re doing availability groups with synchronous replication or third-party replication with synchronous replication and automatic failover enabled, that gives you the high availability to recover from, you know, the more typical failure scenarios. But not only that: there is also planned maintenance. Just like in any other data center, at some point in time there’s going to be a need to reboot your server for some sort of update. Although you get notices and are notified when it’s going to happen, if you don’t have a very simple ability to move that workload to a standby server in a different region or a different availability zone, then you’re going to have to schedule that downtime.

    Dave Bermingham:         10:35                    And many people want to minimize that downtime as much as possible. So they all have a similar, you know, infrastructure: zones, regions. I’d say Azure goes a little beyond the typical; you know, compared with AWS and Google, it has some other options that are not quite available elsewhere yet. One of them would be Azure Site Recovery, which is, you know, disaster recovery as a service that you can get within Azure, and which will actually replicate your entire instance from one region to another region for disaster recovery. And in many scenarios that’s going to work just fine. There are some limitations with that: the rate of change can’t exceed 10 megabytes per second per disk. So you have to weigh it and see if it’s a good fit for you. But, you know, they have options like that.
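Editor's note: the rate-of-change limit mentioned above lends itself to a quick fit check. The sketch below flags disks whose observed write rate exceeds a threshold. The 10 MB/s default is simply the figure quoted in the chat and may no longer match current Azure Site Recovery limits; the function name, the sample data layout, and the choice to compare the peak sample rather than a sustained average are all assumptions for illustration.

```python
from typing import Dict, List

# Figure quoted in the chat; treat it as an assumption and check current documentation.
ASSUMED_CHURN_LIMIT_MB_PER_SEC = 10.0


def disks_over_churn_limit(
    samples_by_disk: Dict[str, List[float]],
    limit_mb_per_sec: float = ASSUMED_CHURN_LIMIT_MB_PER_SEC,
) -> Dict[str, float]:
    """Return {disk name: peak MB/s} for disks whose peak write rate exceeds the limit.

    samples_by_disk maps a disk name to write-rate samples in MB/s, collected
    however you like (for example, from performance counters).
    """
    over_limit = {}
    for disk, samples in samples_by_disk.items():
        if not samples:
            continue  # no measurements for this disk, nothing to judge
        peak = max(samples)
        if peak > limit_mb_per_sec:
            over_limit[disk] = peak
    return over_limit


if __name__ == "__main__":
    observed = {"data": [2.0, 12.5, 4.1], "log": [1.0, 9.9], "tempdb": []}
    print(disks_over_churn_limit(observed))
```

A disk that shows up in the result is a candidate for a different replication approach, or at least for a closer look at whether the spike is sustained.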

    Dave Bermingham:         11:33                    And Azure also has a lot more storage options. So even the storage that you’re running your instances on can be zone redundant or geo redundant or geo-zone redundant. So they have a lot of options that will get your data at the storage layer off site into different locations. The most important thing in any disaster is data; obviously you need to be able to recover. So there’s the recovery point objective: how much data have you lost when you’ve had a disaster? And then the recovery time objective: how long will it take to get that instance back up and running? So there are lots of options in Azure for replication of that data.
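Editor's note: the two objectives described above can be checked against an actual incident timeline. The sketch below is a minimal illustration; the function names and the three timestamps it expects (last successful replication, failure, service restored) are assumptions about how an incident might be recorded, not part of any SQL Server or Azure API.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_replicated_at: datetime, failure_at: datetime) -> timedelta:
    """Data-loss window: time between the last replicated change and the failure."""
    if last_replicated_at > failure_at:
        raise ValueError("last replication cannot be after the failure")
    return failure_at - last_replicated_at


def achieved_rto(failure_at: datetime, restored_at: datetime) -> timedelta:
    """Outage length: time between the failure and service being back online."""
    if restored_at < failure_at:
        raise ValueError("service cannot be restored before it failed")
    return restored_at - failure_at


def meets_objectives(rpo: timedelta, rto: timedelta,
                     rpo_target: timedelta, rto_target: timedelta) -> bool:
    """True only if both the data-loss window and the outage are within target."""
    return rpo <= rpo_target and rto <= rto_target


if __name__ == "__main__":
    failure = datetime(2019, 9, 3, 10, 0, 0)
    rpo = achieved_rpo(datetime(2019, 9, 3, 9, 58, 0), failure)
    rto = achieved_rto(failure, datetime(2019, 9, 3, 10, 45, 0))
    print("RPO achieved:", rpo, "| RTO achieved:", rto)
    print("Within targets:", meets_objectives(rpo, rto,
                                               timedelta(minutes=5), timedelta(hours=1)))
```

With synchronous replication the data-loss window would ideally be zero; with asynchronous replication it is whatever had not yet been shipped when the failure hit.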

    Mary Jo Foley:                 12:27                    I don’t know if you can say this, though I would think this is public information if it’s out there, but I know SQL Server 2019 is close to release. Is there going to be anything new in there that would be of interest to people who are thinking about disaster recovery and high availability for SQL Server?

    Dave Bermingham:         12:48                    I mean, a lot of the same; they’re typically improving what’s already out there. So whether it’s availability groups or being able to replicate from one availability group to another, that gives you some migration options that weren’t available in earlier versions of SQL Server. And so really just a lot of improvements on existing technology. Nothing earth-shatteringly new. And obviously running SQL Server on Linux is relatively new, and that opens up other unique possibilities and challenges as well when it comes to availability.

    Mary Jo Foley:                 13:35                    Got it. So earlier in the chat you mentioned SQL Server Always On availability groups. But then there are also, as you said, third-party failover clustering solutions that can be purpose-built by customers who need to have mission-critical SQL Server databases running in the cloud. When you’re talking to customers and your clients about this, how do you position these two things? How do you say to them you should look at option A or option B? What are the things you stack against each other when they’re trying to evaluate which of those two things is best for them?

    Dave Bermingham:         14:17                    Yeah. You know, as you mentioned earlier, I work for SIOS Technology, so we have solutions in that space, the third-party options. So if I’m talking to customers, then they obviously have questions or needs and are weighing their options. Always On availability groups are obviously a great option. It’s available in SQL Server, supported by Microsoft. A lot of the customers that I’m talking with are concerned about the price of implementing SQL Server Enterprise edition. So that’s typically the number one consideration: if they’re only buying SQL Server Enterprise for the availability options and nothing else, then they are taking a hard look at third-party solutions, because they can do something similar, you know, have failover clustering and still stay with the SQL Server Standard that they already use.

    Dave Bermingham:         15:18                    So that’s the majority of the customers I’m speaking with. So that’s easy. It’s black and white. Here’s the cost and, you know, here’s the solution. Try it out. If that works for you, great. Let’s move forward. There are some other things, especially when we’re talking about cloud migrations. They’ve probably run failover cluster instances on premises for many, many years. Now they’re moving to the cloud and all of a sudden they can’t build their failover cluster. So they’re looking at availability groups, or they have legacy applications that aren’t certified yet for availability groups. Depending upon the version of SQL Server they’re going to, there have been other limitations with distributed transactions, and they have to be aware: is my application using distributed transactions, and does this version of SQL Server support what I need to do? So many times they’re much more comfortable, because they’re already making a major change from on premises into the cloud. If they don’t also have to change their availability solution, it gives them just a little bit more comfort in knowing that they don’t have to go through a whole bunch of regression testing to make sure that everything works the way it’s supposed to work.

    Mary Jo Foley:                 16:34                    Got it. So are there any other resources you’d recommend to people who are either just getting started thinking about this or who have taken some steps but still would like more information?

    Dave Bermingham:         16:49                    Well, I’ll plug my blog here: http://www.clusteringformeremortals.com. I write pretty much exclusively about failover clustering, high availability, and disaster recovery. SQL Server is one of my primary topics, and cloud. So it’s kind of right in the wheelhouse. And then other than that, you know, Microsoft blogs and the documentation for Azure and AWS, obviously the documentation on high availability. There’s a lot of really good information. Anything in the cloud is changing constantly. Every time I log in there’s a new option or a new feature or something in preview. So really, even Microsoft has a hard time keeping the documentation up to date, because there is always something new. So really the blogs, and following, you know, key people on Twitter or whatever your favorite social network is. If you really want to stay in the loop and up to date, you know, find the key people and make sure you keep track of all the latest technology.

    Mary Jo Foley:                 17:59                    That’s good advice for sure. I love your blog title too. Clustering for Mere Mortals. I don’t know how you came up with that, but that’s great.

    Dave Bermingham:         18:08                    I give credit to Elden Christensen from Microsoft for that. When, I think it was, Windows Server 2008 came out, I was a cluster MVP back then, and they had made so many improvements to Microsoft clustering. He said, this is clustering for mere mortals, and so I got his permission to grab it and use it for my blog.

    Mary Jo Foley:                 18:28                    Awesome. One last point I wanted to bring up, because I know from when I cover cloud outages that it matters to a lot of people, is the whole idea of lessons learned, right? After a huge outage happens or something disastrous happens, sometimes Microsoft publishes postmortems, sometimes they don’t. But what do you tell people about how to evaluate the lessons they can learn when things don’t go as planned? I mean, what are the things that they should be thinking about taking away from something when it goes wrong?

    Dave Bermingham:         19:03                    Well, as you said, my thought is Microsoft does a pretty good job of publishing postmortems. One that comes to mind: about this time last year there was that major outage in South Central US that was really caused by a storm, believe it or not.

    Mary Jo Foley:                 19:17                    Oh yeah, yeah, I remember that.

    Dave Bermingham:         19:22                    Some people were offline for two, three days, and Microsoft had a really great in-depth postmortem they published. I actually heard them speak about it at Ignite, which was just weeks after that outage, so you got a good idea of what happened there. And, you know, Microsoft has actually taken the steps. They mentioned steps that they were going to take at Ignite last year. Some of those steps are in preview, including, you know, for the asynchronous replication I mentioned earlier that they have: before that last outage only Microsoft could flip the switch to say, you know, your DR site is now active. Up to that point, you could not manually flip that switch; it had to be done by Microsoft. So they introduced, in preview right now, the ability for the user to decide, you know what, I’ve been down long enough, I’m going to make my DR site active. I know that there’s going to be some data loss associated with asynchronous replication, but I need to be online. I can’t wait for Microsoft to get things back up and running.

    Mary Jo Foley:                 20:36                    Oh, that’s cool. So is that already available now, or is it something that’s coming?

    Dave Bermingham:         20:40                    It’s in preview. You have to sign up for the preview, but if you sign up you can try it out and see Microsoft make good on their promises.

    Mary Jo Foley:                 20:53                    Yeah, that’s great. All right, well, we are out of time and I just wanted to say thanks, Dave, for the really good, thorough job. I really appreciate you doing this, especially on the first day back to school for your kid.

    Dave Bermingham:         21:07                    Yeah, yeah, a quiet house. So it was easier. I appreciate you having me, Mary Jo.

    Mary Jo Foley:                 21:13                    Yeah, you’re welcome. For everybody else who’s listening right now, we’re going to make this chat available very soon, both in audio form and as a full transcript. In the interim, in a couple of weeks we’ll be back with our next guest, so be sure to be watching for that. I’ll be posting the information on Petri.com, and that will be your signal, listeners, to send in some questions. All you have to do is go to the MJFChat area in the forums and submit your questions right there. So thank you very much again, Dave, and thanks to all of you listeners.

    Ivan
    Participant
    #623655

    Where is the audio link? Only see transcript here…

    Mary Jo Foley
    Moderator
    #623667

    Hi, Ivan. The audio link is here (right under the photo): https://www.petri.com/mjfchat-how-to-build-a-disaster-recovery-survival-kit-for-sql-server
