Troubleshooting database slowness can be both fairly straightforward and tricky at the same time. Most of the time the problems will present themselves to you right away and the fixes will be fairly easy too, but there are those times when the exact issue will elude you and even when you find it there is no really good solution.
Here, we're going to concentrate on that low hanging fruit that will get you through around 90% of your database speed issues.
First Up: Determine If it Really Is an SQL Server Issue
Let’s say that you’ve definitely determined that the issue is inside SQL Server via Troubleshooting a Slow Database Server: Is SQL Server the Cause?. The first thing to do inside SQL Server is to see which query is causing the problems. To do this, you need to open a query window and type the following line of code:
Select * from sys.sysprocesses order by cpu desc
This of course, depends on what your external investigation revealed. If it revealed a CPU issue, then the above query is what you want to use.
If it revealed a disk issue, then you’ll want to change the order by clause to “order by physical_io desc”. And of course if it revealed memory issues, then you’ll want to change your order by clause to “order by memusage desc”.
Each of these order clauses tells you what the most expensive queries are in terms of what you’re ordering by. So the “order by cpu desc” clause puts the highest CPU queries at the top of the list. These are the queries you want to concentrate on.
Let's just assume in this case that CPU is the problem. There are a number of things inside a query that could be causing it to use high CPU, and since this article is about diagnosing, we're going to leave most of that for another time. There really are just too many factors to get into in a single article. Needless to say though, that once you find the offending query or queries, you can pass that information along to your DBAs or whoever deals with these issues and have them look at it.
Note: Unless you’ve got some experience fixing T-SQL coding issues, we highly suggest you don’t attempt it by yourself.
Assuming that the 3 variations of the query above come back within normal parameters, you’ll need to check some other things in that same table. This is also the query you would start with if your initial investigation didn’t reveal anything out of whack. So this time we’re going to check for blocking by typing this query:
Select * from sys.sysprocesses
We could order it, but this is just as easy. This time you want to concentrate on the blocked column. A little blocking in your database is ok, and is certainly natural as long as these blocks are coming and going, but if you’ve got a single process that’s blocking everything else, then that’s clearly going to be the problem. It’s quite possible that you’ll end up with a blocking chain. This is where a single process blocks another, which causes that to block something else, which causes that to block something else and so on. So what you’re really looking for here is the lead blocker. This is the process in the chain that is blocking something, but isn’t being blocked by anything itself.
In the above example it seems like it’s jumping all over the place, but you can see that the only spid that is blocking another, but isn’t being blocked itself is 105. Therefore spid 105 is the lead blocker and it’s the one that needs to be dealt with. And again, once you find that lead blocker, someone has to do something about it, and finding out why it’s blocking takes more skill and should be handled by a DBA. However, most of the time you can at least find out what that spid is doing by typing the following query:
In the case above you’d type: DBCC InputBuffer(105)
Following the same model as above there’s one more column that can be very interesting at this high level and that’s lastwaittype. This is a very interesting column because it can tell you what the offending query is waiting for to complete. And unfortunately it’s not really that simple, but for our purposes here it’s adequate. Here are some simplified explanations of some of the values that you’ll come across most often and that will help you know what’s going on.
Network_io – This is an easy one. If you see this lastwaittype for an extended time you know you’re waiting on the network, and this is a good indication that there’s too much traffic. The solution here is typically to call your networking team to look at what’s happening.
Cxpacket – This typically means that your process is waiting on other parallel processes to complete. So what’s probably happening is you have a process running on more than one CPU and one thread finished before the rest. To fix this you’d want to look at maybe limiting your parallelism either at the query or at the server-level. It could also be a simple query tuning issue which is why it’s usually best to get a DBA involved.
SOS_SCHEDULER_YIELD – Long wait times here typically mean that you’re CPU bound. You may not have enough CPU in your box, or more likely, you may have some queries that need tuning. If things have been running OK, and this just started happening, then it’s way more likely that you’ve got queries hogging CPU.
IO_Completion – If you’ve got high wait times, then this could be a disk issue. You could either not have enough disks for the operation or you could be running an array in degraded mode. These are both easy enough to check. Check your array and have your DBA check to see if the database files are configured correctly.
So as you can see it can be pretty straight-forward to get to the offending process inside SQL Server. The fix is not always as straight-forward and often times takes a qualified DBA to know how to perform the necessary steps without breaking anything.
This article has been an introduction to finding SQL Server issues so you at least know where to start looking to apply fixes.
Final Note: We can’t stress enough that having performance baselines of all your processes will help you more than anything because you now have something to compare it to. If you have no idea what the process normally looks like when it runs then how do you know when it’s out of whack? For instance, many processes often take up a lot of CPU, and if you don’t know that, then you’re likely to be alarmed when you see them in the above queries. So you may spend a lot of time chasing false issues.
- Should you worry about SOS_SCHEDULER_YIELD? by Joe Sack
- Case Study: Part 2: CXPACKET Wait Stats... by Jimmy May