Crash Dump Analysis Part 3: Basic BSOD Troubleshooting

Overview

This is the third part in the series, and in this post we’ll actually work through the analysis of a memory dump. Memory dumps go by many other names: crash dumps, system dumps, blue screen dumps, blue screen errors, system crash files, plus every other combination of those words.  Microsoft calls them memory dumps in its documentation, but I use the names interchangeably.  Whatever you call them, this post tells you how to read them.

How To Find Out What Causes A Blue Screen and Memory Dump

Once you’ve gotten a copy of a memory dump file, the first thing you’ll need to do is open it in WinDbg:

  1. Open WinDbg
  2. Press CTRL+D, or choose “Open Crash Dump…” from the File menu
  3. Select the crash dump
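
If you open dumps often, the same steps can be done from a command line. Here is a minimal sketch in Python that only builds the command; it assumes windbg is on your PATH, and the dump path in the example is made up. The -z option names the dump file to open and -y sets the symbol path.

```python
def windbg_command(dump_file, symbol_path=None, debugger="windbg"):
    """Build the argument list that opens a crash dump in WinDbg.

    -z names the dump file to open; -y sets the symbol path.
    The debugger name is an assumption: it must be on PATH or be a full path.
    """
    command = [debugger]
    if symbol_path:
        command += ["-y", symbol_path]
    command += ["-z", str(dump_file)]
    return command


if __name__ == "__main__":
    # Hypothetical dump location, for illustration only.
    print(" ".join(windbg_command(r"C:\dumps\example.dmp")))
```

Passing the returned list to subprocess.run would actually launch the debugger; I’ve left that out so the sketch is safe to run anywhere.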

If you haven’t already gotten your crash dump, then you’ll need to do that.  WinDbg has options for connecting to a remote debugging session, but those cannot be used to open a memory dump file that sits on another machine, so the file has to be copied to your debugging machine first. You can email it to yourself, save it to the administrator’s file server, or carry it over on a jump drive.  For that matter, if you can talk a customer through finding it over the phone (as long as the machine is bootable and operable), just have your customer email it to you.  As long as it’s not a complete memory dump it will likely be small enough to send.
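
If you’re not sure where the dump is, here is a small sketch that lists the usual suspects. It assumes the default Windows locations (a Minidump folder and a MEMORY.DMP file under the Windows directory); those can be changed in the Startup and Recovery settings, so treat the paths as a starting point. The function name is my own.

```python
import os
from pathlib import Path


def find_dumps(windows_dir):
    """Return crash dump files under a Windows directory, newest first.

    Assumes the default locations: <windows_dir>\\Minidump\\*.dmp for
    minidumps and <windows_dir>\\MEMORY.DMP for a kernel or complete dump.
    """
    root = Path(windows_dir)
    candidates = list((root / "Minidump").glob("*.dmp"))
    full_dump = root / "MEMORY.DMP"
    if full_dump.is_file():
        candidates.append(full_dump)
    return sorted(candidates, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for dump in find_dumps(os.environ.get("SystemRoot", r"C:\Windows")):
        # The size tells you quickly whether the file is small enough to email.
        print(f"{dump}  ({dump.stat().st_size / 1024:.0f} KiB)")
```

Reading these folders usually requires administrator rights on the machine that crashed.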

Save the WinDbg Workspace After Making Changes, Then Tell it to Shut Up

If you’ve already set up your symbols, then saving the workspace is not a big deal.  Save it or don’t save it.  It’s very likely that you’ve installed WinDbg to perform “postmortem” troubleshooting, that is, figuring out what happened after the system has crashed.  If that’s the case, then all you really need configured is the symbol path.  However, the symbol file path IS part of the workspace, so if you haven’t set up your symbol file path already, go back and check out part 2, where I describe the process.  Then save your workspace.  Once the symbol file path is set up, you can put a check in the box to “Don’t ask again…”

WinDbg Does the Basic Analysis for You, and You Don’t Have to do Anything Else.

WinDbg first looks at the crash dump file and determines what type of system it came from.  Then it checks the symbol path for the correct symbols.  If they don’t exist locally, it downloads them from the Microsoft symbol server.  This makes the process go so much better, because with the right symbols you get a much better idea of what is going on. The correct configuration line for your symbol file path is

“SRV*C:\symbols*http://msdl.microsoft.com/download/symbols”.

This looks in C:\symbols on my debugging machine, and if symbol files are needed, they are downloaded into there from http://msdl.microsoft.com/download/symbols.  (This address is not browsable with a web browser; it is meant for debugging tools such as WinDbg.)
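
To make the pieces of that line explicit, here is a short sketch that splits it apart. It only handles the three-part SRV*cache*server form used in this post; the symbol path syntax allows other forms that this deliberately rejects. The function name is my own.

```python
def parse_symbol_path(element):
    """Split one SRV*cache*server symbol path element into its parts.

    Only the three-part form shown in this post is handled; other
    symbol path forms raise ValueError.
    """
    parts = element.split("*")
    if len(parts) != 3 or parts[0].upper() != "SRV":
        raise ValueError(f"not a SRV*cache*server element: {element!r}")
    return {"cache": parts[1], "server": parts[2]}


if __name__ == "__main__":
    path = r"SRV*C:\symbols*http://msdl.microsoft.com/download/symbols"
    parsed = parse_symbol_path(path)
    print("local cache:", parsed["cache"])
    print("downloads from:", parsed["server"])
```

The first field after SRV is the local folder that gets checked first, and the second is where missing files are fetched from.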

To demonstrate the quality of information that symbols provide, I’ve changed my symbol path to just “C:\symbols”, and deleted my symbols directory. Here’s part of the initial output of the command window when I open a saved BSOD crash dump (a minidump) without my symbols correctly set up.

*******************************************************************************
*                        Bugcheck Analysis                                    *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck 3B, {c0000005, fffff9600012d907, fffff88003d5cc30, 0}

Unable to load image win32k.sys, Win32 error 0n2
*** WARNING: Unable to verify timestamp for win32k.sys
*** ERROR: Module load completed but symbols could not be loaded for win32k.sys
***** Kernel symbols are WRONG. Please fix symbols to do analysis.

Compare this information with a correctly configured symbol path:

*******************************************************************************
*                        Bugcheck Analysis                                    *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck 3B, {c0000005, fffff9600012d907, fffff88003d5cc30, 0}

Probably caused by : win32k.sys ( win32k!InternalGetRealClientRect+f )

Notice the “InternalGetRealClientRect”?  We are shown that function name because the symbols are working correctly.  Without the symbol files we get far less information about what is going on.  With the correct symbols the debugger can usually point the finger at a “probable” culprit.  The debugger never says “is caused by”; it says “probably caused by”.

How To Read the Information From a Crash Dump

I don’t know exactly what that function does, but since I have the symbols set up right it at least shows me enough to make an educated guess. I wonder if the Rect in the function name is a rectangle for drawing. Maybe it’s a graphics driver issue, and not a win32k.sys issue after all. The first thing I do is check out the “probably caused by” line.  If it clearly names a driver file, that driver may be trying to access memory directly instead of asking the system to do that for it.  Often, whatever driver is named in the “probably caused by” line is the culprit (but not always), and you can usually find a driver update that resolves the problem.  This can really come in handy if you were thinking that it was the wireless networking card you just installed, but WinDbg points the finger at the video driver instead.  Updating the drivers is a good idea, especially when the system is blue screening and WinDbg points to the driver as the likely cause.  But don’t stop there.

For More Information You Can Check the Call Stack Window

The call stack window shows the last few calls that the thread made before the crash.  It’s available even with a minidump.  In fact, it’s one of the few advanced windows that are available with a minidump.  Of course, the symbols play a huge role in deciphering what is going on in the call stack, too.  Here are two instances of WinDbg looking at the same crash dump.  The instance on the left side has no symbols installed.  The instance on the right side has the symbols downloaded from the Microsoft symbol server.

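[Image: two WinDbg call stack windows for the same crash dump, without symbols on the left and with symbols on the right]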

The stack is read from the bottom to the top, so the most recent call is on the top line.  Notice the top line of the instance with the symbols set up correctly?  nt!KeBugCheckEx means “Start the Blue Screen of Death process.”  It’s the last thing that runs before the blue screen appears and the system starts making the memory dump that you’re now viewing. Reading back down the stack from there, we can see where the thread stopped before the blue screen was set in motion:

win32k!InternalGetRealClientRect+0xf. 

The next thing that happened after that was a page fault, and then it was “all over but the crying”, or in our case, all over but the crash dump analysis. The few calls before that one are telling, too:

“NtUserCreateWindowEx”

suggests to me that perhaps a user had opened a program and the system was getting ready to draw the window.  “xxxCreateWindowEx” is the next step in the process, and then it called “SetTiledRect”.  Have I mentioned how awesome the symbol server is?  It sure beats downloading the symbols yourself, and it really beats trying to decode what was going on when all you see is “win32k+0x8d907”. So the call stack can provide a further clue into what was really going on that may have caused the blue screen.  But there’s still one last place that I’ll mention in Basic BSOD Troubleshooting:  Use the Blue Screen Reference in the help section!
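
The difference between a frame with symbols and one without is easy to see if you pull the text apart. This is only a sketch of the module!function+offset notation as it appears in this post; the regular expression and function name are my own, and real stack output has more columns than this handles.

```python
import re

# module, optional "!function", optional "+0x..." offset
FRAME = re.compile(
    r"^(?P<module>[^!+\s]+)"
    r"(?:!(?P<function>[^+\s]+))?"
    r"(?:\+(?P<offset>0x[0-9a-fA-F]+))?$"
)


def parse_frame(text):
    """Return (module, function, offset) for one call stack entry.

    function is None when the debugger had no symbols for the module
    and could only show an offset from the start of the module.
    """
    match = FRAME.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized frame: {text!r}")
    offset = int(match.group("offset"), 16) if match.group("offset") else 0
    return match.group("module"), match.group("function"), offset


if __name__ == "__main__":
    for frame in ("win32k!InternalGetRealClientRect+0xf", "win32k+0x8d907"):
        module, function, offset = parse_frame(frame)
        label = function if function else "(no symbols)"
        print(f"{module:8} {label:30} +{offset:#x}")
```

With symbols, the offset is a few bytes into a named function. Without them, all you have is a distance from the start of the module, which tells you almost nothing by itself.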

Follow these steps to find out what a Blue Screen of Death error code means

  1. Open WinDbg
  2. Help –> Contents (or press F1)
  3. Expand the “Debugging Techniques” tree.
  4. Expand the “Bug Checks (Blue Screens)” tree.
  5. Expand the “Bug Check Code Reference” tree.
  6. Select the stop code referenced in the memory dump file (in my example it was 3b, so I find 0x3B)

Could this have been done first? Yes, it could have.  I often fire up WinDbg just to find out what an error code means, on behalf of a user or fellow technician.  The blue screen codes provided by the system dump, not only the stop code itself but also the numbers given after it (especially the first number), are key indicators of what really went wrong. Want to know what the Blue Screen Reference said about my error code (3b)?  Find out for yourself after you install the Windows debugger.
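
If you collect a lot of dumps, it can help to pull the stop code and its four parameters out of the BugCheck line automatically. Here is a minimal sketch, assuming the line looks like the one in my example output; the pattern and function name are my own.

```python
import re

# Matches lines such as: BugCheck 3B, {c0000005, fffff9600012d907, fffff88003d5cc30, 0}
BUGCHECK = re.compile(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}")


def parse_bugcheck(line):
    """Return (stop_code, parameters) from a WinDbg BugCheck line.

    All values on the line are hexadecimal, with or without a 0x prefix.
    """
    match = BUGCHECK.search(line)
    if match is None:
        raise ValueError(f"no BugCheck line found in: {line!r}")
    stop_code = int(match.group(1), 16)
    parameters = [int(value.strip(), 16) for value in match.group(2).split(",")]
    return stop_code, parameters


if __name__ == "__main__":
    line = "BugCheck 3B, {c0000005, fffff9600012d907, fffff88003d5cc30, 0}"
    code, params = parse_bugcheck(line)
    # This is the form to look for in the Bug Check Code Reference.
    print(f"Stop code: 0x{code:X}")
    for number, value in enumerate(params, start=1):
        print(f"  parameter {number}: {value:#x}")
```

The stop code printed as 0x3B is the entry to look up in the reference, and the reference page for each code explains what its four parameters mean.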

Conclusion

The Windows debugger (WinDbg) is a very powerful tool, but you don’t have to be an expert to get value from it.  Even a novice technician or administrator can use it to find out what went wrong when a system crashes.  Using some very simple techniques, anybody can learn more about what caused a system crash, and find some probable ways to resolve it.