Monitoring Exchange and Finding Common Problems
In our daily jobs as Systems Engineers and Administrators we come across systems that are in need of our help… and may even be asking for it. Beyond checking your Event Viewer logs, this article looks at some common issues you might find on the server you are running Exchange on. As a Network and Systems Consultant, I often find that Exchange itself is not the problem; the problem is that Exchange is installed on a sub-par system. Either the server hardware isn’t enterprise class, or the minimum hardware requirements weren’t met. In this article, we will check the fundamentals of your Exchange system and look at a real world production server suffering from a common problem.
Note: This article is published with permission from www.msexchange.org
Make Sure Exchange is Strong
Would you drive a truck on a sheet of glass over a bridge? No. Then why would you run an enterprise class server operating system hosting a mission critical application such as e-mail and messaging on an antiquated desktop? Don’t think it happens? It happens more than you think. In the past 5 years alone I have worked with many teams of experts weeding out these exact systems and replacing them with what should have been there in the first place… a system that was thought out and built strong. Now, you don’t have to cluster everything you run, but it would help if your enterprise level servers were running with RAID, for example. RAID can help you in a pinch: when you lose a disk (and you will, based on the MTBF), a redundant RAID level lets you recover quickly with minimal downtime and no data loss.
There are ways around this, of course. When someone wants to save money, there is always a way around doing the right thing… like putting the SMTP server on a desktop, for example. The truth is, whether you go with an enterprise chassis or an overpowered desktop, do yourself the favor of following the guidelines posted on Microsoft’s website and, at a minimum, not shortchanging yourself (or Exchange) on the posted minimum requirements Exchange needs to function.
Exchange Server Minimum Requirements
In this article we cover a real world situation where I found an Exchange Server in desperate need of a hardware upgrade. The clients complained of timing out… getting the dreaded bubble popup… and the network was thought to be the culprit. Once some analysis time was spent on the project, though, the problem was traced to the Exchange Server itself: a quick look at the Performance Console on the Exchange Server told me the network was not to blame.
The Performance Console is a Microsoft Management Console (MMC) snap-in that lets you monitor numerous parts of the internal workings of your server. You would be amazed at what you can turn up if you a) know what you are looking for and b) know how to read the console. In this article we take a look at the Performance Console in hopes of finding a problem. To open the console, go to the Administrative Tools folder in the Start menu (or in the Control Panel).
Once you open the Performance Console, you will see that a few counters are set to be monitored right off the bat… these counters alone will tell you a lot. In this example, I have recreated the production server’s problem on a test system called “shimonski”.
The System Monitor component of the Performance Console is what you can use to find problems. Before we look at the actual problem, we should first do a quick refresher on what you are looking at. To learn a whole lot more about performance in general, I suggest reading Mitch Tulloch’s exceptional article on monitoring key performance counters, from one of our sister sites here at MSExchange.org:
Monitoring Key Performance Counters
As you may find over time, monitoring key performance counters will amaze you.
The Performance Console will help you to monitor key Exchange counters such as MSExchangeIS and a plethora of others. Installing Exchange installs the counters as well. In this article however, our focus is on the most common: Processor, Memory, and PhysicalDisk counters.
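If you want these three objects logged over time rather than watched live, the same counters can be collected from the command line with typeperf, which ships with Windows. The following is a minimal Python sketch that only builds the command; the function name, the `_Total` instances, the 15-second interval, the sample count and the output file name are my assumptions for illustration, not settings taken from the server in this article.

```python
# Counter paths for the three objects this article focuses on.
# The (_Total) instances are an assumption; choose the instances that
# match the disks and processors on your own server.
DEFAULT_COUNTERS = [
    r"\Processor(_Total)\% Processor Time",
    r"\Memory\Pages/sec",
    r"\Memory\Available MBytes",
    r"\PhysicalDisk(_Total)\Avg. Disk Queue Length",
]


def build_typeperf_command(counters, interval_s=15, samples=240,
                           out_file="exchange_baseline.csv"):
    """Return the argument list for a typeperf run that logs counters to CSV.

    -si is the sample interval in seconds, -sc the number of samples,
    -f the output format and -o the output file.
    """
    if interval_s < 1 or samples < 1:
        raise ValueError("interval_s and samples must be at least 1")
    return ["typeperf", *counters,
            "-si", str(interval_s),
            "-sc", str(samples),
            "-f", "CSV",
            "-o", out_file]


if __name__ == "__main__":
    cmd = build_typeperf_command(DEFAULT_COUNTERS)
    # Quote arguments that contain spaces so the line can be pasted into cmd.exe.
    print(" ".join(f'"{arg}"' if " " in arg else arg for arg in cmd))
```

Printing the command rather than running it lets you review the counter paths first; paste it into a command prompt on the server when you are ready.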
In this example, we shall call the company 123 Ltd. 123 Ltd has a problem with its Exchange Server. It was initially thought to be a network-related problem, but that was ruled out with sniffers / protocol analysis software. A quick analysis of bandwidth reports from the WAN showed that there was indeed no network problem beyond the normal occasional service outage. Once the network was ruled out, the systems were analyzed. The following figure shows the real system that was analyzed. As you can see, it looks much like the test system I configured for this example.
In the figure you can see that the Avg. Disk Queue Length and Pages/sec counters are spiking at the same time as the CPU. Believe it or not, this is what you see just by opening the console, without even adding a counter! This means that either the IT team did not know what the counters meant, or the counters were never examined because the team didn’t know to look. Further questioning showed that the current IT group had inherited this system from a team that was replaced in an acquisition. Nobody had checked it out, ever. Nobody knew to check it.
I added a few more counters, just to show you what else can be added to gather information. The next figure shows the adding of a counter. To add a counter, click the plus sign (“+”) on the toolbar at the top of System Monitor in the Performance Console. You can then add more counters.
Exchange counters can also be added to monitor objects within Exchange. This server had many problems with the Information Store and seemed to be throwing users off all the time.
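To make the “spiking at the same time” pattern concrete, here is a small Python sketch that flags samples where all three default counters are high together. The thresholds and the sample readings are illustrative assumptions, not Microsoft guidance and not the values from the 123 Ltd server; tune them to your own baseline.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    cpu_pct: float        # Processor\% Processor Time
    pages_per_sec: float  # Memory\Pages/sec
    disk_queue: float     # PhysicalDisk\Avg. Disk Queue Length


# Hypothetical thresholds, chosen only for this example.
CPU_LIMIT = 85.0
PAGES_LIMIT = 20.0
QUEUE_LIMIT = 2.0


def coincident_spikes(samples):
    """Return the indexes of samples where CPU, paging and disk queue
    all exceed their thresholds in the same sample interval."""
    return [
        i for i, s in enumerate(samples)
        if s.cpu_pct > CPU_LIMIT
        and s.pages_per_sec > PAGES_LIMIT
        and s.disk_queue > QUEUE_LIMIT
    ]


if __name__ == "__main__":
    # Made-up readings: two quiet or partial samples and two combined spikes.
    readings = [
        Sample(20.0, 3.0, 0.4),
        Sample(96.0, 140.0, 5.2),
        Sample(91.0, 8.0, 0.9),   # CPU alone is high, so this one is not flagged
        Sample(99.0, 210.0, 7.8),
    ]
    print(coincident_spikes(readings))
```

Requiring all three counters to be high at once is a deliberate choice: paging, a backed-up disk and a busy CPU together point at a memory-starved box, whereas a lone CPU spike usually does not.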
The next figure shows the ‘Explain’ button in use. Take a moment to utilize this button to see what a counter might be used for.
The following table shows a list of counters that you can select and what they might tell you. This exact information can be found online and on Microsoft TechNet. If you are in the business of taking care of servers (or in my business… of proving out the network), then learning this tool will definitely save you some time and headaches. The ‘Explain’ button will also help you with this exact information.
| Counter | Description |
| --- | --- |
| Available MBytes | The amount of physical memory, in megabytes, available to processes running on the computer. |
| Bytes Total/sec | The total rate of bytes transferred by the Web service. This counter is the sum of Bytes Sent/sec and Bytes Received/sec. |
| Client Latency | The latency of MAPI/remote procedure call (RPC) actions measured at the LoadSim/Microsoft Office Outlook client. This counter measures the time it takes for the server to fulfill the client request. It can be used to estimate the time a user would have to wait between initiating individual Outlook actions. |
| Database\Database Cache Size | The average amount of system memory used by the database cache manager to hold commonly used information from the database files to prevent file operations. If the database cache size seems too small for optimal performance and there is very little available memory on the system (see Memory\Available MBytes), adding more memory to the system may increase performance. If there is a lot of available memory on the system and the database cache size is not growing beyond a certain point, the database cache size may be restricted to an artificially low limit. Increasing this limit may increase performance. |
| DB Disk Transfers/sec | The average sum of all random read/write input/output (I/O) operations to the Microsoft Exchange Database disk volumes (both .edb and .stm files). |
| Disk Bytes/sec | The average number of disk bytes written or read per second across all disk volumes. |
| IMAP4 Connections | The number of current Internet Message Access Protocol version 4rev1 (IMAP4) client connections. |
| IMAP4 UID/sec | The number of unique identifier (UID) commands per second. |
| ISAPI Extension Requests/sec | The number of requests per second for Outlook Web Access transactions. |
| Log Writes/sec | The average sum of all sequential write I/O operations to the Exchange log file disk volumes (.log files). |
| MSExchangeIS Mailbox\Local Delivery Rate | The average rate at which messages are delivered locally to the Exchange store. |
| MSExchangeIS\RPC Operations/sec | The rate at which RPC operations occur. This counter is a good rate counter to measure Exchange workload because all MAPI-based actions use the RPC protocol. |
| MSExchangeIS\RPC Requests | The number of client requests that are currently being processed by the Exchange store. |
| Network Interface\Bytes Total/sec | The average rate at which bytes are sent and received over each network adapter, including framing characters. It is the sum of Network Interface\Bytes Received/sec and Network Interface\Bytes Sent/sec. |
| Network Usage | Measures network traffic going to and from the server’s network adapter. |
| POP3 DELE/sec | The number of message delete commands per second. |
| POP3 STAT/sec | The number of STAT commands per second. A STAT command is issued once per user connection. |
| Private Bytes | The current number of bytes this process has allocated that cannot be shared with other processes. |
| Processor\% Processor Time | The average percentage of elapsed time that the processor spends executing a non-idle thread. It is calculated by measuring the duration of time the idle thread is active in the sample interval, and subtracting that time from the interval duration. (Each processor has an idle thread that consumes cycles when no other threads are ready to run.) This counter is the primary indicator of processor activity, and it displays the average percentage of busy time observed during the sample interval. |
| SMTP Local Queue | The number of messages in the local queue awaiting delivery to local users. |
| SMTP Messages Del/sec | The number of messages being delivered each second to local users. |
| SMTP Messages Sent/sec | The number of messages being sent each second to a remote server. |
| Store Virtual Bytes | The average size, in bytes, of the virtual address space that the Store.exe process is using. Use of virtual address space does not necessarily imply corresponding use of either disk or main memory pages. Virtual space is finite, and by using too much of it, the process can limit its ability to load libraries. |
| System\Context Switches/sec | The combined average rate at which all processors in the computer are switched from one thread to another. Context switches occur when a running thread voluntarily relinquishes the processor, is preempted by a higher priority ready thread, or switches between user-mode and privileged (kernel) mode to use an Executive or subsystem service. This counter is the sum of Thread\Context Switches/sec for all threads running on all processors in the computer, and it is measured in numbers of switches. There are context switch counters on the System and Thread objects. This counter displays the difference between the values observed in the last two samples, divided by the duration of the sample interval. |
| Web ISAPI Extension Requests/sec | The rate at which Internet Server Application Programming Interface (ISAPI) extension requests are received by the Web service. Internet server API requests are used by Outlook Web Access to access the Exchange server. |
| Working Set | The set of memory pages (areas of memory allocated to a process) recently used by the threads in a process. If available memory on the server is above a specified threshold, pages remain in the Working Set of a process even if they are not in use. When available memory falls below a specified threshold, pages are removed from the Working Set. If these pages are needed, they will be returned back to the Working Set before they leave main memory and are made available for other processes to use. |
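Two of the descriptions in the table are really small formulas: % Processor Time is the non-idle share of the sample interval, and rate counters such as Context Switches/sec are the difference between two samples divided by the interval. Here is a hedged Python sketch of that arithmetic, with made-up sample numbers and function names of my own choosing:

```python
def processor_time_pct(idle_seconds, interval_seconds):
    """Percent of the sample interval spent running non-idle threads:
    subtract the idle thread's time from the interval, then scale to 100."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    busy_seconds = interval_seconds - idle_seconds
    return 100.0 * busy_seconds / interval_seconds


def rate_per_sec(previous_total, current_total, interval_seconds):
    """Rate counter: difference between the last two raw samples,
    divided by the duration of the sample interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (current_total - previous_total) / interval_seconds


if __name__ == "__main__":
    # Idle thread ran 1.5 s of a 6 s interval, so the CPU was busy 4.5 s.
    print(processor_time_pct(1.5, 6.0))         # 75.0
    # Raw context-switch totals 15 s apart (illustrative numbers only).
    print(rate_per_sec(120000, 150000, 15.0))   # 2000.0
```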
What was the Problem?
Now that you are familiar with the Performance Console, System Monitor and the things that can be monitored with it, we should think about our original problem. We had an Exchange Server experiencing problems that were thought to be network related. Now that we see that they are not, we need to come back to the base system and see what the problem could be.
Now that you understand what is being monitored, let’s take a look at Figure 6. Figure 6 shows us that we have a server running on sub-par equipment. After careful analysis it was determined that the server was old (5 years old, which is about 100 in computer and dog years). It had 512 MB of RAM, a CPU running at less than 1 GHz, and there was only 4 GB of free space left on both the system and data drives. The swap file was located on the system drive.
I am cutting to the chase here with what went wrong on purpose… it doesn’t matter what else we could look for, the default settings told us exactly what the problems were. Every time you see a spike in Figure 6, that’s when all the users got frozen or disconnected from the Exchange Server… every time. If the spike was continuous, then all users would lock up. This was nothing more than a server that was low on hardware. Think about it, the network was checked, the systems were checked… only to find that everything led back to this:
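A free-space check like the one that turned up those 4 GB drives is easy to script. Below is a minimal Python sketch using the standard library’s `shutil.disk_usage`; the function name and the 10 GiB floor are hypothetical values picked for the example, not a figure from Microsoft’s published requirements.

```python
import shutil

GIB = 1024 ** 3

# Hypothetical floor for this example only; set it from your own sizing.
MIN_FREE_GIB = 10


def low_space_volumes(paths, min_free_gib, usage=shutil.disk_usage):
    """Return (path, free_gib) for each volume with less than min_free_gib free.

    `usage` defaults to shutil.disk_usage and can be swapped for a stub in tests.
    """
    low = []
    for path in paths:
        free_gib = usage(path).free / GIB
        if free_gib < min_free_gib:
            low.append((path, round(free_gib, 1)))
    return low


if __name__ == "__main__":
    # "." checks the current volume; on the server you would list the
    # system and data drives instead.
    for path, free_gib in low_space_volumes(["."], MIN_FREE_GIB):
        print(f"{path}: only {free_gib} GiB free")
```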
Exchange Server Minimum Requirements
I hope that this real world issue showed you how easy a problem can be to fix if you know how to look for it. Many times people just point their finger at the network… it’s the easiest thing to blame because it’s the least understood. Remember, you need hardware in the box, it’s that simple. Think about what you are running on the server. You have your information store (Store.exe), which is one of the biggest individual consumers of memory in Exchange Server 2003. Store.exe processes and manages mailboxes and public information storage… you need disk space, you need swap file space, and you need memory to handle Windows Server 2003, Active Directory and Exchange Server 2003. You have other processes such as Inetinfo.exe, which handles Internet protocols and IIS. You have the MTA (Emsmta.exe) and the System Attendant (Mad.exe). Antivirus software and backup software are also common. You can see it all add up in Task Manager. Make sure you think about your base systems and check out what your servers are telling you; you may find that they just need a little love and attention to get back to running primo again.
Robert J. Shimonski (TruSecure TICSA, Cisco CCDP, CCNP, Cisco Firewall Specialist, Nortel NNCSS, Microsoft MCSE, MCP+I, Novell Master CNE, CNE 4/5/6, CIP, CIBS, IWA CWP, Prosoft MCIW, SANS GSEC, GCIH, CompTIA HTI+, Security+, Server+, Network+, Inet+, A+, e-Biz+, Symantec SPS and NAI Sniffer SCP) is a Network Manager for a leading manufacturing company as well as a part time trainer and author. Robert’s specialties include network infrastructure design with the Cisco and Nortel product lines, network security design and management with PIX firewalls, Check Point systems, and all types of routers and switches, Systems Engineering with all major operating system platforms, and troubleshooting with Sniffer-based technologies. Robert is the author of many security related articles and published books, including the new “Sniffer Network Optimization and Troubleshooting Handbook” from Syngress Media Inc (ISBN: 1931836574). Robert is also the author of the best selling Security+ Study Guide and DVD Training System (ISBN: 1931836728), also from Syngress.