Demystifying Azure App Services - Diagnostics and Telemetry
In the last post, we looked at the virtual machines behind App Services and the different approaches you can use to customize the machine, and thus your application environment. In this post, we are going to focus on the diagnostics and telemetry you can gather from an App Service. The information Azure collects is useful when you need to troubleshoot problems with your application, but you also need telemetry and metrics to understand how your application behaves under normal conditions.
Metrics with a side of AI
If you’ve worked with Azure for any length of time, you’ll already know there are Azure services that can provide telemetry and metrics across a range of resource types. Azure Application Insights (App Insights for short), for example, is a wonderful source of information for applications running in any environment, inside or outside of Azure, and on devices from mobile phones to backend servers. Azure Monitor is another great tool for collecting data from various sources and firing alerts when abnormal results appear.
Services like App Insights and Monitor are well covered in other articles and tutorials (see Monitor Your Website’s Availability with Azure Application Insights, for example), so I want to focus on some hidden gems inside the App Services resource itself.
Diagnose and Solve Problems
Inside every App Service is a section named “Diagnose and Solve Problems”. What makes this feature so valuable is that the section gives you more than just raw metrics and measurements. Yes, you can see the number of failed requests over the last 24 hours and yes, you can find the CPU utilization percentage for all instances of your App Service plan. There are numbers, but there are also guided tours and heuristics that can help you uncover problems and potential misconfigurations in your App Service.
Let’s take a basic health check as an example. Clicking on the “Health Check” button under the “Availability and Performance” section will bring you to a collection of graphs. The first graph shows the number of requests and errors your app has experienced over the last 24 hours. The second graph, shown below, shows the app performance by categorizing response times into the 50th, 90th, and 95th percentiles. The third report is a CPU utilization report, while the fourth report covers memory usage.
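If percentile graphs are new to you, here is a minimal sketch of what the 50th, 90th, and 95th percentile numbers mean. The response times are made-up sample data, the `percentile` helper is my own illustration using the simple nearest-rank method, and none of this is how the diagnostics service itself computes the graph.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical response times in milliseconds (not real App Service data).
response_times_ms = [120, 95, 110, 105, 2300, 130, 98, 101, 115, 640]

summary = {p: percentile(response_times_ms, p) for p in (50, 90, 95)}
print(summary)  # {50: 110, 90: 640, 95: 2300}
```

Notice how the median looks healthy while the 90th and 95th percentiles expose the two slow requests. That is why the graph splits response time by percentile instead of showing a single average.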
If the diagnostics service detects a problem, you’ll see a red exclamation mark appear, like the one on the memory usage tab above. You can then drill into a full report for the problem area or launch one of the troubleshooting wizards that target common problems. As you can see below, the wizards can help with low-level operations, like collecting a memory dump, as well as giving easy access to features like an advanced application restart. A regular App Service restart shuts down all running instances at once and then restarts them, meaning your users will experience downtime. An advanced restart allows you to stagger the restarts across your running instances and keep the App Service responsive.
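To see why staggering matters, here is a toy model of the two restart styles. It makes no Azure calls, and the instance names, function names, and `batch_size` parameter are all hypothetical. It only tracks how many instances are still serving traffic at the worst moment of each restart.

```python
def regular_restart(instances):
    """Stop every instance at once, then start them all again.
    Returns the lowest number of instances serving at any point."""
    serving = set(instances)
    serving.clear()               # everything goes down together
    lowest = len(serving)         # nothing is serving right now
    serving.update(instances)     # everything comes back up
    return lowest


def staggered_restart(instances, batch_size=1):
    """Restart instances one batch at a time.
    Returns the lowest number of instances serving at any point."""
    serving = set(instances)
    lowest = len(serving)
    for start in range(0, len(instances), batch_size):
        batch = instances[start:start + batch_size]
        serving.difference_update(batch)   # this batch goes down
        lowest = min(lowest, len(serving))
        serving.update(batch)              # batch is back before the next one
    return lowest


instances = ["instance-0", "instance-1", "instance-2"]
print(regular_restart(instances))    # 0
print(staggered_restart(instances))  # 2
```

With three instances, the regular restart leaves nothing to answer requests for a moment, while the staggered version always keeps at least two instances up.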
Follow the Best Practices
The App Service diagnostics go beyond just troubleshooting. There is also a wizard to make sure an App Service is using an optimal configuration. This “best practices” wizard checks for SSL and certificate problems, and also verifies that an App Service is set to auto-scale and is using deployment slots. Another section, the availability and performance checks, verifies that the App Service is using a production SKU, that the application request routing cookie is disabled, and that temporary file storage is not exceeding a safe threshold, along with several other checks that can make an App Service more robust.
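You can think of these checks as a checklist run against your configuration. The sketch below is only an illustration of that idea: the setting names, the set of tiers I treat as production, and the 80 percent storage threshold are all assumptions of mine, not values taken from the wizard or from any Azure API.

```python
# Assumption: tiers counted as production-grade for this sketch only.
PRODUCTION_TIERS = {"Standard", "Premium"}
# Assumption: an arbitrary "safe" threshold; the wizard's real value may differ.
TEMP_STORAGE_LIMIT_PCT = 80


def review(settings):
    """Return a list of findings for a hypothetical settings dictionary."""
    findings = []
    if settings["tier"] not in PRODUCTION_TIERS:
        findings.append("not running on a production SKU")
    if settings["arr_affinity_enabled"]:
        findings.append("application request routing cookie is enabled")
    if not settings["autoscale_enabled"]:
        findings.append("auto-scale is not configured")
    if settings["deployment_slot_count"] == 0:
        findings.append("no deployment slots in use")
    if settings["temp_storage_used_pct"] > TEMP_STORAGE_LIMIT_PCT:
        findings.append("temporary file storage is above the threshold")
    return findings


# Hypothetical App Service configuration, not read from a real resource.
settings = {
    "tier": "Basic",
    "arr_affinity_enabled": True,
    "autoscale_enabled": True,
    "deployment_slot_count": 2,
    "temp_storage_used_pct": 35,
}

for finding in review(settings):
    print("Warning:", finding)
```

For this sample configuration, the review flags the SKU and the routing cookie and stays quiet about the rest.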
From rejected IPs to server-side 500 errors, the App Service diagnostics give you a quick view into existing problems and potential problems in your App Service. What works so well with the diagnostic tools is how Microsoft has put together solutions to the most common problems across all App Service applications. While App Insights will give you the best low-level telemetry, the diagnostic tools are valuable for finding problems you might not have noticed!