Keeping an eye on the behaviour of a Web API lets you know how a service is performing for your customers. Is the response time adequate? Are certain errors occurring on a regular basis? Or, on an even more basic level, is the service completely broken?
There are many ways to keep an eye on an API in .NET but those methods are highly dependent on the goal you’re trying to achieve. How you’re hosting the Web API (through IaaS, PaaS, etc.) must also factor into the decision. And finally, the monitoring solution will also depend on what is available through your cloud provider.
Up, down or somewhere in between
“Hey, is everything up at the moment?” is a sentence most developers have heard at one point or another in their career. It’s often the moment when they realize they need some kind of monitoring solution for the product. Monitoring allows a team to go from reacting to a problem someone else reported to being the ones who inform their customers that there’s a problem and that they’re already working on fixing it.
Start thinking about what’s important (and what’s not) to your service before even looking at potential monitoring solutions. Which components are critical enough that they require an immediate intervention? Your initial thought will be to say everything, but when you get down to it there are some degradations that the business will be willing to live with for a certain period of time. For example, authentication is generally an API that needs to always work correctly. A recommendation engine within your product, on the other hand, could be non-critical.
Also try to figure out how you want to determine if a component is broken. Is it a high number of 500 errors over a given period of time? Is it the database becoming unavailable? Or maybe a downstream service that isn’t responding in an adequate amount of time? All of these factors are good to know going into the decision process.
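To make the "high number of 500 errors over a given period of time" idea concrete, here is a minimal sketch of a sliding-window check. It is written in Python for brevity, but the logic carries over to .NET unchanged. The class name, the 60-second window, the 5% threshold, and the minimum request count are all assumptions chosen for illustration; they are not values any particular monitoring tool prescribes.

```python
from collections import deque


class ErrorRateWindow:
    """Flags a component as unhealthy when too many recent responses are 5xx.

    All defaults are illustrative assumptions, not recommendations.
    """

    def __init__(self, window_seconds=60.0, max_error_ratio=0.05, min_requests=20):
        self.window_seconds = window_seconds
        self.max_error_ratio = max_error_ratio
        self.min_requests = min_requests
        self._events = deque()  # (timestamp, is_server_error) pairs, oldest first

    def record(self, timestamp, status_code):
        """Store one response; any 5xx status counts as a server error."""
        self._events.append((timestamp, 500 <= status_code <= 599))

    def is_unhealthy(self, now):
        """Return True if the 5xx ratio inside the window exceeds the threshold."""
        # Drop events that have fallen out of the window.
        while self._events and self._events[0][0] <= now - self.window_seconds:
            self._events.popleft()
        total = len(self._events)
        # Too little traffic to draw a conclusion: do not raise an alert.
        if total < self.min_requests:
            return False
        errors = sum(1 for _, is_error in self._events if is_error)
        return errors / total > self.max_error_ratio
```

The minimum request count is there so that a single failed request during a quiet period doesn't page someone in the middle of the night. In practice a monitoring product would do this calculation for you; the point of the sketch is that you need to decide on the window and the threshold before you configure one.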
It’s much easier to identify and fix issues once you know where the weaknesses lie. Application Performance Monitoring (APM) tools help you do just that. APM is all about knowing how well your service is handling the traffic that is being directed at it. Response time, requests per minute, Apdex score, crash frequency, and per-machine status are all metrics that give a team a better understanding of how the application is doing.
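Of the metrics above, the Apdex score is the least self-explanatory. It sorts response times into "satisfied" (at or below a target threshold T), "tolerating" (between T and 4T), and "frustrated" (above 4T), then counts satisfied requests in full and tolerating requests as half. The sketch below, in Python for brevity, shows the calculation; the function name and the 0.5-second threshold used in the example are assumptions, and you would pick a threshold that suits your own API.

```python
def apdex(response_times, threshold):
    """Compute an Apdex score between 0 and 1 for response times in seconds.

    threshold is the target response time T (an assumption you choose).
    """
    if not response_times:
        raise ValueError("at least one sample is required")
    # Satisfied: at or below T.
    satisfied = sum(1 for t in response_times if t <= threshold)
    # Tolerating: above T but no more than 4T. Anything slower is frustrated.
    tolerating = sum(1 for t in response_times if threshold < t <= 4 * threshold)
    return (satisfied + tolerating / 2) / len(response_times)


# Three satisfied, one tolerating, one frustrated: (3 + 0.5) / 5
score = apdex([0.1, 0.3, 0.5, 1.0, 2.5], threshold=0.5)
```

A score of 1.0 means every request met the target, while a score that drifts downward after a deployment is a good hint that something got slower.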
There are a few big players in the APM market that have great integrations with .NET. I’ll talk more about them in the recommendations section below.
The first instinct when a service reaches its maximum capacity is to spin up more instances that can share the load. This solution works, of course, but at the cost of excess capacity outside of traffic peaks. That excess capacity has a monetary cost associated with it.
It may be time to consider autoscaling if you have lots of servers running an application but they are idling along most of the time. Monitoring for scalability means tightly integrating your product with your cloud platform. A setup that works on AWS won’t carry over directly to Azure (and vice versa).
Service Level Agreements
Ensuring that you deliver on SLAs is one of the toughest things to do when it comes to cloud monitoring. Lots of work must go into it and even just establishing the SLA number is not a simple task since it has to take into account your cloud provider’s SLA. The topic of SLAs deserves a deep dive and won’t be addressed in this article.
The best tool by far for monitoring an API hosted in Azure is Application Insights. It’s fully integrated into Visual Studio and the Azure Portal, and the first gigabyte of data per month is free.
There’s one big problem with it though: the number of regions in which it is available is limited. At the time of writing this article, it was only supported in 6 out of Azure’s 28 non-governmental regions. Granted, it is available in their most popular regions, but there are some noticeable gaps, such as most of Europe and Australia.
If you’re hosting your application in a region that doesn’t have App Insights, your best bet is going to be one or both of NewRelic and Datadog. Both of these products accomplish similar tasks but in slightly different ways.
NewRelic is a pure APM and is targeted at developers who want to get under the hood of their product. Its level of detail is just right, making it easy to see what’s going on with your application in near real-time. It’s helped me on more than a few occasions to identify problems with a deployment.
Datadog is more focused on the infrastructure view of an application. It can be fed application data as well, but its design and functionality feel slightly more geared towards the DevOps side of things.
So if you’re trying to understand what makes your application tick, NewRelic is the way to go. On the other hand, if you want to have a better view of the machines and resources being used by the application, Datadog is a better solution. They are very complementary and it’s very easy to make a case to have both.
To enable autoscaling, it’s best to rely on Azure’s built-in autoscaling solutions. Some services, such as Service Fabric, App Service, and Functions, have autoscaling features that need only be configured. Virtual Machines also support scaling via Scale Sets, but they need a bit more work to set up properly.
There isn’t an equivalent to App Insights on the AWS side. So if your app is running on an EC2 instance or through Beanstalk, I’d again recommend going down the NewRelic and/or Datadog route for all the same reasons as above.
Amazon launched an APM last year called X-Ray. It isn’t as feature-rich as App Insights, NewRelic, or Datadog, but it is free and provides the basic functionality you’d expect, like tracing of requests. It’s still very early days for the product, and it has lots of catching up to do.
One of the first and easiest steps that anyone can take to know the status of an application or service is to have a health endpoint. At its simplest, it returns a 200 OK status code when everything is fine and a 503 Service Unavailable when it’s broken. It’s also possible to provide more information in the body of the response with the results of the individual checks that were performed. Microsoft have written an extensive article on how to best implement a health endpoint.
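A health endpoint along those lines can be sketched in a few dozen lines. The example below uses Python’s standard library for brevity rather than .NET, and everything specific in it is an assumption made for illustration: the /health path, the check names, the placeholder check functions, and the shape of the JSON body.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def check_database():
    # Hypothetical placeholder: replace with a real connectivity check.
    return True


def check_downstream_service():
    # Hypothetical placeholder: replace with a real call to the dependency.
    return True


# Names and checks are illustrative assumptions.
CHECKS = {"database": check_database, "downstream": check_downstream_service}


def run_health_checks(checks):
    """Run every check and return (http_status, per-check results)."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "failed"
        except Exception:
            # A check that throws is treated the same as one that fails.
            results[name] = "failed"
    status = 200 if all(value == "ok" for value in results.values()) else 503
    return status, results


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        status, results = run_health_checks(CHECKS)
        payload = {"status": "ok" if status == 200 else "unavailable", "checks": results}
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Start the endpoint; the port is an arbitrary choice for local testing."""
    HTTPServer(("127.0.0.1", port), HealthHandler).serve_forever()
```

Calling serve() and requesting /health returns 200 with each check marked "ok"; if any check returns False or raises, the same request returns 503 along with the name of the check that failed. Keeping the checks in a separate function from the HTTP handler also makes them easy to test without starting a server.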
Use A Dedicated Monitoring Tool
The tool you choose, of which there are many more than those I spoke of, should be focused on solving APM and monitoring problems. Trying to repurpose another tool that is already in your toolbox is likely to result in a poor experience.
And finally, don’t try to build your own monitoring tool. Companies like NewRelic and Datadog have spent large amounts of time and money solving this particular problem. Anything you build yourself is likely to take much longer, won’t be as feature-rich, and will cost much more than paying for an existing product.