Diagnostics in the cloud

At some point you might need to debug your code, or you’ll want to judge how healthy your application is while it’s running in the cloud. We don’t know about you, but the more experienced we get with writing code, the more we know that our code is less than perfect. We’ve drastically reduced the amount of debugging we need to do by using test-driven development (TDD), but we still need to fire up the debugger once in a while.

Debugging locally with the SDK is easy, but once you move to the cloud you can’t attach a debugger at all; instead, you need to log the behavior of the system. For logging, you can use either the infrastructure that Azure provides or your own logging framework. As in traditional environments, logging is going to be your primary mechanism for collecting information about what’s happening with your application.

Using Azure Diagnostics to find what’s wrong
Logs are handy. They help you find where the problem is, and can act as the flight data recorder for your system. They come in handy when your system has completely burned down, fallen over, and sunk into the swamp. They also come in handy when the worst hasn’t happened, and you just want to know a little bit more about the behavior of the system as it’s running. You can use logs to analyze how your system is performing, and to understand better how it’s behaving. This information can be critical when you’re trying to determine when to scale the system, or how to improve the efficiency of your code.

The drawback with logging is that hindsight is 20/20. It’s obvious, after the crash, that you should’ve enabled logging or that you should’ve logged a particular segment of code. As you write your application, it’s important to consider instrumentation as an aspect of your design.

Logging is much more than just remote debugging, 1980s-style. It’s about gathering a broad set of data at runtime that you can use for a variety of purposes; debugging is one of those purposes.

Challenges with troubleshooting in the cloud
When you’re trying to diagnose a traditional on-premises system, you have easy access to the machine and the log sources on it. You can usually connect to the machine with a remote desktop and get your hands on it. You can parse through log files, both those created by Windows and those created by your application. You can monitor the health of the system by using Performance Monitor, and tap into any source of information on the server. During troubleshooting, it’s common to leverage several tools on the server itself to slice and dice the mountain of data to figure out what’s gone wrong.

You simply can’t do this in the cloud. You can’t log in to the server directly, and you have no way of running remote analysis tools. But the bigger challenge in the cloud is the dynamic nature of your infrastructure. On-premises, you have access to a static pool of servers. You know which server was doing what at all times. In the cloud, you don’t have this ability. Workloads can be moved around; servers can be created and destroyed at will. And you aren’t trying to diagnose the application on one server, but across a multitude of servers, collating and connecting information from all the different sources. The number of servers used in cloud applications can swamp most diagnostic analysis tools. The sheer amount of data available can cause bottlenecks in your system.

For example, a typical web user, as they browse your website and decide to check out, can be bounced from instance to instance by the load balancer. How do you find out the true load on your system, or the cause of the slow response while they were checking out? You need access to all the data that’s available on on-premises servers, and you need that data collated for you.
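To make the collation problem concrete, here is a minimal sketch in Python. It isn't the Azure Diagnostics API; it only illustrates the idea of merging per-instance logs into one timeline and then following a single user across instances. The `LogRecord` type, the instance names (`web_0`, `web_1`), and the `session_id` correlation key are all assumptions made for this illustration.

```python
import heapq
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class LogRecord:
    timestamp: datetime
    instance: str    # hypothetical instance name, such as "web_0"
    session_id: str  # hypothetical key that ties records to one user
    message: str


def collate(*per_instance_logs):
    """Merge logs that are each already in time order into one timeline."""
    return list(heapq.merge(*per_instance_logs, key=lambda r: r.timestamp))


def session_timeline(records, session_id):
    """Keep only the records that belong to one user's session."""
    return [r for r in records if r.session_id == session_id]


# Sample data: one shopper (session "s1") is served by two instances.
t = datetime(2010, 1, 1, 12, 0, 0)
web_0 = [
    LogRecord(t, "web_0", "s1", "view cart"),
    LogRecord(t + timedelta(seconds=4), "web_0", "s2", "view home page"),
]
web_1 = [
    LogRecord(t + timedelta(seconds=2), "web_1", "s1", "begin checkout"),
    LogRecord(t + timedelta(seconds=9), "web_1", "s1", "checkout complete"),
]

timeline = session_timeline(collate(web_0, web_1), "s1")
for record in timeline:
    print(record.timestamp.time(), record.instance, record.message)
```

The sketch assumes that every record carries a correlation key and that clocks are reasonably synchronized across instances. Without both, a merged timeline can't reliably reconstruct what one user experienced.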

You also need close control over the diagnostic data producers. You need an easy way to dial the level of information from debug to critical. While you’re testing your systems, you need all the data, and you need to know that the additional load it places on the system is acceptable. During production, you want to know only about the most critical issues, and you want to minimize the impact that collecting this data has on system performance.
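The idea of dialing the level of information up and down can be sketched with Python's standard `logging` module. This is an illustration of the concept only, not how Azure Diagnostics is configured; the logger name `checkout`, the `make_logger` helper, and the messages are hypothetical.

```python
import io
import logging


def make_logger(name, level, stream):
    """Build a logger that writes 'LEVEL message' lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.propagate = False  # keep output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.handlers = [handler]
    return logger


buffer = io.StringIO()

# Testing: capture everything, from debug up to critical.
log = make_logger("checkout", logging.DEBUG, buffer)
log.debug("cart contains 3 items")
log.critical("payment gateway unreachable")

# Production: turn the dial so only critical records are kept.
log.setLevel(logging.CRITICAL)
log.debug("cart contains 5 items")  # dropped by the level filter
log.critical("order database unreachable")

print(buffer.getvalue(), end="")
```

The application code is the same in both phases; only the level changes. That is the property you want from a diagnostics platform: the verbosity is a setting you adjust at runtime, not something that requires redeploying the code.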

For all these reasons, the Windows Azure Diagnostics platform sits on top of what is already available in Windows. The diagnostics team at Microsoft has extended and plugged in to the existing platform, making it easy for you to learn, and easy to find the information you need.

Source of Information: Manning, Azure in Action, 2010
