• Log for insight

    A telemetry package is a good first step, but you still have to instrument your own code. The telemetry service tells you when there’s a problem and shows you what customers are experiencing, but it may not give you much insight into what’s going on in your code.

    You don’t want to have to remote into a production server to see what your app is doing. That might be practical when you have one server, but what about when you’ve scaled to hundreds of servers and you don’t know which ones you need to remote into? Your logging should capture enough information that you never have to remote into a production server to analyze and debug a problem: you should be able to isolate issues solely through the logs.


    Log in production
    A lot of people turn on tracing in production only when there’s a problem and they want to debug. This approach can introduce a substantial delay between the time you become aware of a problem and the time you obtain useful troubleshooting information about it. And the information you get might not be helpful for intermittent errors.

    What we recommend for the cloud environment, where storage is cheap, is that you always leave logging on in production. That way, when errors happen, you already have them logged and have historical data that can help you analyze issues that develop over time or happen regularly at different times. You could automate a purge process to delete old logs, but you might find that it's more expensive to set up such a process than it is to keep the logs.

    The added expense of logging is trivial compared with the amount of troubleshooting time and money you can save by having all the information you need already available when something goes wrong. Then, when someone tells you they had a random error sometime around 8:00 last night, but they don’t remember the error, you can readily find out what the problem was.

    For less than $4 a month, you can keep 50 gigabytes of logs on hand, and the performance impact of logging is trivial so long as you keep one thing in mind: be sure your logging library is asynchronous.
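
    To make the point about asynchrony concrete, here is a minimal sketch (not a production-hardened implementation) of how an asynchronous logger keeps I/O off the request path: callers just enqueue a message and return, and a background task drains the queue and writes to storage. Mature libraries handle this for you; the class and file path below are illustrative.

        using System;
        using System.Collections.Concurrent;
        using System.IO;
        using System.Threading.Tasks;

        public class AsyncLogWriter
        {
            private readonly BlockingCollection<string> _queue = new BlockingCollection<string>();

            public AsyncLogWriter(string logFilePath)
            {
                // Background consumer: the only code that touches the (slow) file system.
                Task.Run(() =>
                {
                    foreach (var line in _queue.GetConsumingEnumerable())
                    {
                        File.AppendAllText(logFilePath, line + Environment.NewLine);
                    }
                });
            }

            // Called on request threads: enqueue and return, no I/O on the hot path.
            public void Write(string message)
            {
                _queue.Add(message);
            }

            public void Shutdown()
            {
                _queue.CompleteAdding();
            }
        }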


    Differentiate logs that inform from logs that require action
    Logs are meant to INFORM (I want you to know something) or ACT (I want you to do something). Be careful to write ACT logs only for issues that genuinely require a person or an automated process to take action. Too many ACT logs will create noise, requiring too much work to sift through all the log records to find genuine issues. And if your ACT logs trigger some action, such as sending email to support staff, avoid having a single issue trigger thousands of such actions.

    In .NET System.Diagnostics tracing, logs can be assigned to the Error, Warning, Info, or Debug/Verbose level. You can differentiate ACT from INFORM logs by reserving the Error level for ACT logs and using the lower levels for INFORM logs.
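
    As a concrete illustration of that convention, here is a minimal sketch (the method names are ours, not part of System.Diagnostics) that reserves the Error level for ACT logs and writes INFORM logs at the Information level:

        using System.Diagnostics;

        public static class AppTrace
        {
            // ACT: something needs a person or an automated process to respond.
            public static void LogAct(string message, params object[] args)
            {
                Trace.TraceError(message, args);
            }

            // INFORM: context and history you want on hand, but no action required.
            public static void LogInform(string message, params object[] args)
            {
                Trace.TraceInformation(message, args);
            }
        }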


    Configure logging levels at run time
    While it’s worthwhile to always have logging on in production, another best practice is to implement a logging framework that enables you to adjust at run time the level of detail that you’re logging, without redeploying or restarting your application. For example, when you use the tracing facility in System.Diagnostics, you can create Error, Warning, Info, and Debug/Verbose logs. We recommend that you always log Error, Warning, and Info logs in production and be able to dynamically add Debug/Verbose logging for troubleshooting on a case-by-case basis.
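
    One way to get that behavior with System.Diagnostics is to route your logs through a TraceSource and adjust its SourceSwitch while the app is running, for example from an administrative endpoint or a periodic configuration check. This is only a sketch of the idea, and the source name is illustrative:

        using System.Diagnostics;

        public static class DiagnosticTrace
        {
            // Default level writes Error, Warning, and Info; Verbose is filtered out.
            public static readonly TraceSource Source =
                new TraceSource("FixItApp", SourceLevels.Information);

            // Call at run time to dial Verbose logging up for a troubleshooting
            // session, then back down, without redeploying or restarting.
            public static void SetLevel(SourceLevels level)
            {
                Source.Switch.Level = level;
            }
        }

        // Usage:
        //   DiagnosticTrace.Source.TraceEvent(TraceEventType.Verbose, 0, "Cache miss for {0}", key);
        //   DiagnosticTrace.SetLevel(SourceLevels.Verbose);  // now Verbose events are written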

    The Azure Websites service has built-in support for writing System.Diagnostics logs to the file system, Table storage, or Blob storage. You can select different logging levels for each storage destination, and you can change the logging level on the fly without restarting your application. Writing logs to Blob storage makes it easier to run HDInsight analysis jobs on them, because HDInsight knows how to work with Blob storage directly.


    Log exceptions
    Don’t just put exception.ToString() in your logging code. That leaves out inner exceptions and contextual information. In the case of SQL errors, it leaves out the SQL error number. For all exceptions, include context information, the exception itself, and inner exceptions to be sure that you provide everything that’s needed for troubleshooting. For example, context information might include the server name, a transaction identifier, and a user name (but not the password or any secrets!).

    If you rely on individual developers to handle exception logging correctly, not all of them will. To ensure that logging is done the right way every time, build exception handling into your logger interface: pass the exception object itself to the logger class and log the exception data properly in the logger class.
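
    Here is a sketch of the kind of method you might put on your logger class. The context parameters (transaction identifier, user name) are illustrative; the point is that the caller passes the exception object and the logger records the whole chain, including SQL error numbers, the same way every time.

        using System;
        using System.Data.SqlClient;
        using System.Diagnostics;
        using System.Text;

        public class ExceptionLogger
        {
            public void LogError(string message, Exception exception, string transactionId, string userName)
            {
                var sb = new StringBuilder();
                sb.AppendFormat("{0} [machine={1}, transaction={2}, user={3}]",
                    message, Environment.MachineName, transactionId, userName).AppendLine();

                // Walk the whole exception chain so inner exceptions are never lost.
                for (var ex = exception; ex != null; ex = ex.InnerException)
                {
                    sb.AppendFormat("{0}: {1}", ex.GetType().FullName, ex.Message).AppendLine();

                    var sqlEx = ex as SqlException;
                    if (sqlEx != null)
                    {
                        // Record the SQL error number as part of the log entry.
                        sb.AppendFormat("SQL error number: {0}", sqlEx.Number).AppendLine();
                    }

                    sb.AppendLine(ex.StackTrace);
                }

                Trace.TraceError(sb.ToString());
            }
        }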


    Log calls to services
    We highly recommend that you write a log every time your app calls out to a service, whether to a database, a REST API, or any external service. Include in your logs not only an indication of success or failure but how long each request took. In the cloud environment you’ll often see problems related to slowdowns rather than complete outages. Something that normally takes 10 milliseconds might suddenly start taking a second. When someone tells you your app is slow, you want to be able to look at New Relic or whichever telemetry service you have and validate the user’s experience, and then you want to be able to look at your own logs to dive into the details of why your app is slow.
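
    A simple way to capture both pieces of information is to wrap every outbound call in a helper that times it and logs success or failure. This is only a sketch; the class, method, and dependency names are illustrative:

        using System;
        using System.Diagnostics;
        using System.Threading.Tasks;

        public static class DependencyTracer
        {
            public static async Task<T> TraceCallAsync<T>(string dependencyName, Func<Task<T>> call)
            {
                var stopwatch = Stopwatch.StartNew();
                try
                {
                    T result = await call();
                    stopwatch.Stop();
                    Trace.TraceInformation("{0} succeeded in {1} ms",
                        dependencyName, stopwatch.ElapsedMilliseconds);
                    return result;
                }
                catch (Exception ex)
                {
                    stopwatch.Stop();
                    // Include the elapsed time on failures too; slowness and errors often go together.
                    Trace.TraceError("{0} failed after {1} ms: {2}",
                        dependencyName, stopwatch.ElapsedMilliseconds, ex);
                    throw;
                }
            }
        }

        // Usage (names are hypothetical):
        //   var tasks = await DependencyTracer.TraceCallAsync("SQL: GetFixItTasks",
        //       () => repository.FindTasksByOwnerAsync(userName));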


    Use an ILogger interface
    When you create a production application, Microsoft recommends that you create a simple ILogger interface and put a few methods in it. That makes it much easier to change the logging implementation later, because you don’t have to go through all your code to do it. We could use the System.Diagnostics.Trace class throughout the Fix It app, but instead we’re using it under the covers in a logging class that implements ILogger, and we make ILogger method calls throughout the app.
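
    A minimal sketch of what such an interface and a Trace-backed implementation might look like follows; the actual interface in the Fix It app may differ in its methods and overloads.

        using System;
        using System.Diagnostics;

        public interface ILogger
        {
            void Information(string message, params object[] args);
            void Warning(string message, params object[] args);
            void Error(string message, params object[] args);
            void TraceApi(string componentName, string method, TimeSpan timespan, string properties);
        }

        public class Logger : ILogger
        {
            public void Information(string message, params object[] args)
            {
                Trace.TraceInformation(message, args);
            }

            public void Warning(string message, params object[] args)
            {
                Trace.TraceWarning(message, args);
            }

            public void Error(string message, params object[] args)
            {
                Trace.TraceError(message, args);
            }

            public void TraceApi(string componentName, string method, TimeSpan timespan, string properties)
            {
                Trace.TraceInformation("Component:{0};Method:{1};Timespan:{2};Properties:{3}",
                    componentName, method, timespan.ToString(), properties);
            }
        }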

    With an approach such as this, if you ever want to make your logging richer, you can replace System.Diagnostics.Trace with whatever logging mechanism you want. For example, as your app grows, you might decide that you want to use a more comprehensive logging package, such as NLog or Enterprise Library Logging Application Block. (Log4Net is another popular logging framework, but it doesn't perform asynchronous logging.)

    One reason for using a framework such as NLog is to divide logging output into separate high-volume and high-value data stores. Doing that helps you efficiently store large volumes of INFORM data that you don’t need to execute fast queries against, while maintaining quick access to ACT data.
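
    As a sketch of that idea, assuming NLog, you could route everything up through Info to a cheap high-volume target and route Error-level ACT logs to a separate store that you monitor and query. The target names and file paths here are illustrative:

        using NLog;
        using NLog.Config;
        using NLog.Targets;

        public static class LoggingSetup
        {
            public static void Configure()
            {
                var config = new LoggingConfiguration();

                // High-volume INFORM data: bulk storage you rarely query interactively.
                var informTarget = new FileTarget { Name = "highVolume", FileName = "${basedir}/logs/inform.log" };
                // High-value ACT data: the store you monitor and query quickly.
                var actTarget = new FileTarget { Name = "highValue", FileName = "${basedir}/logs/act.log" };

                config.AddTarget("highVolume", informTarget);
                config.AddTarget("highValue", actTarget);

                // Trace through Info goes to the high-volume store.
                config.LoggingRules.Add(new LoggingRule("*", LogLevel.Trace, LogLevel.Info, informTarget));
                // Error and above goes to the high-value store.
                config.LoggingRules.Add(new LoggingRule("*", LogLevel.Error, actTarget));

                LogManager.Configuration = config;
            }
        }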


    Semantic logging
    For a relatively new way to do logging that can produce more useful diagnostic information, see Enterprise Library Semantic Logging Application Block (SLAB). SLAB uses Event Tracing for Windows (ETW) and EventSource support in .NET 4.5 to enable you to create more structured and queryable logs. You define a different method for each type of event that you log, which enables you to customize the information you write. For example, to log a SQL Database error you might call a LogSQLDatabaseError method. For that kind of exception, you know that a key piece of information is the error number, so you could include an error number parameter in the method’s signature and record the error number as a separate field in the log record you write. Because the number is in a separate field, you can more easily and reliably get reports based on SQL error numbers than you could if you were just concatenating the error number into a message string.
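
    For a flavor of the approach, here is a minimal sketch using the .NET 4.5 EventSource class directly; SLAB builds on the same mechanism and adds listeners that persist the events to destinations such as Azure Table storage. The event source and method names are illustrative:

        using System.Diagnostics.Tracing;

        [EventSource(Name = "MyCompany-FixIt")]
        public sealed class AppEventSource : EventSource
        {
            public static readonly AppEventSource Log = new AppEventSource();

            // The SQL error number is a strongly typed payload field, so reports can
            // group and filter on it instead of parsing it out of a message string.
            [Event(1, Level = EventLevel.Error)]
            public void LogSQLDatabaseError(string message, int errorNumber)
            {
                if (IsEnabled())
                {
                    WriteEvent(1, message, errorNumber);
                }
            }
        }

        // Usage:
        //   AppEventSource.Log.LogSQLDatabaseError(ex.Message, ex.Number);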

    Source of information: Building Cloud Apps with Microsoft Azure

