• SLAs

    People often hear about service-level agreements (SLAs) in the cloud environment. These are promises that companies make about how reliable their service is. A 99.9 percent SLA means you should expect the service to be working correctly 99.9 percent of the time. That's a fairly typical value for an SLA, and it sounds like a very high number, but you might not realize how much downtime the remaining 0.1 percent represents. Here’s a table that shows how much downtime various SLA percentages amount to over a year, a month, and a week.

    So a 99.9 percent SLA means your service could be down 8.76 hours a year or 43.2 minutes in a 30-day month. That’s more downtime than most people realize. As a developer, you want to be aware that a certain amount of downtime is possible and handle it gracefully. At some point someone is going to be using your app when a service goes down, and you want to minimize the negative impact of that on the customer.
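
    The downtime figures quoted above follow from simple arithmetic. The sketch below shows one way to compute them; the function name and the period lengths (a 365-day year, a 30-day month, a 7-day week) are assumptions made for illustration, and the SLA levels in the loop are arbitrary examples.

```python
# Convert an SLA percentage into the maximum downtime it permits.
# Assumed period lengths: 365-day year, 30-day month, 7-day week.
MINUTES_PER_PERIOD = {
    "year": 365 * 24 * 60,
    "month": 30 * 24 * 60,
    "week": 7 * 24 * 60,
}


def allowed_downtime_minutes(sla_percent, period):
    """Return the minutes of downtime an SLA permits over one period."""
    return MINUTES_PER_PERIOD[period] * (100 - sla_percent) / 100


# Print the permitted downtime for a few example SLA levels.
for sla in (99.0, 99.9, 99.95, 99.99):
    row = ", ".join(
        f"{allowed_downtime_minutes(sla, period):.1f} min/{period}"
        for period in ("year", "month", "week")
    )
    print(f"{sla}%: {row}")
```

    For 99.9 percent this gives about 525.6 minutes (8.76 hours) a year and 43.2 minutes a month, matching the figures in the text.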

    One thing you should know about an SLA is what time frame it refers to: is the clock reset every week, every month, or every year? In Azure, the clock is reset every month, which is better for you than a yearly SLA, since a yearly SLA could hide bad months by offsetting them with a series of good months.
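
    To see why the measurement window matters, consider a made-up year with eleven perfect months and one bad month. The uptime numbers below are invented for illustration, and treating every month as equal length is a simplifying assumption.

```python
# Hypothetical uptime percentages: eleven perfect months, then one bad month.
monthly_uptime = [100.0] * 11 + [99.0]
SLA_TARGET = 99.9


def months_below_target(uptimes, target):
    """Return the 1-based numbers of the months that miss the target."""
    return [i for i, uptime in enumerate(uptimes, start=1) if uptime < target]


def yearly_average(uptimes):
    """Average uptime over the year, treating months as equal length."""
    return sum(uptimes) / len(uptimes)


# Measured monthly, the bad month is a violation.
print("Months below target:", months_below_target(monthly_uptime, SLA_TARGET))

# Measured yearly, the good months offset the bad one.
print("Yearly average meets target:", yearly_average(monthly_uptime) >= SLA_TARGET)
```

    In this scenario the twelfth month misses a 99.9 percent target, yet the yearly average (about 99.92 percent) still meets it, so only the monthly clock would register the violation.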

    Of course, Microsoft aspires to do better than the SLA; usually, your app will be down much less time than what’s shown in the previous table. The promise is that if Azure’s services are ever down for longer than the maximum downtime, you can ask for money back. The amount of money you get back probably won’t fully compensate you for the business impact of the excess downtime, but that aspect of the SLA acts as an enforcement policy and lets you know that Microsoft does take its SLA levels very seriously.


    Composite SLAs
    An important thing to think about when you’re looking at SLAs is the impact of using multiple services in an app, with each service having a separate SLA. For example, the Fix It app uses the Website, Storage, and SQL Database services. Here are their SLA numbers as of June 2014 (note that a 99.99% SLA is available for Storage at extra cost):

    What is the maximum downtime you would expect for the app on the basis of these service SLAs? You might think that your app's availability would match the worst SLA percentage, or 99.9 percent in this case. That would be true if all three services always failed at the same time, but that isn’t necessarily what happens. Each service may fail independently at a different time, and the app is down whenever any one of them is down, so you have to calculate the composite SLA by multiplying the individual SLA numbers.

    This calculation means that your app could be down not just 43.2 minutes a month but about two and a half times that amount, 108 minutes a month, and still be within the Azure SLA limits.
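
    The multiplication can be sketched as follows. The three SLA values used here (99.95, 99.9, and 99.9 percent) are assumptions: they are not listed in this text, but they are consistent with the 99.9 percent worst case and the 108-minute figure mentioned above. The calculation also assumes that failures are independent and that the app needs all three services to be up.

```python
import math


def composite_sla(sla_percents):
    """Multiply individual SLAs (as fractions) and return a percentage."""
    return 100 * math.prod(p / 100 for p in sla_percents)


# Assumed SLA values, chosen to be consistent with the figures in the text.
service_slas = {"Website": 99.95, "Storage": 99.9, "SQL Database": 99.9}

combined = composite_sla(service_slas.values())

# Permitted downtime over a 30-day month, in minutes.
downtime_minutes = 30 * 24 * 60 * (100 - combined) / 100

print(f"Composite SLA: {combined:.2f}%")
print(f"Permitted downtime: {downtime_minutes:.0f} minutes per month")
```

    Under these assumptions the composite SLA is roughly 99.75 percent, which corresponds to about 108 minutes of permitted downtime in a 30-day month.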

    This issue is not unique to Azure. Microsoft actually offers the best cloud SLAs of any cloud service available, and you’ll have similar issues to deal with if you use any vendor’s cloud services. What this highlights is the importance of thinking about how you can design your app to handle the inevitable service failures gracefully, because they might happen often enough to impact your customers or users.


    Cloud SLAs compared with enterprise downtime experience
    People sometimes say, “In my enterprise app I never have these problems.” If you ask how much downtime they actually have per month, they usually say, “Well, it happens occasionally.” And if you ask how often, they admit, “Sometimes we do need to back up or install a new server or update software.” Of course, that counts as downtime. Most enterprise apps, unless they are especially mission-critical, are actually down for more than the amount of time allowed by Microsoft’s service SLAs. But when it’s your server and your infrastructure and you’re responsible for it and in control of it, you tend to feel less angst about downtime. In a cloud environment, you’re dependent on someone else, and you don’t know what’s going on, so you might tend to be more worried about it.

    When an enterprise achieves a greater uptime percentage than comes with a cloud SLA, it does so by spending a lot more money on hardware. A cloud service could do that but would have to charge much more for its services. Instead, you take advantage of a cost-effective service and design your software so that the inevitable failures cause minimum disruption to your customers. Your job as a cloud app designer is not so much to avoid failure as to avoid catastrophe, and you do that by focusing on software, not on hardware. Whereas enterprise apps strive to maximize mean time between failures, cloud apps strive to minimize mean time to recover.
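
    The trade-off between mean time between failures (MTBF) and mean time to recover (MTTR) can be made concrete with the standard steady-state availability formula, MTBF / (MTBF + MTTR). The two systems below are hypothetical, with numbers invented purely to illustrate the point.

```python
def availability_percent(mtbf_hours, mttr_hours):
    """Steady-state availability: time up divided by total time."""
    return 100 * mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical system A: fails rarely (every 1000 hours), takes 4 hours to repair.
rare_but_slow = availability_percent(mtbf_hours=1000, mttr_hours=4)

# Hypothetical system B: fails ten times as often, recovers in 3 minutes.
frequent_but_fast = availability_percent(mtbf_hours=100, mttr_hours=0.05)

print(f"Rare failures, slow recovery:     {rare_but_slow:.2f}%")
print(f"Frequent failures, fast recovery: {frequent_but_fast:.2f}%")
```

    With these made-up numbers, the system that fails more often but recovers quickly ends up with the higher availability, which is the reasoning behind focusing on recovery time in software.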


    Not all cloud services have SLAs
    Be aware also that not every cloud service even has an SLA. If your app is dependent on a service with no uptime guarantee, your app could be down far longer than you might imagine. For example, if you enable login to your site using a social provider such as Facebook or Twitter, check with the service provider to find out whether there is an SLA, and you might find there isn’t one. But if the authentication service goes down or is unable to support the volume of requests you throw at it, your customers are locked out of your app. You could be down for days or longer. The creators of one new app expected hundreds of millions of downloads and took a dependency on Facebook authentication—but they didn’t talk to Facebook before going live and discovered too late that there was no SLA for that service.


    Not all downtime counts toward SLAs
    Some cloud services may deliberately deny service if your app overuses them. This is called throttling. If a service has an SLA, it should state the conditions under which your app might be throttled, and your app design should avoid those conditions and react appropriately to the throttling if it happens. For example, if requests to a service start to fail when you exceed a certain number of requests per second, you want to be sure that automatic retries don't happen so fast that they cause the throttling to continue.
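
    One common way to keep retries from prolonging throttling is exponential backoff with jitter. The text does not prescribe this technique, so the sketch below is only one possible approach. `ThrottledError`, `call_with_backoff`, and all parameter values are hypothetical names and defaults chosen for illustration; a real service signals throttling in its own way, for example with a specific status code.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical error raised when a service rejects a request as throttled."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.5,
                      max_delay=30.0, sleep=time.sleep):
    """Call operation(), waiting longer after each throttled attempt."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                # Out of attempts: let the caller degrade gracefully.
                raise
            # Double the wait each time, up to a cap, and add random jitter
            # so that many clients do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))
```

    A caller would wrap each request to the throttled service, for example `call_with_backoff(lambda: client.get_item(key))`, where `client.get_item` stands in for whatever call your app makes. The `sleep` parameter exists so that the waiting can be replaced in tests.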

    Source of Information : Building Cloud Apps With Microsoft Azure

