• Hadoop and MapReduce

    The high volumes of data that you can store in NoSQL databases can be difficult to analyze in a timely manner. To perform this type of analysis, you can use a framework such as Hadoop, which implements MapReduce functionality. Essentially, a MapReduce process does the following:

    • Limits the size of the data to be processed by selecting only the data you actually need out of the data store. For example, if you want to know the makeup of your user base by birth year, the process selects only birth years out of your user profile data store.

    • Breaks the data into parts and sends them to different computers for processing. Computer A counts the people with birth years between 1950 and 1959, computer B handles 1960 through 1969, and so on. This group of computers is called a Hadoop cluster.

    • Puts the results back together after processing of the parts is complete. You now have a relatively short list of how many people have each birth year, and calculating percentages from this overall list is a manageable task.
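    The three phases above can be simulated in a single process. The following sketch is illustrative only, not real Hadoop code: the `mapper`, `shuffle`, and `reducer` functions and the sample profiles are hypothetical, but the flow mirrors what the framework does across a cluster.

```python
from collections import defaultdict

def mapper(record):
    """Emit (birth_year, 1) for each user profile record."""
    yield (record["birth_year"], 1)

def shuffle(mapped_pairs):
    """Group mapped values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    """Sum the counts for one birth year."""
    return (key, sum(values))

# Hypothetical sample data; a real job reads from the data store.
profiles = [
    {"name": "Ann", "birth_year": 1955},
    {"name": "Bob", "birth_year": 1962},
    {"name": "Cho", "birth_year": 1955},
]

# Map phase: on a real cluster, each node runs the mapper over its own
# partition of the data; here everything runs in one process.
mapped = [pair for record in profiles for pair in mapper(record)]

# Shuffle and reduce phases produce the short per-year summary.
counts = dict(reducer(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {1955: 2, 1962: 1}
```

    In a real Hadoop job, the mapper and reducer would typically be written in Java and the shuffle would be handled by the framework itself.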

    On Azure, HDInsight enables you to process, analyze, and gain new insights from big data by using the power of Hadoop. For example, you could use HDInsight to analyze web server logs in the following manner:

    • Enable web server logging to your storage account. This sets up Azure to write logs to the Blob service for every HTTP request to your application. The Blob service is basically cloud file storage and integrates nicely with HDInsight.

    • As the app gets traffic, web server IIS logs are written to Blob storage.

    • In the Azure management portal, click New, Data Services, HDInsight, Quick Create, and then specify an HDInsight cluster name, cluster size (number of HDInsight cluster data nodes), and a user name and password for the HDInsight cluster.

    You can now set up MapReduce jobs to analyze your logs and get answers to questions such as:

    • What times of day does my app get the most or least traffic?

    • What countries is my traffic coming from?

    • What is the average neighborhood income of the areas my traffic comes from? (There's a public dataset that provides neighborhood income by IP address, and you can match that data against the IP addresses in the web server logs.)

    • How does neighborhood income correlate to specific pages or products in the site?

    You could then use the answers to questions such as these to target ads based on the likelihood that a customer would be interested in or would be likely to buy a particular product.
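    To make the first question concrete, here is a minimal sketch of the kind of aggregation a MapReduce job would perform over IIS logs, run locally on a few hypothetical log lines. It assumes the default W3C extended log layout, where the first two fields of each line are the date and time.

```python
from collections import Counter

def hour_of_request(log_line):
    """Extract the hour from a W3C extended log line.
    Assumes the first two fields are date and time (the default IIS layout)."""
    date, time = log_line.split()[:2]
    return int(time.split(":")[0])

# Hypothetical sample lines; real IIS logs carry many more fields.
log_lines = [
    "2013-08-01 14:03:27 GET /products/widget 200",
    "2013-08-01 14:15:02 GET /index.html 200",
    "2013-08-01 23:59:10 GET /products/gadget 404",
]

traffic_by_hour = Counter(hour_of_request(line) for line in log_lines)
print(traffic_by_hour.most_common(1))  # busiest hour first
```

    The same pattern extends to the other questions: map each log line to a key (country, income bracket, page), then count or average per key in the reduce step.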

    Most functions that you can perform in the management portal can be automated, and that includes setting up and executing HDInsight analysis jobs. A typical HDInsight script might contain the following steps:

    • Provision an HDInsight cluster and link it to your storage account for Blob storage input.

    • Upload the MapReduce job executables (.jar or .exe files) to the HDInsight cluster.

    • Submit a MapReduce job that stores the output data to Blob storage.

    • Wait for the job to complete.

    • Delete the HDInsight cluster.

    • Access the output from Blob storage.

    By running a script that performs these steps, you minimize the amount of time that the HDInsight cluster is provisioned, which minimizes your costs.
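    The lifecycle above can be sketched in outline form. The functions below are placeholders, not real Azure cmdlets or SDK calls; the point is the shape of the script, in particular deleting the cluster in a `finally` block so you stop paying for it even if the job fails.

```python
# Placeholder functions standing in for the real HDInsight and Blob
# storage operations; the actual commands depend on your tooling.
events = []

def provision_cluster(name, size):
    events.append(f"provisioned {name} ({size} nodes)")

def upload_job(jar):
    events.append(f"uploaded {jar}")

def run_job_and_wait(jar, output_container):
    events.append(f"ran {jar} -> {output_container}")

def delete_cluster(name):
    events.append(f"deleted {name}")

def read_output(output_container):
    events.append(f"read {output_container}")

def analyze_logs():
    provision_cluster("log-analysis", size=4)
    try:
        upload_job("loganalyzer.jar")
        run_job_and_wait("loganalyzer.jar", "wasb://output/")
    finally:
        # Delete the cluster even if the job fails, so the cluster
        # is billed only for the time the job actually needed.
        delete_cluster("log-analysis")
    read_output("wasb://output/")

analyze_logs()
```

    Because the output lands in Blob storage rather than on the cluster, it survives cluster deletion, which is what makes this provision-run-delete pattern safe.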

    Source of Information: Building Cloud Apps with Microsoft Azure

