Introducing PLINQ

    For business applications, PLINQ shines any time you have a LINQ query that involves multiple subqueries. If you’re joining rows from a table in a local database with rows from a table in a remote database, PLINQ can be very useful. In those situations, LINQ must run subqueries on each data source separately and then reconcile the results. PLINQ will distribute those subqueries over multiple processors (if any are available) so that they run simultaneously.

    You won’t use fewer processor cycles to get your result (in fact, you’ll use more), but you’ll get your result earlier.

    Even on a multi-core machine, PLINQ won’t always "parallelize" a query, for two reasons. The first is that your application won’t always run faster when parallelized. The second is that, even with another layer of abstraction managing your threads, it’s still possible to shoot yourself in the foot (or someplace higher) with parallel processing. PLINQ checks for some unsafe conditions and won’t parallelize a query if it detects them.

    I’ll be pointing out some of the problems and conditions that PLINQ won’t detect but, in the end, it’s your responsibility to use PLINQ only where it won’t generate hard-to-trace bugs.



    Processing PLINQ
    Invoking PLINQ is easy: just add the AsParallel extension to your data source. This is an example from an application that joins a local version of the Northwind database to the remote version to get Orders based on customer information:

    Dim ords As System.Linq.ParallelQuery(Of ParallelExtensions.Order)

    ' le and re are the data contexts for the two databases (declared elsewhere)
    ords = From c In le.Customers.AsParallel
           Join o In re.Orders.AsParallel
           On c.CustomerID Equals o.CustomerID
           Where c.CustomerID = "ALFKI"
           Select o

    Because both data sources are marked AsParallel, PLINQ will be used. (In a Join, if one data source is marked AsParallel, the other must be as well.)
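    To try the same query shape without a database, here is a minimal sketch that joins two in-memory lists. The Customer and Order classes, the JoinSketch module and the OrderIdsFor function are hypothetical stand-ins invented for this illustration; they are not the article's Northwind entities.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

' Hypothetical stand-ins for the article's entities (assumption, not Northwind).
Public Class Customer
    Public Property CustomerID As String
End Class

Public Class Order
    Public Property OrderID As Integer
    Public Property CustomerID As String
End Class

Public Module JoinSketch
    ' Returns the OrderIDs that belong to one customer, using a PLINQ join.
    Public Function OrderIdsFor(customers As IEnumerable(Of Customer),
                                orders As IEnumerable(Of Order),
                                targetId As String) As List(Of Integer)
        ' Both sides of the Join are marked AsParallel.
        Dim q As ParallelQuery(Of Integer) =
            From c In customers.AsParallel()
            Join o In orders.AsParallel()
            On c.CustomerID Equals o.CustomerID
            Where c.CustomerID = targetId
            Select o.OrderID

        ' ToList forces execution; sort afterwards because PLINQ
        ' makes no promise about the order of an unordered query.
        Dim ids As List(Of Integer) = q.ToList()
        ids.Sort()
        Return ids
    End Function

    Public Sub Main()
        Dim customers = New List(Of Customer) From {
            New Customer With {.CustomerID = "ALFKI"},
            New Customer With {.CustomerID = "ANATR"}}
        Dim orders = New List(Of Order) From {
            New Order With {.OrderID = 1, .CustomerID = "ALFKI"},
            New Order With {.OrderID = 2, .CustomerID = "ANATR"},
            New Order With {.OrderID = 3, .CustomerID = "ALFKI"}}

        For Each id As Integer In OrderIdsFor(customers, orders, "ALFKI")
            Console.WriteLine(id)
        Next
    End Sub
End Module
```

    The sort after ToList is there only so the sketch gives a repeatable result; it is not part of the join itself.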

    As with ordinary LINQ queries, PLINQ queries use deferred processing: data isn’t retrieved until you actually handle it. That means that, although the query has been declared as parallel, no parallel execution occurs until you process the results. Here, that happens in the following block of code, which pushes back the required date on each of the retrieved Order objects by two days:

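    For Each ord As Order In ords
        ' AddDays returns a new date rather than changing the original,
        ' so the result has to be assigned back to the property
        ord.RequiredDate = ord.RequiredDate.Value.AddDays(2)
    Next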

    Under the hood, PLINQ will use one thread to execute the code in the For Each loop, while other threads may be used to run the components of the query on as many processors as are available, up to a maximum of 64 (the limit in .NET Framework 4; later versions raise it).

    If the processing that I want to perform on each Order doesn’t share state with the processing on other Orders, I can further improve responsiveness by using ForAll. ForAll is a method, available on the ParallelQuery that a PLINQ query produces, that accepts a lambda expression. Unlike a For Each loop, which executes on the thread that enumerates the results (typically the application’s main thread), the operation passed to the ForAll method executes on the individual query threads generated by the PLINQ query:
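    The deferred execution described above can be observed directly. The following is a minimal sketch, assuming an in-memory range of integers instead of a database; the DeferredSketch module, the IsEven predicate and the Measure function are hypothetical names made up for this illustration. It counts how often the Where predicate has run before and after the results are enumerated.

```vbnet
Imports System
Imports System.Linq
Imports System.Threading

Public Module DeferredSketch
    ' Counts how many times the Where predicate has actually run.
    Private evaluated As Integer = 0

    Private Function IsEven(n As Integer) As Boolean
        ' Interlocked is used because the predicate may run on several threads.
        Interlocked.Increment(evaluated)
        Return n Mod 2 = 0
    End Function

    ' Returns {predicate calls before enumerating, calls after, items seen}.
    Public Function Measure() As Integer()
        evaluated = 0

        ' Declaring the query does not execute it.
        Dim evens = From n In Enumerable.Range(1, 100).AsParallel()
                    Where IsEven(n)
                    Select n

        Dim callsBefore As Integer = evaluated

        ' Enumerating the results is what triggers execution.
        Dim seen As Integer = 0
        For Each item As Integer In evens
            seen += 1
        Next

        Return New Integer() {callsBefore, evaluated, seen}
    End Function

    Public Sub Main()
        Dim r As Integer() = Measure()
        Console.WriteLine("Predicate calls before enumerating: {0}", r(0))
        Console.WriteLine("Predicate calls after enumerating:  {0}", r(1))
        Console.WriteLine("Items seen in the loop:             {0}", r(2))
    End Sub
End Module
```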

    ords.ForAll(Sub(ord)
                    ' Assign the result back: AddDays returns a new date
                    ord.RequiredDate = ord.RequiredDate.Value.AddDays(2)
                End Sub)

    Unlike my For Each loop, which processes the results sequentially on a single thread, the code in my ForAll call executes in parallel on the threads that are retrieving the Orders.
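    Here is a self-contained sketch of the ForAll pattern, again using in-memory objects rather than database rows. The Order class, the ForAllSketch module and the PushBackRequiredDates method are hypothetical names for this illustration. Each lambda call touches only its own Order, which is the "no shared state" condition that makes ForAll safe here; code that wrote to a shared variable or collection would need its own synchronization.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

' Hypothetical stand-in for the article's Order entity (assumption).
Public Class Order
    Public Property OrderID As Integer
    Public Property RequiredDate As Date?
End Class

Public Module ForAllSketch
    ' Pushes back every known required date by the given number of days.
    Public Sub PushBackRequiredDates(orders As IEnumerable(Of Order), days As Integer)
        orders.AsParallel().
            Where(Function(o) o.RequiredDate.HasValue).
            ForAll(Sub(o)
                       ' Runs on the query's worker threads; only this
                       ' Order is modified, so nothing is shared.
                       o.RequiredDate = o.RequiredDate.Value.AddDays(days)
                   End Sub)
    End Sub

    Public Sub Main()
        Dim orders = New List(Of Order) From {
            New Order With {.OrderID = 1, .RequiredDate = #1/10/2010#},
            New Order With {.OrderID = 2},
            New Order With {.OrderID = 3, .RequiredDate = #2/1/2010#}}

        PushBackRequiredDates(orders, 2)

        For Each o As Order In orders
            Console.WriteLine("{0}: {1}", o.OrderID, o.RequiredDate)
        Next
    End Sub
End Module
```

    The HasValue filter is an addition for the sketch: it skips Orders with no required date, where reading .Value would throw an exception.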



    Managing Order
    As with SQL (though everyone forgets it), order is not guaranteed in PLINQ. The order in which results are returned by PLINQ subqueries depends on the unpredictable response time of the various threads. This query, for instance, is intended to retrieve the next five Orders to be shipped:

    ords = From o In re.Orders.AsParallel
           Where o.RequiredDate > Now
           Select o
           Take 5

    If I don’t guarantee order, I’m going to get an arbitrary collection of five Orders with required dates later than the current time; I may or may not get the first five. To ensure that I’ll get the first five, in both SQL and PLINQ, I need to add an Order By clause to the query that sorts the dates in ascending order. And, yes, that will throw away some of the benefits of PLINQ.

    Because results returned from multiple threads turn up in an unpredictable sequence, PLINQ doesn’t really understand the concept of "previous item" and "next item." If, in your loop, you use the values of one item to process the next item in the loop, you may be introducing an error into your processing. To have items processed in the order that they appeared in the original data source, you’ll need to add the AsOrdered extension to the query.
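    As a sketch of the Order By fix, the following runs the "next Orders to be shipped" query against an in-memory list. The Order class, the NextToShipSketch module and the NextToShip function are hypothetical names for this illustration, and the required date is assumed to be a plain (non-nullable) Date to keep the comparison simple.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

' Hypothetical stand-in for the article's Order entity (assumption).
Public Class Order
    Public Property OrderID As Integer
    Public Property RequiredDate As Date
End Class

Public Module NextToShipSketch
    ' Returns the IDs of the next howMany orders due after asOf, earliest first.
    Public Function NextToShip(orders As IEnumerable(Of Order),
                               asOf As Date,
                               howMany As Integer) As List(Of Integer)
        ' Order By makes the parallel query ordered, so Take returns
        ' the earliest dates rather than whichever results arrive first.
        Dim q = From o In orders.AsParallel()
                Where o.RequiredDate > asOf
                Order By o.RequiredDate Ascending
                Select o.OrderID
                Take howMany

        Return q.ToList()
    End Function

    Public Sub Main()
        Dim day0 As Date = #1/1/2010#
        Dim orders = New List(Of Order) From {
            New Order With {.OrderID = 1, .RequiredDate = day0.AddDays(5)},
            New Order With {.OrderID = 2, .RequiredDate = day0.AddDays(-1)},
            New Order With {.OrderID = 3, .RequiredDate = day0.AddDays(1)},
            New Order With {.OrderID = 4, .RequiredDate = day0.AddDays(3)},
            New Order With {.OrderID = 5, .RequiredDate = day0.AddDays(2)},
            New Order With {.OrderID = 6, .RequiredDate = day0.AddDays(4)}}

        For Each id As Integer In NextToShip(orders, day0, 3)
            Console.WriteLine(id)
        Next
    End Sub
End Module
```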

    For instance, if I wanted to "batch" my Orders into groups that were below a certain freight charge, I might write a loop like this:

    For Each ord As Order In ords
        totFreight += ord.Freight
        If totFreight > FreightChargeLimit Then
            Exit For
        End If
        shipOrders.Add(ord)
    Next

    Because of the unpredictable order in which items are returned from parallel processes, I can’t guarantee that I’m putting anything but random Orders in each batch. To guarantee that items are processed in the order they appeared in my original data source, I have to add the AsOrdered extension to my data source:

    ords = From o In re.Orders.AsParallel.AsOrdered
           Where o.RequiredDate > Now
           Select o
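    Putting AsOrdered together with the batching loop, a minimal in-memory sketch might look like this. The Order class, the BatchSketch module, the FirstBatch function and the filter on positive freight are all hypothetical, chosen only to give the parallel query some work to do before the sequential loop consumes its results in source order.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

' Hypothetical stand-in for the article's Order entity (assumption).
Public Class Order
    Public Property OrderID As Integer
    Public Property Freight As Decimal
End Class

Public Module BatchSketch
    ' Collects order IDs, in source order, until the running freight
    ' total would exceed freightLimit.
    Public Function FirstBatch(orders As IEnumerable(Of Order),
                               freightLimit As Decimal) As List(Of Integer)
        ' AsOrdered tells PLINQ to hand results back in source order.
        Dim inSourceOrder = From o In orders.AsParallel().AsOrdered()
                            Where o.Freight > 0D
                            Select o

        Dim batch As New List(Of Integer)
        Dim totFreight As Decimal = 0D

        ' This loop depends on the running total, so it relies on order.
        For Each ord As Order In inSourceOrder
            totFreight += ord.Freight
            If totFreight > freightLimit Then
                Exit For
            End If
            batch.Add(ord.OrderID)
        Next

        Return batch
    End Function

    Public Sub Main()
        Dim orders = New List(Of Order) From {
            New Order With {.OrderID = 1, .Freight = 10D},
            New Order With {.OrderID = 2, .Freight = 0D},
            New Order With {.OrderID = 3, .Freight = 20D},
            New Order With {.OrderID = 4, .Freight = 30D},
            New Order With {.OrderID = 5, .Freight = 5D}}

        For Each id As Integer In FirstBatch(orders, 35D)
            Console.WriteLine(id)
        Next
    End Sub
End Module
```

    Without AsOrdered, the same loop would still compile and run, but which Orders land in the batch could vary from run to run.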

    Source of Information: Visual Studio Magazine, August 2010

