
Additionally, we'll need extension methods to calculate the mean, median and mode over integer collections.

    public static decimal Median<T>(this IEnumerable<T> xs, Func<T, decimal> f)
    {
        // Project first, then sort the projected values; ordering the source
        // elements themselves and selecting afterwards could produce a
        // different (wrong) ordering whenever f is not order-preserving.
        var ys = xs.Select(f).OrderBy(y => y).ToList();

        double mid = (ys.Count - 1) / 2.0;

        // Odd count: both indexes land on the same middle element.
        // Even count: this averages the two middle values.
        return (ys[(int)mid] + ys[(int)(mid + 0.5)]) / 2;
    }


    public static decimal Mean<T>(this IEnumerable<T> list, Func<T, decimal> selector)
    {
        return list.Average(selector);
    }


    public static IEnumerable<decimal> Modes<T>(this IEnumerable<T> list, Func<T, decimal> f)
    {
        var modesList = list
            .GroupBy(f)
            .Select(grp => new
            {
                Value = grp.Key,
                Occurrences = grp.Count(),
            })
            .ToList();

        int maxOccurrence = modesList.Max(g => g.Occurrences);

        return modesList
            .Where(x => x.Occurrences == maxOccurrence && maxOccurrence > 1) // Thanks Rui!
            .Select(x => x.Value);
    }
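As a quick sanity check, these extensions can be exercised over a small integer collection. The example below is self-contained, with condensed copies of the three methods; the class names and sample numbers are mine, for illustration only:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class Stats
{
    public static decimal Mean<T>(this IEnumerable<T> xs, Func<T, decimal> f)
        => xs.Average(f);

    public static decimal Median<T>(this IEnumerable<T> xs, Func<T, decimal> f)
    {
        var ys = xs.Select(f).OrderBy(y => y).ToList();
        double mid = (ys.Count - 1) / 2.0;
        return (ys[(int)mid] + ys[(int)(mid + 0.5)]) / 2;
    }

    public static IEnumerable<decimal> Modes<T>(this IEnumerable<T> xs, Func<T, decimal> f)
    {
        var groups = xs.GroupBy(f)
                       .Select(g => new { g.Key, Count = g.Count() })
                       .ToList();
        int max = groups.Max(g => g.Count);
        // Only report modes when some value actually repeats.
        return groups.Where(g => g.Count == max && max > 1).Select(g => g.Key);
    }
}

static class StatsDemo
{
    static void Main()
    {
        var numbers = new List<int> { 1, 2, 2, 3, 4 };

        Console.WriteLine(numbers.Mean(n => (decimal)n));                     // 2.4
        Console.WriteLine(numbers.Median(n => (decimal)n));                   // 2
        Console.WriteLine(string.Join(", ", numbers.Modes(n => (decimal)n))); // 2
    }
}
```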

Side-note: contrary to my preference, I am supplying a console application project (in Visual Studio 2013, a Professional-equivalent edition of which is now available free-as-in-beer) with the implementation, so the reader can follow the rest of this post. Full code available for download via Dropbox

We run the entire Main procedure a few times first, to get memory and caches "warmed up" and to eliminate as much ambiguity due to loading as possible. On the "first" run, over 1,000 operations, the output is predictable. Given the minuscule latencies of clock cycles and L1 and L2 cache references (nanoseconds), the sequential operations run two orders of magnitude better than their concurrent, batched counterparts. We can see this in the time to execute (TTE): linear averages 0.2ms, with 0.4, 0.2 and 0.0 being the most recurring values (modes). This means a good portion of the operations don't even register on the millisecond scale, and probably run well under a microsecond too. The total time to execute the same 1,000 operations in a 10 * 10 loop (i.e. 100,000 operations in total) is 24ms.

The concurrent times are well over an order of magnitude worse, both on average and in total: the concurrently batched operations take 1.2 seconds to execute 100,000 operations. So the hands-down winner is sequential, linear execution, no doubt. In fact, if we keep increasing the number of operations to 1,000,000, 10,000,000 and so on, the numbers look predictably the same. Linear wins hands down every time.
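The full harness ships with the downloadable project; a minimal sketch of how a per-loop TTE measurement like this could be taken with `Stopwatch` (the class and method names here are illustrative, not the project's) might look like:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

static class TimingSketch
{
    // Times `operations` trivial local operations per loop, `loops` times,
    // and returns the per-loop TTE in milliseconds.
    public static List<decimal> Run(int loops, int operations)
    {
        var perLoopMs = new List<decimal>();
        for (int i = 0; i < loops; i++)
        {
            var sw = Stopwatch.StartNew();
            for (int op = 0; op < operations; op++)
                Math.Sqrt(op);  // the "operation": pure local CPU work, no I/O
            sw.Stop();
            perLoopMs.Add((decimal)sw.Elapsed.TotalMilliseconds);
        }
        return perLoopMs;
    }

    static void Main()
    {
        // 10 * 10 loops of 1000 operations, as in the runs below.
        var times = Run(10 * 10, 1000);
        Console.WriteLine($"avg {times.Average():0.0} ms over {times.Count} loops");
    }
}
```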

(space delimited) total items[, magnitude[,delay (in ms)]]>1000
args:1000 | 2^2 * 8 => 32 | 0
Linear
 TTE (in ms):
avg 0.2, median 0.2, modes {0.4, 0.2, 0.0}


Linear Ttl: >24 ms

Concurrent
 TTE (in ms):
avg 11.9, median 11.5, modes {11.3, 11.4, 11.5}


Concurrent Ttl: >1190 ms
-------------------------------------------------------------------------
Run again? (cls + 'Return' clears screen; leave blank + 'Return' to quit)
(space delimited) total items[, magnitude[,delay (in ms)]]>10000
args:10000 | 2^2 * 8 => 32 | 0
Linear
 TTE (in ms):
avg 2.0, median 1.8, modes {1.7}


Linear Ttl: >197 ms

Concurrent
 TTE (in ms):
avg 140.5, median 141.8, modes {}


Concurrent Ttl: >14053 ms
-------------------------------------------------------------------------
Run again? (cls + 'Return' clears screen; leave blank + 'Return' to quit)
(space delimited) total items[, magnitude[,delay (in ms)]]>100000
args:100000 | 2^2 * 8 => 32 | 0
Linear
 TTE (in ms):
avg 15.0, median 14.3, modes {12.7}


Linear Ttl: >1504 ms

Concurrent
 TTE (in ms):
avg 1718.1, median 1701.6, modes {}


Concurrent Ttl: >171814 ms

So far we have seen that this entire concurrent thing has no place when computing with local resources. To see where it does have an impact let's change up the input a bit.

Let's run over 1,000 operations again, only this time let's create an artificial delay of 1ms in the operation, such that the thread is suspended for 1ms in addition to the time it takes to execute everything else.
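The delay simulates I/O-bound latency (a database or HTTP call) rather than CPU work, which is exactly where suspending a thread hurts a linear loop. Here is a minimal sketch of the two strategies under such a delay, using `Thread.Sleep` and `Task.Run` as stand-ins for the project's actual implementation:

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class DelayDemo
{
    // Stand-in for a remote call: suspend the thread for delayMs.
    static void Operation(int delayMs)
    {
        Thread.Sleep(delayMs);
    }

    // Runs the same delayed operation linearly and concurrently and
    // returns { linearMs, concurrentMs }.
    public static long[] Compare(int operations, int delayMs)
    {
        var linear = Stopwatch.StartNew();
        for (int i = 0; i < operations; i++)
            Operation(delayMs);
        linear.Stop();

        var concurrent = Stopwatch.StartNew();
        Task.WaitAll(Enumerable.Range(0, operations)
            .Select(_ => Task.Run(() => Operation(delayMs)))
            .ToArray());
        concurrent.Stop();

        return new[] { linear.ElapsedMilliseconds, concurrent.ElapsedMilliseconds };
    }

    static void Main()
    {
        var times = Compare(50, 1);  // kept small so the linear run stays bearable
        Console.WriteLine($"linear {times[0]} ms, concurrent {times[1]} ms");
    }
}
```

The linear loop necessarily pays at least `operations * delayMs` in sleep time alone, while the concurrent version overlaps the sleeps across thread-pool threads.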

(space delimited) total items[, magnitude[,delay (in ms)]]>1000 2 1
args:1000 | 2^2 * 8 => 32 | 1
Linear
 TTE (in ms):
avg 2018.3, median 2006.2, modes {2002.0}


Linear Ttl: >201831 ms

Concurrent
 TTE (in ms):
avg 265.5, median 249.0, modes {}


Concurrent Ttl: >26549 ms

Quite a different picture. Just a one-millisecond delay in the operation, over 1,000 operations, and the average time for linear is 2 seconds whereas concurrent is 265ms. That's an order of magnitude better for concurrent, if you're keeping score. The total time to execute 10 * 10 loops of 1,000 operations (i.e. 100,000 operations in total) is over three minutes linearly and under half a minute concurrently. If we start increasing the latency to 2 or even 20ms, we'll have to drop the number of operations to 100, because the linear runs take too long for my patience.

(space delimited) total items[, magnitude[,delay (in ms)]]>100 2 2
args:100 | 2^2 * 8 => 32 | 2
Linear
 TTE (in ms):
avg 301.1, median 301.0, modes {301.0}


Linear Ttl: >30113 ms

Concurrent
 TTE (in ms):
avg 13.6, median 11.8, modes {11.6}


Concurrent Ttl: >1355 ms
------------------------------------------------------------------------
Run again? (cls + 'Return' clears screen; leave blank + 'Return' to quit)
(space delimited) total items[, magnitude[,delay (in ms)]]>10 2 20
args:10 | 2^2 * 8 => 32 | 20
Linear
 TTE (in ms):
avg 211.7, median 211.2, modes {211.0}


Linear Ttl: >21169 ms

Concurrent
 TTE (in ms):
avg 49.9, median 50.9, modes {}


Concurrent Ttl: >4992 ms

So, to answer the original question: are there any performance benefits? You would first have to identify where and why there are latencies to begin with.

100 operations with a 2ms delay over 10 * 10 loops (i.e. 10,000 operations in total) take 300ms on average and 30 seconds in total linearly, versus 13.6ms on average and 1.3 seconds in total concurrently. Depending on the data set size, that's the difference between architecting an entire user process to serve data in real time and scheduling it to trigger the sending of an email. Even for small sets of 10 operations over 10 * 10 loops (i.e. 1,000 in total), a 20ms delay is the difference between 20 seconds in total and 5 seconds in total, which frees up server resources sooner and allows handling ten times as many requests per second with the same infrastructure.

I would not recommend implementing ForEachConcurrent up front, in anticipation of (imagined) latencies where there probably aren't any. But after the release of the initial versions, when knowledge and log data confirm that there are indeed parts of the process where connecting to remote services in the data centre or out on the web takes up time that affects end users, there and then you might want to consider this approach before rewriting the entire flow or upgrading infrastructure.
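The ForEachConcurrent used throughout this post is in the downloadable project; as a rough idea of the shape such an extension can take, here is my own sketch (not necessarily the project's implementation), batching work onto thread-pool tasks:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static class EnumerableExtensions
{
    // One plausible shape for a ForEachConcurrent extension: partition the
    // source into batches of `batchSize` and run each batch's items on the
    // thread pool, waiting for the whole batch before starting the next.
    public static void ForEachConcurrent<T>(
        this IEnumerable<T> source, Action<T> action, int batchSize)
    {
        var batch = new List<T>(batchSize);
        foreach (var item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                Task.WaitAll(batch.Select(x => Task.Run(() => action(x))).ToArray());
                batch.Clear();
            }
        }
        // Flush any remaining partial batch.
        if (batch.Count > 0)
            Task.WaitAll(batch.Select(x => Task.Run(() => action(x))).ToArray());
    }
}
```

Assuming a batch size like the `2^2 * 8 => 32` echoed in the console output above, this would be called as `items.ForEachConcurrent(DoWork, 32);`.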

Applying this technique wisely, strategically and selectively in your existing code base could save you a lot of time, and perhaps money, as long as you first have a good grasp of where to apply it and a firm grip on why and how to apply it.
