Parallel.ForEach vs Task.Factory.StartNew

The first is a much better option.

Parallel.ForEach, internally, uses a Partitioner<T> to distribute your collection into work items. It will not do one task per item, but rather batch this to lower the overhead involved.

The second option will schedule a single Task per item in your collection. While the results will be (nearly) the same, this will introduce far more overhead than necessary, especially for large collections, and cause the overall runtimes to be slower.

FYI - The Partitioner used can be controlled by using the appropriate overloads to Parallel.ForEach, if so desired. For details, see Custom Partitioners on MSDN.

The main difference, at runtime, is the second will act asynchronous. This can be duplicated using Parallel.ForEach by doing:

Task.Factory.StartNew( () => Parallel.ForEach<Item>(items, item => DoSomething(item)));

By doing this, you still take advantage of the partitioners, but don't block until the operation is complete.


Parallel.ForEach will optimize(may not even start new threads) and block until the loop is finished, and Task.Factory will explicitly create a new task instance for each item, and return before they are finished (asynchronous tasks). Parallel.Foreach is much more efficient.


I did a small experiment of running a method "1,000,000,000 (one billion)" times with "Parallel.For" and one with "Task" objects.

I measured the processor time and found Parallel more efficient. Parallel.For divides your task in to small work items and executes them on all the cores parallely in a optimal way. While creating lot of task objects ( FYI TPL will use thread pooling internally) will move every execution on each task creating more stress in the box which is evident from the experiment below.

I have also created a small video which explains basic TPL and also demonstrated how Parallel.For utilizes your core more efficiently http://www.youtube.com/watch?v=No7QqSc5cl8 as compared to normal tasks and threads.

Experiment 1

Parallel.For(0, 1000000000, x => Method1());

Experiment 2

for (int i = 0; i < 1000000000; i++)
{
    Task o = new Task(Method1);
    o.Start();
}

Processor time comparison