Parallel.ForEach stalled when integrated with BlockingCollection

As you have discovered, Parallel.ForEach() does not work well when fed directly from BlockingCollection.GetConsumingEnumerable().

For an explanation, see this blog post:

https://devblogs.microsoft.com/pfxteam/parallelextensionsextras-tour-4-blockingcollectionextensions/

Excerpt from the blog:

BlockingCollection’s GetConsumingEnumerable implementation is using BlockingCollection’s internal synchronization which already supports multiple consumers concurrently, but ForEach doesn’t know that, and its enumerable-partitioning logic also needs to take a lock while accessing the enumerable.

As such, there’s more synchronization here than is actually necessary, resulting in a potentially non-negligible performance hit.

[Also] the partitioning algorithm employed by default by both Parallel.ForEach and PLINQ use chunking in order to minimize synchronization costs: rather than taking the lock once per element, it'll take the lock, grab a group of elements (a chunk), and then release the lock.

While this design can help with overall throughput, for scenarios that are focused more on low latency, that chunking can be prohibitive.

That blog also provides the source code for a method called GetConsumingPartitioner() which you can use to solve the problem.

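using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

public static class BlockingCollectionExtensions
{
    // Wraps the collection in a partitioner that hands items out one at a time,
    // relying on BlockingCollection's own synchronization instead of chunking.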
    public static Partitioner<T> GetConsumingPartitioner<T>(this BlockingCollection<T> collection)
    {
        return new BlockingCollectionPartitioner<T>(collection);
    }


    public class BlockingCollectionPartitioner<T> : Partitioner<T>
    {
        private BlockingCollection<T> _collection;

        internal BlockingCollectionPartitioner(BlockingCollection<T> collection)
        {
            if (collection == null)
                throw new ArgumentNullException("collection");

            _collection = collection;
        }

        public override bool SupportsDynamicPartitions
        {
            get { return true; }
        }

        public override IList<IEnumerator<T>> GetPartitions(int partitionCount)
        {
            if (partitionCount < 1)
                throw new ArgumentOutOfRangeException("partitionCount");

            var dynamicPartitioner = GetDynamicPartitions();
            return Enumerable.Range(0, partitionCount).Select(_ => dynamicPartitioner.GetEnumerator()).ToArray();
        }

        public override IEnumerable<T> GetDynamicPartitions()
        {
            return _collection.GetConsumingEnumerable();
        }

    }
}
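
As a rough illustration of consuming without chunking, the sketch below uses the built-in Partitioner.Create(..., EnumerablePartitionerOptions.NoBuffering) overload (available from .NET 4.5) rather than the blog's class, so that the example compiles on its own. This is a swapped-in technique, not the blog's code; with the extension method above you would pass collection.GetConsumingPartitioner() instead. The names NoBufferingDemo and Run are made up for this example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class NoBufferingDemo
{
    // Hypothetical helper: pushes itemCount integers through a BlockingCollection
    // and returns how many of them the parallel loop processed.
    public static int Run(int itemCount)
    {
        using (var queue = new BlockingCollection<int>())
        {
            var processed = new ConcurrentBag<int>();

            // Consumer: NoBuffering asks the partitioner to hand out one item
            // at a time instead of waiting to fill a chunk.
            var consumer = Task.Run(() =>
            {
                Parallel.ForEach(
                    Partitioner.Create(queue.GetConsumingEnumerable(),
                                       EnumerablePartitionerOptions.NoBuffering),
                    item => processed.Add(item));
            });

            // Producer: add the items while the consumer is already running.
            for (int i = 0; i < itemCount; i++)
                queue.Add(i);

            queue.CompleteAdding(); // lets GetConsumingEnumerable() finish
            consumer.Wait();        // returns once the parallel loop has ended
            return processed.Count;
        }
    }

    public static void Main()
    {
        Console.WriteLine(Run(100));
    }
}
```

Either partitioner should let items be processed as they arrive; CompleteAdding() is still needed for the loop to terminate.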

The stall comes from the chunking behaviour quoted above: the default partitioner used by Parallel.ForEach tries to collect a group of elements (a chunk) before handing them to a worker, so it blocks on the collection while waiting for more items to arrive.

To get it to work, you can add a method to your ParallelConsumer<T> class that signals that adding is complete:

    public void StopAdding()
    {
        _entries.CompleteAdding();
    }

Then call this method after your for loop:

        consumer.Start();
        for (int i = 0; i < itemCount; i++)
        {
            consumer.Enqueue(i);
        }
        consumer.StopAdding();

Otherwise, Parallel.ForEach() keeps waiting for enough items to fill its current chunk before it starts processing them.
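
For reference, here is a minimal end-to-end sketch of that fix. The ParallelConsumer<T> below is a hypothetical stand-in written for this example, since the original class is not shown here; only the member names Start, Enqueue, StopAdding and _entries come from the snippets above, and WaitForCompletion and CompleteAddingDemo are assumptions.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the asker's ParallelConsumer<T>.
public class ParallelConsumer<T>
{
    private readonly BlockingCollection<T> _entries = new BlockingCollection<T>();
    private readonly Action<T> _handler;
    private Task _worker;

    public ParallelConsumer(Action<T> handler) { _handler = handler; }

    // Runs the parallel loop on a background task, using the default (chunking) partitioner.
    public void Start()
    {
        _worker = Task.Run(() =>
        {
            Parallel.ForEach(_entries.GetConsumingEnumerable(), _handler);
        });
    }

    public void Enqueue(T item) { _entries.Add(item); }

    // Signals that no more items will arrive, so partially filled chunks get released.
    public void StopAdding() { _entries.CompleteAdding(); }

    // Assumed helper: blocks until the parallel loop has drained the collection.
    public void WaitForCompletion() { _worker.Wait(); }
}

public static class CompleteAddingDemo
{
    // Returns the number of items the handler saw.
    public static int Run(int itemCount)
    {
        int handled = 0;
        var consumer = new ParallelConsumer<int>(_ => Interlocked.Increment(ref handled));

        consumer.Start();
        for (int i = 0; i < itemCount; i++)
        {
            consumer.Enqueue(i);
        }
        consumer.StopAdding();        // without this, WaitForCompletion() would never return
        consumer.WaitForCompletion();
        return handled;
    }

    public static void Main()
    {
        Console.WriteLine(CompleteAddingDemo.Run(100));
    }
}
```

Note that this only helps when the producer eventually finishes; for a long-running producer where items must be handled as they arrive, a non-chunking partitioner is the better fit.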