Anyway to Parallel Yield c#

The trouble with the code change is that it waits till all flat file queries are enumerated before it returns the whole lot that can then be enumerated and yielded.

Let's prove that it's false by a simple example. First, let's create a TestQuery class that will yield a single entity after a given time. Second, let's execute several test queries in parallel and measure how long it took to yield their result.

public class TestQuery : IFlatFileQuery {

    private readonly int _sleepTime;

    public IEnumerable<Entity> Run() {
        Thread.Sleep(_sleepTime);
        return new[] { new Entity() };
    }

    public TestQuery(int sleepTime) {
        _sleepTime = sleepTime;
    }

}

internal static class Program {

    private static void Main() {
        Stopwatch stopwatch = Stopwatch.StartNew();
        var queries = new IFlatFileQuery[] {
            new TestQuery(2000),
            new TestQuery(3000),
            new TestQuery(1000)
        };
        foreach (var entity in queries.AsParallel().SelectMany(ffq => ffq.Run()))
            Console.WriteLine("Yielded after {0:N0} seconds", stopwatch.Elapsed.TotalSeconds);
        Console.ReadKey();
    }

}

This code prints:

Yielded after 1 seconds
Yielded after 2 seconds
Yielded after 3 seconds

You can see with this output that AsParallel() will yield each result as soon as its available, so everything works fine. Note that you might get different timings depending on the degree of parallelism (such as "2s, 5s, 6s" with a degree of parallelism of 1, effectively making the whole operation not parallel at all). This output comes from an 4-cores machine.

Your long processing will probably scale with the number of cores, if there is no common bottleneck between the threads (such as a shared locked resource). You might want to profile your algorithm to see if there are slow parts that can be improved using tools such as dotTrace.


I don't think there is a red flag in your code anywhere. There are no outrageous inefficiencies. I think it comes down to multiple smaller differences.

PLINQ is very good at processing streams of data. Internally, it works more efficiently than adding items to a synchronized list one-by-one. I suspect that your calls to TryAdd are a bottleneck because each call requires at least two Interlocked operations internally. Those can put enormous load on the inter-processor memory bus because all threads will compete for the same cache line.

PLINQ is cheaper because internally, it does some buffering. I'm sure it doesn't output items one-by-one. Probably it batches them and amortizes sycnhronization cost that way over multiple items.

A second issue would be the bounded capacity of the BlockingCollection. 100 is not a lot. This might lead to a lot of waiting. Waiting is costly because it requires a call to the kernel and a context switch.


I make this alternative that works good for me in any scenario:

This works for me:

  • In a Task in a Parallel.Foreach Enqueue in a ConcurrentQueue the item transformed to be processed.
  • The task has a continue that marks a flag with that task ends.
  • In the same thread of execution with tasks ends a while dequeue and yields

Fast and excellent results for me:

Task.Factory.StartNew (() =>
{
    Parallel.ForEach<string> (TextHelper.ReadLines(FileName), ProcessHelper.DefaultParallelOptions,
    (string currentLine) =>
    {
        // Read line, validate and enqeue to an instance of FileLineData (custom class)
    });
}).
ContinueWith 
(
    ic => isCompleted = true 
);


while (!isCompleted || qlines.Count > 0)
{
    if (qlines.TryDequeue (out returnLine))
    {
        yield return returnLine;
    }
}