How to use GroupBy in an asynchronous manner in EF Core 3.1?

I think the only option is to do something like this:

var blogs = await context.Blogs
    .Where(blog => blog.Url.Contains("dotnet"))
    .ToListAsync();

var groupedBlogs = blogs.GroupBy(t => t.BlobNumber).Select(b => b).ToList();

GroupBy has to run on the client anyway: EF Core 3.1 can't translate a GroupBy without an aggregate to SQL.


This query isn't trying to group data in the SQL/EF Core sense. There are no aggregations involved.

It's loading all detail rows and then batching them into different buckets on the client. EF Core isn't involved in this; it's a purely client-side operation. The equivalent would be:

var blogs = await context.Blogs
    .Where(blog => blog.Url.Contains("dotnet"))
    .ToListAsync();

var blogsByNum = blogs.ToLookup(t => t.BlobNumber);
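
To show what the resulting lookup looks like in use, here is a minimal in-memory sketch. The Blog class below is a hypothetical stand-in with just the two properties used in this thread, and the string key type and sample URLs are assumptions; no database is involved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the entity; only the properties used in this thread.
public class Blog
{
    public string Url { get; set; }
    public string BlobNumber { get; set; }
}

public static class LookupDemo
{
    public static ILookup<string, Blog> GroupByNumber(IEnumerable<Blog> blogs)
    {
        // ToLookup runs immediately and buckets the rows by key.
        return blogs.ToLookup(b => b.BlobNumber);
    }

    public static void Main()
    {
        var blogs = new List<Blog>
        {
            new Blog { Url = "https://example.com/dotnet/a", BlobNumber = "1" },
            new Blog { Url = "https://example.com/dotnet/b", BlobNumber = "2" },
            new Blog { Url = "https://example.com/dotnet/c", BlobNumber = "1" },
        };

        var blogsByNum = GroupByNumber(blogs);

        // Each element is an IGrouping exposing a Key and the rows for that key.
        foreach (var group in blogsByNum)
        {
            Console.WriteLine($"{group.Key}: {group.Count()} blog(s)");
        }

        // Indexing with a key that isn't present yields an empty sequence,
        // not an exception.
        Console.WriteLine(blogsByNum["42"].Count());
    }
}
```

Unlike a Dictionary, the lookup can be indexed with a missing key safely, which is convenient when the keys come from user input.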

Speeding up grouping

The batching/grouping/lookup operation is purely CPU-bound, so the only way to accelerate it would be to parallelize it, i.e. use all CPUs to group the data, e.g.:

var blogsByNum = blogs.AsParallel()
                      .ToLookup(t => t.BlobNumber);

ToLookup does more or less what GroupBy().ToList() does: it groups the rows into buckets based on a key.

Grouping while loading

A different approach would be to load the results asynchronously and put them into buckets as they arrive. To do that, we need AsAsyncEnumerable(). ToListAsync() returns all the results at once, so it can't be used.

This approach is quite similar to what ToLookup does.


// Build the query only; nothing is executed or awaited here
var blogs = context.Blogs
    .Where(blog => blog.Url.Contains("dotnet"));

var blogsByNum = new Dictionary<string, List<Blog>>();

// Rows are streamed one at a time and placed into their bucket
await foreach (var blog in blogs.AsAsyncEnumerable())
{
    if (blogsByNum.TryGetValue(blog.BlobNumber, out var blogList))
    {
        blogList.Add(blog);
    }
    else
    {
        blogsByNum[blog.BlobNumber] = new List<Blog>(100) { blog };
    }
}

The query is executed when await foreach starts enumerating the result of AsAsyncEnumerable(). The rows arrive asynchronously, so we can add them to buckets while iterating.

The capacity parameter is used in the list constructor to avoid reallocations of the list's internal buffer.
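
The same bucketing loop can be wrapped in a reusable helper. The sketch below is only an illustration: ToBucketsAsync and Numbers are hypothetical names, and an in-memory async iterator stands in for the database stream so the example runs without EF Core. It needs C# 8 or later for await foreach.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class AsyncGrouping
{
    // Hypothetical helper: buckets items by key as they arrive from an async stream.
    public static async Task<Dictionary<TKey, List<T>>> ToBucketsAsync<T, TKey>(
        IAsyncEnumerable<T> source, Func<T, TKey> keySelector)
    {
        var buckets = new Dictionary<TKey, List<T>>();

        await foreach (var item in source)
        {
            var key = keySelector(item);
            if (!buckets.TryGetValue(key, out var list))
            {
                // First item for this key: create its bucket.
                list = new List<T>();
                buckets[key] = list;
            }
            list.Add(item);
        }

        return buckets;
    }

    // Stand-in for a database stream: yields 1..6 asynchronously.
    public static async IAsyncEnumerable<int> Numbers()
    {
        for (var i = 1; i <= 6; i++)
        {
            await Task.Yield();
            yield return i;
        }
    }

    public static async Task Main()
    {
        var byParity = await ToBucketsAsync(Numbers(), n => n % 2 == 0 ? "even" : "odd");

        Console.WriteLine("even: " + string.Join(",", byParity["even"]));
        Console.WriteLine("odd: " + string.Join(",", byParity["odd"]));
    }
}
```

With EF Core, the source argument would be the query's AsAsyncEnumerable() result and the key selector would be blog => blog.BlobNumber.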

Using System.Linq.Async

Things would be a lot easier if we had LINQ operations for IAsyncEnumerable<> itself. The System.Linq.Async package provides just that. It's developed by the ReactiveX team, it's available through NuGet, and the current major version is 4.0.

With this, we could just write:

// Build the query only; nothing is executed or awaited here
var blogs = context.Blogs
    .Where(blog => blog.Url.Contains("dotnet"));

var blogsByNum = await blogs.AsAsyncEnumerable()   // load individual rows asynchronously
                            .ToLookupAsync(blog => blog.BlobNumber);

Or

var blogsByNum=await blogs.AsAsyncEnumerable()   
                          .GroupBy(blog=>blog.BlobNumber)
                          .Select(b=>b)
                          .ToListAsync();