Why are ToLookup and GroupBy different?

In simple LINQ-world words:

  • ToLookup() - immediate execution
  • GroupBy() - deferred execution

why would I ever bother with GroupBy? Why should it exist?

What happens when you call ToLookup on an object representing a remote database table with a billion rows in it?

The billion rows are sent over the wire, and you build the lookup table locally.

What happens when you call GroupBy on such an object?

A query object is built; end of story.

When that query object is enumerated then the analysis of the table is done on the database server and the grouped results are sent back on demand a few at a time.

Logically they are the same thing but the performance implications of each are completely different. Calling ToLookup means I want a cache of the entire thing right now organized by group. Calling GroupBy means "I am building an object to represent the question 'what would these things look like if I organized them by group?'"


The two are similar, but are used in different scenarios. .ToLookup() returns a ready to use object that already has all the groups (but not the group's content) eagerly loaded. On the other hand, .GroupBy() returns a lazy loaded sequence of groups.

Different LINQ providers may have different behaviors for the eager and lazy loading of the groups. With LINQ-to-Object it probably makes little difference, but with LINQ-to-SQL (or LINQ-to-EF, etc.), the grouping operation is performed on the database server rather than the client, and so you may want to do an additional filtering on the group key (which generates a HAVING clause) and then only get some of the groups instead of all of them. .ToLookup() wouldn't allow for such semantics since all items are eagerly grouped.

Tags:

C#

Linq