Faster alternatives to .Distinct()

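.Distinct() is an O(n) call (assuming a reasonable hash function): it makes a single pass over the sequence and remembers the elements it has already seen in a hash-based set.
Any approach has to look at every element at least once, so you can't get asymptotically faster than that.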

However, you should make sure that your GetHashCode (and, to a lesser extent, Equals) is as fast as possible.
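As a rough illustration of what "fast" can mean here, the sketch below implements IEquatable<T> on a small value type so that the default comparer used by Distinct() can call a strongly typed Equals without boxing, and uses a cheap, allocation-free hash. The `Point` type and the `PointDemo` helper are hypothetical names made up for this example; the multiplier 397 is just a conventional prime, not a requirement.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical key type, for illustration only.
public readonly struct Point : IEquatable<Point>
{
    public int X { get; }
    public int Y { get; }

    public Point(int x, int y) { X = x; Y = y; }

    // Strongly typed Equals: EqualityComparer<Point>.Default picks this up,
    // so comparing two Points does not box them.
    public bool Equals(Point other) => X == other.X && Y == other.Y;

    public override bool Equals(object obj) => obj is Point other && Equals(other);

    // Cheap hash that mixes both fields without allocating.
    public override int GetHashCode()
    {
        unchecked { return (X * 397) ^ Y; }
    }
}

public static class PointDemo
{
    // Distinct() uses EqualityComparer<Point>.Default when no comparer is passed.
    public static int CountDistinct(IEnumerable<Point> points) => points.Distinct().Count();
}
```

Equal values must always produce equal hash codes; beyond that, the better the hash spreads distinct values, the fewer Equals calls Distinct() ends up making.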

Depending on your scenario, you may be able to replace the List<T> with a HashSet<T>, which prevents duplicates from being inserted in the first place while still offering O(1) insertion.
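A minimal sketch of that idea, assuming the elements are strings and that you do not need index access or a guaranteed order (a HashSet<T> offers neither). `HashSetDemo` and `Collect` are illustrative names only.

```csharp
using System;
using System.Collections.Generic;

public static class HashSetDemo
{
    // Collects items into a set; duplicates are dropped as they arrive.
    public static HashSet<string> Collect(IEnumerable<string> items)
    {
        var seen = new HashSet<string>();
        foreach (var item in items)
        {
            // Add returns false (and leaves the set unchanged) if the item is already present.
            seen.Add(item);
        }
        return seen;
    }
}
```

With this in place there is nothing left for Distinct() to do: the set's Count is already the number of distinct items.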

However, always profile your code before jumping to conclusions about what needs to be faster.

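If you can do the distinct in place, you can do it very quickly and with zero allocations by first sorting (Array.Sort for an array, List<T>.Sort for a list) and then compacting runs of equal elements. Keep in mind that the sort itself is O(n log n) and changes the element order: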

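 // Assumes source is already sorted and non-empty,
 // and that eqComparer is an IEqualityComparer<TSource>.
 TSource oldV = source[0];
 int pos = 1; // next write position; everything before pos is distinct
 for (int i = 1; i < source.Count; i++) // use source.Length for a plain array
 {
     var newV = source[i];
     source[pos] = newV;
     if (!eqComparer.Equals(newV, oldV))
     {
         pos++; // keep newV only if it differs from the previous element
     }
     oldV = newV;
 }
 // pos is now the number of distinct elements (the new logical size)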

You will then have to keep track of the new, smaller logical size of the array yourself, or call Array.Resize (but that allocates a new array).

Alternatively, if you take the same approach with a List<T>, you can call RemoveRange at the end to shrink it without allocating. This ends up being significantly quicker.
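Here is one way the List<T> variant might look. It is a sketch rather than a drop-in replacement for Distinct(): it reorders the list, and as a simplification it uses a single IComparer<T> for both sorting and equality instead of a separate eqComparer. `SortedDistinct` and `DistinctInPlace` are made-up names.

```csharp
using System;
using System.Collections.Generic;

public static class SortedDistinct
{
    // Sorts the list, then compacts runs of equal elements in place.
    public static void DistinctInPlace<T>(List<T> source, IComparer<T> comparer = null)
    {
        if (source.Count < 2) return; // nothing to deduplicate
        comparer = comparer ?? Comparer<T>.Default;
        source.Sort(comparer);

        int pos = 1; // source[0..pos) holds the distinct elements found so far
        for (int i = 1; i < source.Count; i++)
        {
            // Keep source[i] only if it differs from the last element kept.
            if (comparer.Compare(source[i], source[pos - 1]) != 0)
            {
                source[pos] = source[i];
                pos++;
            }
        }

        // Drop the leftover tail; the list keeps its existing backing array.
        source.RemoveRange(pos, source.Count - pos);
    }
}
```

Whether this beats Distinct() depends on the data size and on how expensive comparisons are relative to hashing, so it is worth measuring both on your own workload.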

Other posters are probably correct, though, that you can achieve this goal some other way, such as using a HashSet<T> in the first place, or keeping parallel collections where one always contains only the distinct elements. That trades a small cost on each insert/remove for needing no time at all to get the distinct set.
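The parallel-collections idea could be sketched roughly like this: a list that keeps duplicates, alongside a dictionary of per-value counts whose keys are always the distinct set. `CountedList<T>` is a hypothetical wrapper invented for this example, and it assumes the default equality comparer and non-null items.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper: the full list plus a count per value.
public sealed class CountedList<T>
{
    private readonly List<T> _items = new List<T>();
    private readonly Dictionary<T, int> _counts = new Dictionary<T, int>();

    public void Add(T item)
    {
        _items.Add(item);
        _counts.TryGetValue(item, out int n); // n stays 0 when the item is new
        _counts[item] = n + 1;
    }

    public bool Remove(T item)
    {
        if (!_items.Remove(item)) return false; // list removal is still O(n)
        int n = _counts[item] - 1;
        if (n == 0) _counts.Remove(item); // last copy gone: no longer distinct
        else _counts[item] = n;
        return true;
    }

    public IReadOnlyList<T> Items => _items;

    // Always up to date; reading it does no extra work.
    public IReadOnlyCollection<T> Distinct => _counts.Keys;
}
```

The counts are what make removal safe: a value leaves the distinct set only when its last copy is removed from the list.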

Does it have to be a List?

Would it be possible to switch from List to HashSet? A HashSet prevents objects from being inserted into the collection more than once in the first place, so the Distinct is already done.
