c# generic, covering both arrays and lists?

In fact the most appropriate common interface between T[] and List<T> for your case is IReadOnlyList<T>

public static T AnyOne<T>(this IReadOnlyList<T> list) where T:class
{
    int k = list.Count;
    int r = Random.Range(0,k);
    return list[r];
}

As mentioned in another answer, IList<T> also works, but the good practice requires you to request from the caller the minimum functionality needed by the method, which in this case is Count property and read only indexer.

IEnumerable<T> also works, but it allows the caller to pass a non collection iterator where Count and ElementAt extension methods could be highly inefficient - like Enumerable.Range(0, 1000000), database query etc.

2020, quick for Unity3D programmers: of course, nowadays modern versions of .Net are available in Unity!

enter image description here

T[] and List<T> actually both implement IList<T>, which provides enumeration, a Count property and an indexer.

public static T AnyOne<T>(this IList<T> ra) 
{
    int k = ra.Count;
    int r = Random.Range(0,k);
    return ra[r];
}

Historical note: in past decades, this was the correct and only solution for Unity3D specifically, as in the Olden Days modern .Net was not available in Unity.

It's interesting how some people choose IEnumerable<T>, while some other people insist on IReadOnlyList<T>.

Now let's be honest. IEnumerable<T> is useful, very useful. In most cases you just want to put this method in some library, and throw your utility function to whatever you think is a collection, and be done with it. However, using IEnumerable<T> correctly is a bit tricky, as I'll point out here...

IEnumerable

Let's for a second assume that the OP is using Linq and wants to get a random element from a sequence. Basically he ends up with the code from @Yannick, that ends up in the library of utility helper functions:

public static T AnyOne<T>(this IEnumerable<T> source)
{
    int endExclusive = source.Count(); // #1
    int randomIndex = Random.Range(0, endExclusive); 
    return source.ElementAt(randomIndex); // #2
}

Now, what this basically does is 2 things:

Count the number of elements in the source. If the source is a simple IEnumerable<T> this implies going through all the elements in the list, if it's f.ex. a List<T>, it will use the Count property.
Reset the enumerable, go to element randomIndex, grab it and return it.

There are two things that can go wrong here. First of all, your IEnumerable might be a slow, sequential storage, and doing Count can ruin the performance of your application in an unexpected way. For example, streaming from a device might get you into trouble. That said, you could very well argue that's to be expected when that's inherent to the characteristic of the collection - and personally I'd say that argument will hold.

Secondly -and this is perhaps even more important- there's no guarantee that you enumerable will return the same sequence every iteration (and therefore there's also no guarantee that your code won't crash). For example, consider this innocent looking piece of code, that might be useful for testing purposes:

IEnumerable<int> GenerateRandomDataset()
{
    Random rnd = new Random();
    int count = rnd.Next(10, 100); // randomize number of elements
    for (int i=0; i<count; ++i)
    {
        yield return new rnd.Next(0, 1000000); // randomize result
    }
}

The first iteration (calling Count()), you might generate 99 results. You pick element 98. Next you call ElementAt, the second iteration generates 12 results and your application crashes. Not cool.

Fixing the IEnumerable implementation

As we've seen, the issue of the IEnumerable<T> implementation is that you have to go through the data 2 times. We can fix that by going through the data a single time.

The 'trick' here is actually pretty simple: if we have seen 1 element, we definitely want to consider returning that. All elements considered, there's a 50%/50% chance that this is the element we would have returned. If we see the third element, there's a 33%/33%/33% chance that we would have returned this. And so on.

Therefore, a better implementation might be this one:

public static T AnyOne<T>(this IEnumerable<T> source)
{
    Random rnd = new Random();
    double count = 1;
    T result = default(T);
    foreach (var element in source)
    {
        if (rnd.NextDouble() <= (1.0 / count)) 
        {
            result = element;
        }
        ++count;
    }
    return result;
}

On a side note: if we're using Linq, we would expect operations to use the IEnumerable<T> once (and only once!). Now you know why.

Making it work with lists and arrays

While this is a neat trick, our performance will now be slower if we work on a List<T>, which doesn't make any sense because we know there's a much better implementation available due the the property that indexing and Count are available to us.

What we're looking for is the common denominator for this better solution, that's used in as many collections as we can find. The thing we'll end up with is the IReadOnlyList<T> interface, that implements everything we need.

Because of the properties that we know to be true for IReadOnlyList<T>, we can now safely use Count and indexing, without running the risk of crashing the application.

However, while IReadOnlyList<T> seems appealing, IList<T> for some reason doesn't seem to implement it... which basically means that IReadOnlyList<T> is a bit of a gamble in practice. In that respect, I'm pretty sure there are a lot more IList<T> implementations out there than IReadOnlyList<T> implementations. It therefore seems best to simply support both interfaces.

This leads us to the solution here:

public static T AnyOne<T>(this IEnumerable<T> source)
{
    var rnd = new Random();
    var list = source as IReadOnlyList<T>;
    if (list != null)
    {
        int index = rnd.Next(0, list.Count);
        return list[index];
    }

    var list2 = source as IList<T>;
    if (list2 != null)
    {
        int index = rnd.Next(0, list2.Count);
        return list2[index];
    }
    else
    {
        double count = 1;
        T result = default(T);
        foreach (var element in source)
        {
            if (rnd.NextDouble() <= (1.0 / count))
            {
                result = element;
            }
            ++count;
        }
        return result;
    }
}

PS: For more complex scenario's, check out the Strategy Pattern.

Random

@Yannick Motton made the remark that you have to be careful with Random, because it won't be really random if you call methods like this a lot of times. Random is initialized with the RTC, so if you make a new instance a lot of times, it won't change the seed.

A simple way around this is as follows:

private static int seed = 12873; // some number or a timestamp.

// ...

// initialize random number generator:
Random rnd = new Random(Interlocked.Increment(ref seed));

This way, every time you call AnyOne, the random number generator will receive another seed and it will work even in tight loops.

To summarize:

So, to summarize it:

IEnumerable<T>'s should be iterated once, and only once. Doing otherwise might give the user unexpected results.
If you have access to better capabilities than simple enumeration, it's not necessary to go through all the elements. Best to grab the right result right away.
Consider what interfaces you're checking very carefully. While IReadOnlyList<T> is definitely the best candidate, it's not inherited from IList<T> which means it'll be less effective in practice.

The end result is something that Just Works.

c# generic, covering both arrays and lists?

Tags:

C#

Arrays

Generics

Unity3D

Related

Recent Posts