how to get the index of numpy.random.choice? - python

Regarding your first question, you can work the other way around: randomly choose from the indices of the array a, then fetch the value at that index.

>>> import numpy as np
>>> a = [1,4,1,3,3,2,1,4]
>>> a = np.array(a)
>>> np.random.choice(np.arange(a.size))
6
>>> a[6]
1

But if you just need a random sample without replacement, replace=False will do. I can't remember when it was first added to random.choice; it might have been 1.7.0. So if you are running a very old NumPy, it may not work. Keep in mind the default is replace=True.
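
As a minimal sketch of combining the two ideas (sampling indices rather than values, and doing so without replacement), something like the following should work. The sample size of 3 and the fixed seed are arbitrary choices made here for illustration.

```python
import numpy as np

a = np.array([1, 4, 1, 3, 3, 2, 1, 4])

np.random.seed(0)  # assumption: seeded only so the run is repeatable

# Passing an int n samples from np.arange(n), i.e. from the valid indices of a.
idx = np.random.choice(a.size, size=3, replace=False)

# Integer-array indexing fetches the elements at the sampled positions.
values = a[idx]

print(idx, values)
```

Because the indices are drawn without replacement, each position appears at most once, even though the values in a repeat.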


Here's one way to find out the index of a randomly selected element:

import random # plain random module, not numpy's
random.choice(list(enumerate(a)))[0]
=> 4      # just an example, index is 4

Or you could retrieve the element and the index in a single step:

random.choice(list(enumerate(a)))
=> (1, 4) # just an example, index is 1 and element is 4
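
If building the full list of (index, element) pairs feels wasteful for a large sequence, one possible alternative is to draw the index directly and look the element up afterwards. This is only a sketch using the standard library's random.randrange, not something the approach above requires:

```python
import random  # plain random module, not numpy's

a = [1, 4, 1, 3, 3, 2, 1, 4]

# randrange(n) returns a uniformly chosen integer in [0, n).
i = random.randrange(len(a))
element = a[i]

print(i, element)
```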

This is a bit out of left field compared with the other answers, but I thought it might help with what it sounds like you're trying to do in a slightly larger sense. You can generate a random sample without replacement by shuffling the indices of the elements in the source array:

source = np.random.randint(0, 100, size=100) # generate a set to sample from
idx = np.arange(len(source))
np.random.shuffle(idx)
subsample = source[idx[:10]]

This will create a sample (here, of size 10) by drawing elements from the source set (here, of size 100) without replacement.

You can interact with the non-selected elements by using the remaining index values, i.e.:

notsampled = source[idx[10:]]
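
Putting the pieces together, here is a rough sketch of the same split wrapped in a helper. The name split_sample and its parameters are hypothetical, and np.random.permutation is used here in place of arange plus shuffle, since it returns a shuffled range of indices in one call.

```python
import numpy as np

def split_sample(source, k):
    """Split source into k sampled elements and the remaining ones.

    Hypothetical helper for illustration; returns the shuffled index
    array as well, so positions in the original array stay recoverable.
    """
    idx = np.random.permutation(len(source))  # shuffled 0..len(source)-1
    return source[idx[:k]], source[idx[k:]], idx

source = np.arange(100, 200)  # distinct values, so the split is easy to check
subsample, notsampled, idx = split_sample(source, 10)

print(idx[:10])    # original positions of the sampled elements
print(subsample)   # the sampled elements themselves
```

Every element of source ends up in exactly one of the two outputs, and idx[:10] gives the original positions of the sampled elements.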

numpy.random.choice(a, size=however_many, replace=False)

If you want a sample without replacement, just ask numpy to make you one. Don't loop and draw items repeatedly. That'll produce bloated code and horrible performance.

Example:

>>> a = numpy.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> numpy.random.choice(a, size=5, replace=False)
array([7, 5, 8, 6, 2])

On a sufficiently recent NumPy (at least 1.17), you should use the new randomness API, which fixes a longstanding performance issue where the old API's replace=False code path unnecessarily generated a complete permutation of the input under the hood:

rng = numpy.random.default_rng()
result = rng.choice(a, size=however_many, replace=False)