What is the difference between the random.choices() and random.sample() functions?
The fundamental difference is that random.choices() samples with replacement, while random.sample() samples without replacement.
random.choices() always draws from the entire sequence, so an element that has been drawn is put back and the same position can (eventually) be drawn again.
random.sample() will not do this: once an element is picked, it is removed from the population, so the same position is never drawn twice.
Note that replaced (replacement) here means placed back into the population, not substituted (substitution).
To better understand it, let's consider the following example:
    import random

    random.seed(0)
    ll = list(range(10))

    print(random.sample(ll, 10))
    # [6, 9, 0, 2, 4, 3, 5, 1, 8, 7]
    print(random.choices(ll, k=10))
    # [5, 9, 5, 2, 7, 6, 2, 9, 9, 8]
As you can see, random.sample() does not produce repeating elements, while random.choices() does.
In your example, both methods return repeating values because the original sequence itself contains repeating values. In the case of random.sample(), however, those repeating values must come from different positions of the original input.
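To make the point about positions concrete, here is a minimal sketch that samples indices rather than values. The list `population` and the seed are hypothetical and chosen only for illustration; the idea is simply that random.sample() never reuses a position, while random.choices() may.

```python
import random

random.seed(0)  # arbitrary seed, only so that repeated runs behave the same

population = ["a", "a", "b"]  # hypothetical input with a repeated value

# Sample positions instead of values to see where each pick came from.
picked_positions = random.sample(range(len(population)), k=2)
picked_values = [population[i] for i in picked_positions]

# The positions are always distinct, even when the values coincide.
assert len(set(picked_positions)) == len(picked_positions)
print(picked_positions, picked_values)

# With choices(), the same position may be drawn more than once.
drawn_positions = random.choices(range(len(population)), k=5)

# 5 draws from only 3 positions must repeat at least one position.
assert len(set(drawn_positions)) < len(drawn_positions)
print(drawn_positions)
```

So a repeated value in the output of random.sample() only tells you that the input contained that value more than once, not that an element was drawn twice.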
Finally, you cannot sample() more elements than the input sequence contains, while this is not an issue with choices():
    # print(random.sample(ll, 20))
    # ValueError: Sample larger than population or is negative
    print(random.choices(ll, k=20))
    # [9, 3, 7, 8, 6, 4, 1, 4, 6, 9, 9, 4, 8, 2, 8, 5, 0, 7, 3, 8]
A more generic and theoretical discussion of the sampling process can be found on Wikipedia.
The basic difference is this:
- Use the random.sample function when you want to choose multiple random items from a list without picking the same item twice.
- Use the random.choices function when you want to choose multiple items from a list and the same item may be picked repeatedly.
Here are two examples to demonstrate the difference:
    import random

    alpha_list = ['Batman', 'Flash', 'Wonder Woman', 'Cyborg', 'Superman']

    choices = random.choices(alpha_list, k=7)
    print(choices)

    sample = random.sample(alpha_list, k=3)
    print(sample)

Output:

    ['Cyborg', 'Cyborg', 'Wonder Woman', 'Flash', 'Wonder Woman', 'Flash', 'Batman']
    ['Superman', 'Flash', 'Batman']
As the examples above show, you can pass random.choices() a value of 'k' greater than the length of your sequence, because random.choices() allows duplicates.
If you pass random.sample() a value of 'k' greater than the length of the sequence, however, you get a ValueError:
Sample larger than population or is negative.
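If the requested size might exceed the population, one option is to catch the ValueError or to cap 'k' yourself. The sketch below assumes a hypothetical helper named `sample_at_most`; it is not part of the random module, just one way of avoiding the error.

```python
import random

items = ["Batman", "Flash", "Wonder Woman", "Cyborg", "Superman"]


def sample_at_most(population, k):
    """Hypothetical helper: sample without replacement, capping k at the population size."""
    return random.sample(population, min(k, len(population)))


try:
    random.sample(items, 7)  # 7 > len(items), so this raises
    raised = False
except ValueError:
    raised = True

capped = sample_at_most(items, 7)  # returns all 5 items in random order
print(raised, len(capped))
# True 5
```

Whether capping is appropriate depends on the use case; if the caller really needs exactly 'k' items, letting the error propagate is probably the safer choice.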
Now, coming to use cases:
- random.choices(sequence, weights=None, cum_weights=None, k=1): use this when you can afford to have duplicates in your sampling. This is the very reason why we can give a value of 'k' greater than the length of the sequence.
- random.sample(sequence, k): use this when you cannot afford to draw the same element twice while sampling your data.
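The weights parameter in the random.choices() signature above is another reason to pick it: the draws can be biased towards some elements. Here is a small sketch with made-up weights (the list `heroes`, the weight values and the seed are assumptions for illustration only).

```python
import random
from collections import Counter

random.seed(0)  # arbitrary seed, only so that repeated runs behave the same

heroes = ["Batman", "Flash", "Superman"]
weights = [5, 1, 0]  # hypothetical relative weights; a weight of 0 means "never pick"

# Draw 1000 times with replacement, biased by the weights.
draws = random.choices(heroes, weights=weights, k=1000)
counts = Counter(draws)

# Expect roughly five times as many 'Batman' as 'Flash', and no 'Superman'.
print(counts)
```

The weights are relative, so they do not need to sum to 1.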
For further reading: