Is there a Python equivalent to R's sample() function?

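I think numpy.random.choice(a, size=None, replace=True, p=None) is what you are looking for. Note that it defaults to replace=True, whereas R's sample() defaults to replace = FALSE, so pass replace=False to match R's default behaviour.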

The p argument corresponds to the prob argument of the sample() function. One difference: p must sum to 1, whereas R rescales prob for you.
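As a minimal sketch of the NumPy route: the values and weights below are made up for illustration, and the seed is arbitrary. It uses the newer Generator interface (np.random.default_rng), whose choice method takes the same a, size, replace and p arguments as numpy.random.choice.

```python
import numpy as np

# Seed chosen arbitrarily so the draws are repeatable.
rng = np.random.default_rng(seed=0)

values = [1, 2, 3, 4, 5]
raw_weights = np.array([6, 7, 8, 9, 0], dtype=float)

# NumPy requires p to sum to 1, so normalise the raw weights first.
p = raw_weights / raw_weights.sum()

# Roughly R's sample(values, 3): three draws without replacement.
without_replacement = rng.choice(values, size=3, replace=False)

# Roughly R's sample(values, 4, replace = TRUE, prob = raw_weights).
weighted = rng.choice(values, size=4, replace=True, p=p)

print(without_replacement)
print(weighted)
```

Because the last weight is 0, the value 5 can never appear in the weighted draw.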

pandas, the Python library closest in spirit to R's data frames, provides the DataFrame.sample and Series.sample methods, both introduced in version 0.16.1.

For example:

>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 0]})
>>> df
   a  b
0  1  6
1  2  7
2  3  8
3  4  9
4  5  0

Sample 3 rows without replacement (the default):

>>> df.sample(3)
   a  b
4  5  0
1  2  7
3  4  9

Sample 4 rows from column 'a' with replacement, using column 'b' as the corresponding weights for the choices:

>>> df['a'].sample(4, replace=True, weights=df['b'])
3    4
0    1
0    1
2    3
Name: a, dtype: int64

These methods closely mirror the R function: you can sample a fixed number of values (n) or a fraction of values (frac) from your DataFrame/Series, with or without replacement. Note that the prob argument in R's sample() corresponds to weights in the pandas methods; as in R, the weights do not have to sum to 1, because pandas normalizes them.
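The fraction-based form and repeatable draws might look like the sketch below. It reuses the example DataFrame from above; the random_state value of 42 is an arbitrary choice, not something the methods require.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [6, 7, 8, 9, 0]})

# Sample 60% of the rows (3 of 5) without replacement.
# random_state fixes the seed so the same rows come back on every run.
subset = df.sample(frac=0.6, random_state=42)

# frac=1 returns every row in random order, like R's sample(x) with no size.
shuffled = df.sample(frac=1, random_state=42)

# Weighted draw with replacement. pandas normalises the weights, and the
# row with weight 0 (a == 5) can never be chosen.
weighted = df['a'].sample(n=4, replace=True, weights=df['b'], random_state=42)

print(subset)
print(shuffled)
print(weighted)
```

Leaving out random_state gives a fresh random draw each time, which matches R's behaviour when set.seed() has not been called.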

Tags: Python, R, Probability
