How to generate list of unique random floats in Python

If you need to guarantee uniqueness, it may be more efficient to

Try and generate n random floats in [lo, hi] at once.
If the length of the unique floats is not n, try and generate however many floats are still needed

and continue accordingly until you have enough, as opposed to generating them 1-by-1 in a Python level loop checking against a set.

If you can afford NumPy doing so with np.random.uniform can be a huge speed-up.

import numpy as np

def gen_uniq_floats(lo, hi, n):
    out = np.empty(n)
    needed = n
    while needed != 0:
        arr = np.random.uniform(lo, hi, needed)
        uniqs = np.setdiff1d(np.unique(arr), out[:n-needed])
        out[n-needed: n-needed+uniqs.size] = uniqs
        needed -= uniqs.size
    np.random.shuffle(out)
    return out.tolist()

If you cannot use NumPy, it still may be more efficient depending on your data needs to apply the same concept of checking for dupes afterwards, maintaining a set.

def no_depend_gen_uniq_floats(lo, hi, n):
    seen = set()
    needed = n
    while needed != 0:
        uniqs = {random.uniform(lo, hi) for _ in range(needed)}
        seen.update(uniqs)
        needed -= len(uniqs)
    return list(seen)

Rough benchmark

Extreme degenerate case

# Mitch's NumPy solution
%timeit gen_uniq_floats(0, 2**-50, 1000)
153 µs ± 3.71 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

# Mitch's Python-only solution
%timeit no_depend_gen_uniq_floats(0, 2**-50, 1000)
495 µs ± 43.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# Raymond Hettinger's solution (single number generation)
%timeit sample_floats(0, 2**-50, 1000)
618 µs ± 13 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

More "normal" case (with larger sample)

# Mitch's NumPy solution
%timeit gen_uniq_floats(0, 1, 10**5)
15.6 ms ± 1.12 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

# Mitch's Python-only solution
%timeit no_depend_gen_uniq_floats(0, 1, 10**5)
65.7 ms ± 2.31 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# Raymond Hettinger's solution (single number generation)
%timeit sample_floats(0, 1, 10**5)
78.8 ms ± 4.22 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

You could just use random.uniform(start, stop). With double precision floats, you can be relatively sure that they are unique if your set is small. If you want to generate a big number of random floats and need to avoid that you have a number twice, check before adding them to the list.

However, if you are looking for a selection of specific numbers, this is not the solution.

Answer

One easy way is to keep a set of all random values seen so far and reselect if there is a repeat:

import random

def sample_floats(low, high, k=1):
    """ Return a k-length list of unique random floats
        in the range of low <= x <= high
    """
    result = []
    seen = set()
    for i in range(k):
        x = random.uniform(low, high)
        while x in seen:
            x = random.uniform(low, high)
        seen.add(x)
        result.append(x)
    return result

Notes

This technique is how Python's own random.sample() is implemented.
The function uses a set to track previous selections because searching a set is O(1) while searching a list is O(n).
Computing the probability of a duplicate selection is equivalent to the famous Birthday Problem.
Given 2**53 distinct possible values from random(), duplicates are infrequent. On average, you can expect a duplicate float at about 120,000,000 samples.

Variant: Limited float range

If the population is limited to just a range of evenly spaced floats, then it is possible to use random.sample() directly. The only requirement is that the population be a Sequence:

from __future__ import division
from collections import Sequence

class FRange(Sequence):
    """ Lazily evaluated floating point range of evenly spaced floats
        (inclusive at both ends)

        >>> list(FRange(low=10, high=20, num_points=5))
        [10.0, 12.5, 15.0, 17.5, 20.0]

    """
    def __init__(self, low, high, num_points):
        self.low = low
        self.high = high
        self.num_points = num_points

    def __len__(self):
        return self.num_points

    def __getitem__(self, index):
        if index < 0:
            index += len(self)
        if index < 0 or index >= len(self):
            raise IndexError('Out of range')
        p = index / (self.num_points - 1)
        return self.low * (1.0 - p) + self.high * p

Here is a example of choosing ten random samples without replacement from a range of 41 evenly spaced floats from 10.0 to 20.0.

>>> import random
>>> random.sample(FRange(low=10.0, high=20.0, num_points=41), k=10)
[13.25, 12.0, 15.25, 18.5, 19.75, 12.25, 15.75, 18.75, 13.0, 17.75]

You can easily use your list of integers to generate floats:

int_list = random.sample(range(1, 100), 10)
float_list = [x/10 for x in int_list]

Check out this Stack Overflow question about generating random floats.

If you want it to work with python2, add this import:

from __future__ import division

How to generate list of unique random floats in Python

Rough benchmark

Answer

Notes

Variant: Limited float range

Tags:

Python

Random

Distinct Values

Related

Recent Posts