Better way to shuffle two numpy arrays in unison
Your "scary" solution does not appear scary to me. Calling
shuffle() for two sequences of the same length results in the same number of calls to the random number generator, and these are the only "random" elements in the shuffle algorithm. By resetting the state, you ensure that the calls to the random number generator will give the same results in the second call to
shuffle(), so the whole algorithm will generate the same permutation.
If you don't like this, a different solution would be to store your data in one array instead of two right from the beginning, and create two views into this single array simulating the two arrays you have now. You can use the single array for shuffling and the views for all other purposes.
Example: Let's assume the arrays
b look like this:
a = numpy.array([[[ 0., 1., 2.], [ 3., 4., 5.]], [[ 6., 7., 8.], [ 9., 10., 11.]], [[ 12., 13., 14.], [ 15., 16., 17.]]]) b = numpy.array([[ 0., 1.], [ 2., 3.], [ 4., 5.]])
We can now construct a single array containing all the data:
c = numpy.c_[a.reshape(len(a), -1), b.reshape(len(b), -1)] # array([[ 0., 1., 2., 3., 4., 5., 0., 1.], # [ 6., 7., 8., 9., 10., 11., 2., 3.], # [ 12., 13., 14., 15., 16., 17., 4., 5.]])
Now we create views simulating the original
a2 = c[:, :a.size//len(a)].reshape(a.shape) b2 = c[:, a.size//len(a):].reshape(b.shape)
The data of
b2 is shared with
c. To shuffle both arrays simultaneously, use
In production code, you would of course try to avoid creating the original
b at all and right away create
This solution could be adapted to the case that
b have different dtypes.
Your can use NumPy's array indexing:
def unison_shuffled_copies(a, b): assert len(a) == len(b) p = numpy.random.permutation(len(a)) return a[p], b[p]
This will result in creation of separate unison-shuffled arrays.
X = np.array([[1., 0.], [2., 1.], [0., 0.]]) y = np.array([0, 1, 2]) from sklearn.utils import shuffle X, y = shuffle(X, y, random_state=0)
To learn more, see http://scikit-learn.org/stable/modules/generated/sklearn.utils.shuffle.html
Very simple solution:
randomize = np.arange(len(x)) np.random.shuffle(randomize) x = x[randomize] y = y[randomize]
the two arrays x,y are now both randomly shuffled in the same way