# Efficiently generating unique pairs of integers

Easy, peasy, when viewed in the proper way.

You wish to generate n pairs of integers, [p,q], such that p and q lie in the interval [1,m], and p

How many possible pairs are there? The total number of pairs is just m*(m-1)/2. (I.e., the sum of the numbers from 1 to m-1.)

So we could generate n random integers in the range [1,m*(m-1)/2]. Randperm does this nicely. (Older matlab releases do not allow the second argument to randperm.)

k = randperm(m/2*(m-1),n);


(Note that I've written this expression with m in a funny way, dividing by 2 in perhaps a strange place. This avoids precision problems for some values of m near the upper limits.)

Now, if we associate each possible pair [p,q] with one of the integers in k, we can work backwards, from the integers generated in k, to a pair [p,q]. Thus the first few pairs in that list are:

{[1,2], [1,3], [2,3], [1,4], [2,4], [3,4], ..., [m-1,m]}


We can think of them as the elements in a strictly upper triangular array of size m by m, thus those elements above the main diagonal.

q = floor(sqrt(8*(k-1) + 1)/2 + 1/2);
p = k - q.*(q-1)/2;


See that these formulas recover p and q from the unrolled elements in k. We can convince ourselves that this does indeed work, but perhaps a simple way here is just this test:

k = 1:21;
q = floor(sqrt(8*(k-1) + 1)/2 + 3/2);
p = k - (q-1).*(q-2)/2;
[k;p;q]'

ans =
1     1     2
2     1     3
3     2     3
4     1     4
5     2     4
6     3     4
7     1     5
8     2     5
9     3     5
10     4     5
11     1     6
12     2     6
13     3     6
14     4     6
15     5     6
16     1     7
17     2     7
18     3     7
19     4     7
20     5     7
21     6     7


Another way of testing it is to show that all pairs get generated for a small case.

m = 5;
n = 10;
k = randperm(m/2*(m-1),n);
q = floor(sqrt(8*(k-1) + 1)/2 + 3/2);
p = k - (q-1).*(q-2)/2;

sortrows([p;q]',[2 1])
ans =
1     2
1     3
2     3
1     4
2     4
3     4
1     5
2     5
3     5
4     5


Yup, it looks like everything works perfectly. Now try it for some large numbers for m and n to test the time used.

tic
m = 1e6;
n = 100000;
k = randperm(m/2*(m-1),n);
q = floor(sqrt(8*(k-1) + 1)/2 + 3/2);
p = k - (q-1).*(q-2)/2;
toc

Elapsed time is 0.014689 seconds.


This scheme will work for m as large as roughly 1e8, before it fails due to precision errors in double precision. The exact limit should be m no larger than 134217728 before m/2*(m-1) exceeds 2^53. A nice feature is that no rejection for repeat pairs need be done.