Generating a random, non-repeating sequence of all integers in .NET

If you don't need the random numbers to be cryptographically secure, you can use a Linear Congruential Generator.

An LCG is a formula of the form X_n+1 = X_n * a + c (mod m), it needs constant memory and constant time for every generated number.
If proper values for the LCG are chosen, it will have a full period length, meaning it will output every number between 0 and your chosen modulus.

An LCG has a full period if and only if:

  • The modulus and the increment are relatively prime, i.e. GCD(m, c) = 1
  • a - 1 is divisible by all prime factors of m
  • If m is divisible by 4, a - 1 must be divisible by 4.

Our modulus is 2 ^ 32, meaning a must be a number of form 4k + 1 where k is an arbitrary integer, and c must not be divisible by 2.

While this is a C# question I've coded a small C++ program to test the speed of this solution, since I'm more comfortable in that language:

#include <iostream>
#include <stdlib.h>

class lcg {
private:
    unsigned a, c, val;
public:
    lcg(unsigned seed=0) : lcg(seed, rand() * 4 + 1, rand() * 2 + 1) {}
    lcg(unsigned seed, unsigned a, unsigned c) {
        val = seed;
        this->a = a;
        this->c = c;
        std::cout << "Initiated LCG with seed " << seed << "; a = " << a << "; c = " << c << std::endl;
    }

    unsigned next() {
        this->val = a * this->val + c;
        return this->val;
    }
};

int main() {
    srand(time(NULL));
    unsigned seed = rand();
    int dummy = 0;
    lcg gen(seed);
    time_t t = time(NULL);
    for (uint64_t i = 0; i < 0x100000000ULL; i++) {
        if (gen.next() < 1000) dummy++; // Avoid optimizing this out with -O2
    }
    std::cout << "Finished cycling through. Took " << (time(NULL) - t) << " seconds." << std::endl;
    if (dummy > 0) return 0;
    return 1;
}

You may notice I am not using the modulus operation anywhere in the lcg class, that's because we use 32 bit integer overflow for our modulus operation.
This produces all values in the range [0, 4294967295] inclusive.
I also had to add a dummy variable for the compiler not to optimize everything out.
With no optimization this solution finishes in about 15 seconds, while with -O2, a moderate optimization it finishes under 5 seconds.

If "true" randomness is not an issue, this is a very fast solution.


A 32 bit PRP in CTR mode seems like the only viable approach to me (your 4th variant).

You can either

  • Use a dedicated 32 bit block cipher.

    Skip32, the 32 bit variant of Skipjack is a popular choice.

    As a tradeoff between quality/security and performance you can adjust the number of rounds to your needs. More rounds are slower but more secure.

  • Length-preserving-encryption (a special case of format-preserving-encryption)

    FFX mode is the typical recommendation. But in its typical instantiations (e.g. using AES as underlying cipher) it'll be much slower than dedicated 32 bit block ciphers.

Note that many of these constructions have a significant flaw: They're even permutations. That means that once you have seen 2^32-2 outputs, you'll be able to predict the second-to-last output with certainty, instead of only 50%. I think Rogaways AEZ paper mentions a way to fix this flaw.


Is there a way in .NET

Actually, this can be done in most any language

to generate a sequence of all the 32-bit integers (Int32)

Yes.

in random order,

Here we need to agree on terminology since "random" is not what most people think it is. More on this in a moment.

without repetitions,

Yes.

and in a memory-efficient manner?

Yes.

Memory-efficient would mean using a maximum of just a few hundred mega bytes of main memory.

Ok, so would using almost no memory be acceptable? ;-)

Before getting to the suggestion, we need to clear up the matter of "randomness". Something that is truly random has no discernible pattern. Hence, running the algorithm millions of times in a row could theoretically return the same value across all iterations. If you throw in the concept of "must be different from the prior iteration", then it is no longer random. However, looking at all of the requirements together, it seems that all that is really being asked for is "differing patterns of distribution of the integers". And this is doable.

So how to do this efficiently? Make use of Modular multiplicative inverses. I used this to answer the following Question which had a similar requirement to generate non-repeating, pseudo-random, sample data within certain bounds:

Generate different random time in the given interval

I first learned about this concept here ( generate seemingly random unique numeric ID in SQL Server ) and you can use either of the following online calculators to determine your "Integer" and "Modular Multiplicative Inverses (MMI)" values:

  • http://planetcalc.com/3311/
  • http://www.cs.princeton.edu/~dsri/modular-inversion-answer.php

Applying that concept here, you would use Int32.MaxSize as the Modulo value.

This would give a definite appearance of random distribution with no chance for collisions and no memory needed to store already used values.

The only initial problem is that the pattern of distribution is always the same given the same "Integer" and "MMI" values. So, you could come up with differing patterns by either adding a "randomly" generated Int to the starting value (as I believe I did in my answer about generating the sample data in SQL Server) or you can pre-generate several combinations of "Integer" and corresponding "MMI" values, store those in a config file / dictionary, and use a .NET random function to pick one at the start of each run. Even if you store 100 combinations, that is almost no memory use (assuming it is not in a config file). In fact, if storing both as Int and the dictionary uses Int as an index, then 1000 values is approximately 12k?


UPDATE

Notes:

  • There is a pattern in the results, but it is not discernible unless you have enough of them at any given moment to look at in total. For most use-cases, this is acceptable since no recipient of the values would have a large collection of them, or know that they were assigned in sequence without any gaps (and that knowledge is required in order to determine if there is a pattern).
  • Only 1 of the two variable values -- "Integer" and "Modular Multiplicative Inverse (MMI)" -- is needed in the formula for a particular run. Hence:
    • each pair gives two distinct sequences
    • if maintaining a set in memory, only a simple array is needed, and assuming that the array index is merely an offset in memory from the base address of the array, then the memory required should only be 4 bytes * capacity (i.e. 1024 options is only 4k, right?)

Here is some test code. It is written in T-SQL for Microsoft SQL Server since that is where I work primarily, and it also has the advantage of making it real easy-like to test for uniqueness, min and max values, etc, without needing to compile anything. The syntax will work in SQL Server 2008 or newer. For SQL Server 2005, initialization of variables had not been introduced yet so each DECLARE that contains an = would merely need to be separated into the DECLARE by itself and a SET @Variable = ... for however that variable is being initialized. And the SET @Index += 1; would need to become SET @Index = @Index + 1;.

The test code will error if you supply values that produce any duplicates. And the final query indicates if there are any gaps since it can be inferred that if the table variable population did not error (hence no duplicates), and the total number of values is the expected number, then there could only be gaps (i.e. missing values) IF either or both of the actual MIN and MAX values are outside of the expected values.

PLEASE NOTE that this test code does not imply that any of the values are pre-generated or need to be stored. The code only stores the values in order to test for uniqueness and min / max values. In practice, all that is needed is the simple formula, and all that is needed to pass into it is:

  • the capacity (though that could also be hard-coded in this case)
  • the MMI / Integer value
  • the current "index"

So you only need to maintain 2 - 3 simple values.

DECLARE @TotalCapacity INT = 30; -- Modulo; -5 to +4 = 10 OR Int32.MinValue
                                 -- to Int32.MaxValue = (UInt32.MaxValue + 1)
DECLARE @MMI INT = 7; -- Modular Multiplicative Inverse (MMI) or
                      -- Integer (derived from @TotalCapacity)

DECLARE @Offset INT = 0; -- needs to stay at 0 if min and max values are hard-set
-----------
DECLARE @Index INT = (1 + @Offset); -- start

DECLARE @EnsureUnique TABLE ([OrderNum] INT NOT NULL IDENTITY(1, 1),
                             [Value] INT NOT NULL UNIQUE);
SET NOCOUNT ON;

BEGIN TRY
    WHILE (@Index < (@TotalCapacity + 1 + @Offset)) -- range + 1
    BEGIN
        INSERT INTO @EnsureUnique ([Value]) VALUES (
                 ((@Index * @MMI) % @TotalCapacity) - (@TotalCapacity / 2) + @Offset
                                                   );
        SET @Index += 1;
    END;
END TRY
BEGIN CATCH
    DECLARE @Error NVARCHAR(4000) = ERROR_MESSAGE();
    RAISERROR(@Error, 16, 1);
    RETURN;
END CATCH;

SELECT * FROM @EnsureUnique ORDER BY [OrderNum] ASC;
SELECT COUNT(*) AS [TotalValues],
       @TotalCapacity AS [ExpectedCapacity],
       MIN([Value]) AS [MinValue],
       (@TotalCapacity / -2) AS [ExpectedMinValue],
       MAX([Value]) AS [MaxValue],
       (@TotalCapacity / 2) - 1 AS [ExpectedMaxValue]
FROM   @EnsureUnique;