Method of generating random numbers that sum to 100 - is this truly random?

No, this is not a good approach: half the time, the first element will be $50$ or more, which is far too often. Essentially, the odds that the first element is $100$ should not be the same as the odds that the first element is $10$. There is only one way for $a=100$, but there are loads of ways for $a=10$.

The number of such sums $100=a+b+c+d$ with integers $a,b,c,d\geq 0$ is $\binom{100+3}{3} = 176851 = 103 \cdot 1717$. Since this count is divisible by the prime $103$, an algorithm that never randomly chooses from $1$ to some multiple of $103$ cannot give every sum an even probability.
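As a quick sanity check on that count, here is a small sketch that enumerates the solutions by brute force and compares the result with the binomial coefficient. The variable name `count` is only illustrative.

```python
from math import comb

# Count non-negative integer solutions of a + b + c + d = 100.
# Once a, b, c are chosen with a + b + c <= 100, d is determined.
count = sum(
    1
    for a in range(101)
    for b in range(101 - a)
    for c in range(101 - a - b)
)

print(count)            # brute-force count
print(comb(103, 3))     # stars-and-bars formula
print(count % 103)      # remainder modulo 103
```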

An ideal approach: pick a number $x_1$ from $1$ to $103$. Then pick a different number $x_2\neq x_1$ from $1$ to $103$, then pick a third number $x_3\neq x_1,x_2$ from $1$ to $103$.

Then sort these values, so that $x_1<x_2<x_3$. Then set $$a=x_1-1, b=x_2-x_1-1, c=x_3-x_2-1, d=103-x_3.$$
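The construction above could be sketched in Python roughly as follows. The function name `sample_composition` and its parameters are my own choices for illustration, and `random.sample` is assumed to be an acceptable way of picking distinct positions.

```python
import random

def sample_composition(total=100, parts=4, rng=random):
    """Return `parts` non-negative integers summing to `total`.

    Picks parts-1 distinct "bar" positions among total+parts-1 slots
    and reads the parts off as the gaps between consecutive bars.
    """
    slots = total + parts - 1                      # 103 for the case above
    bars = sorted(rng.sample(range(1, slots + 1), parts - 1))
    edges = [0] + bars + [slots + 1]               # sentinels on both sides
    # Gap between neighbouring edges, minus the bar itself.
    return [edges[i + 1] - edges[i] - 1 for i in range(parts)]

print(sample_composition())
```

With the defaults, the four returned values correspond to $a=x_1-1$, $b=x_2-x_1-1$, $c=x_3-x_2-1$ and $d=103-x_3$.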

Generate four random numbers between $0$ and $1$

Add these four numbers; then divide each of the four numbers by the sum, multiply by $100$, and round to the nearest integer.

Check that the four integers add to $100$ (they will, two thirds of the time). If they don't (rounding errors), try again...
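A minimal sketch of these steps, assuming Python's `random.random()` for the numbers between $0$ and $1$ and the built-in `round` for rounding. The name `normalize_and_round` is hypothetical, and the sketch makes no claim about which distribution the result follows.

```python
import random

def normalize_and_round(rng=random):
    """Scale four random numbers to sum to 100, round, retry on mismatch."""
    while True:
        u = [rng.random() for _ in range(4)]
        s = sum(u)
        if s == 0:                 # practically impossible, avoids dividing by 0
            continue
        vals = [round(100 * x / s) for x in u]
        if sum(vals) == 100:       # rounding can push the total off by a little
            return vals

print(normalize_and_round())
```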

Your question mentions an inefficient algorithm that generates four independent, uniformly distributed integers from 0 to 100 and repeats until their sum is 100. I'll assume you are satisfied with the distribution generated by that algorithm, but not with its performance.
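For reference, the inefficient algorithm as I understand it from the description could look like the sketch below. The name `rejection_sample` is mine. Only $176851$ of the $101^4$ equally likely quadruples have the right sum, so most attempts are thrown away.

```python
import random

def rejection_sample(rng=random):
    """Draw four independent uniform integers in 0..100 until they sum to 100."""
    while True:
        vals = [rng.randint(0, 100) for _ in range(4)]
        if sum(vals) == 100:
            return vals

print(rejection_sample())
```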

Before looking into how to produce the distribution more efficiently, one first has to understand what the distribution looks like.

By construction, $a$, $b$, $c$, and $d$ are identically distributed. They are not independent, because their sum is constant. What we already know about their distribution is that it has minimum value 0, maximum value 100, and average value 25. The average follows from the fact that the four identically distributed numbers always sum to 100.

This rules out a uniform distribution of the individual numbers (and in fact it rules out every symmetrical distribution), since a distribution that is symmetrical on the range 0 to 100 would have average 50 rather than 25. This means your more efficient algorithm, which generates $a$ uniformly, will produce a different distribution.
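To make this concrete, the marginal distribution of $a$ can be tabulated exactly. The sketch assumes the target is the distribution of the repeat-until-the-sum-is-100 algorithm, where every valid quadruple is equally likely. The count $\binom{102-k}{2}$ of ways to complete $a=k$ is my own stars-and-bars step, not something stated above.

```python
from fractions import Fraction
from math import comb

total = comb(103, 3)   # number of quadruples summing to 100

# P(a = k): b + c + d = 100 - k has comb(102 - k, 2) non-negative solutions.
p = [Fraction(comb(102 - k, 2), total) for k in range(101)]

mean = sum(k * pk for k, pk in enumerate(p))
tail = sum(p[50:])     # P(a >= 50)

print(float(p[0]), float(p[100]))   # most likely vs. least likely value of a
print(mean)                          # exact average of a
print(float(tail))                   # compare with 1/2 for a uniform choice of a
```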

Towards an efficient algorithm

If we define $X = a+b$ and ask what the distribution of $X$ looks like, we will find something interesting. The distribution clearly doesn't depend on which pair of the four numbers we summed. So all six possible pairs are identically distributed, but not independent. This distribution has minimum 0, maximum 100, and average 50. And the distribution has to be symmetrical because $X$ and $100-X$ are identically distributed.

It is not immediately obvious whether the distribution of $X$ is uniform across the integers from 0 to 100. However, if the distribution of $X$ can be generated efficiently, then the distribution of all four numbers can be generated efficiently as follows:

Generate $X$
Choose $a$ uniformly random in the range $0$ to $X$
Let $b := X-a$
Choose $c$ uniformly random in the range $0$ to $100-X$
Let $d := 100-X-c$

The distribution of X

The original algorithm would produce $X$ as the sum of two uniformly random numbers in the range $0$ to $100$, but discard any results where the overall sum was different from $100$.

A different algorithm could generate $X$ and $Y$ according to this distribution and discard the result if $X+Y \neq 100$. This is useful because the generation of $X$ and $Y$ can be simplified.

If $X$ is larger than 100, the attempt can be discarded immediately. It is easy to work out the distribution of $X$ after this first filter, before the sum of $X$ and $Y$ is checked. The initial probability of an outcome $x \in [0;100]$ is $\frac{1+x}{101^2} = \frac{1+x}{10201}$, because $1+x$ of the $101^2$ equally likely pairs add up to $x$. Once values larger than 100 are discarded, the probability becomes $\frac{1+x}{5151}$, since $\sum_{x=0}^{100}(1+x) = 5151$.

The probability of immediately generating $X=x$ and $Y=100-x$ can then be computed as $\frac{1+x}{5151} \cdot \frac{1+(100-x)}{5151} = \frac{(1+x)(101-x)}{5151^2}$. The conditional probability $P(X=x \mid X+Y=100)$ can then be computed by simply scaling the denominator such that the probabilities sum to $1$.

At this point it is clear that $X$ isn't uniformly distributed. But it also gives us a way to construct $X$ directly.

In order to generate the distribution of $X$ directly, we need a formula for $P(X \leq x)$. This formula will be:

$$P(X \leq x) = \frac{\sum_{i=0}^x (1+i)(101-i)}{k} = \frac{-2x^3 + 297x^2 + 905x + 606}{6k}$$

Because we know that $P(X \leq 100) = 1$, we can deduce that $k=176851$.
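The closed form and the value of $k$ can be checked numerically. In this sketch the summand is taken to be $(1+i)(101-i)$ with $i$ as the summation index, and the helper names `cumulative_weight` and `closed_form` are illustrative only.

```python
def cumulative_weight(x):
    """Sum of (1 + i)(101 - i) for i = 0..x, computed term by term."""
    return sum((1 + i) * (101 - i) for i in range(x + 1))

def closed_form(x):
    """Cubic polynomial from the formula above, divided by 6."""
    return (-2 * x**3 + 297 * x**2 + 905 * x + 606) // 6

k = cumulative_weight(100)
print(k)
print(all(cumulative_weight(x) == closed_form(x) for x in range(101)))
```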

With this the algorithm becomes:

Choose $r$ uniformly random from the integers $[0;176850]$
Take the smallest $x$ such that $\sum_{i=0}^x (1+i)(101-i) > r$
Choose $a$ uniformly random in the range $0$ to $x$
Let $b := x-a$
Choose $c$ uniformly random in the range $0$ to $100-x$
Let $d := 100-x-c$
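One way these steps might be put together is sketched below. Two details are my own reading rather than something spelled out above. First, the table lookup uses a strict inequality, so that each $x$ is hit by exactly $(1+x)(101-x)$ values of $r$. Second, $d$ is computed from $c$, so that the four numbers add up to $100$. The names `WEIGHTS`, `CUMULATIVE` and `sample_four` are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate

# Weight of each possible value x of a + b, and the running totals.
WEIGHTS = [(1 + i) * (101 - i) for i in range(101)]
CUMULATIVE = list(accumulate(WEIGHTS))    # last entry is k = 176851

def sample_four(rng=random):
    """Draw (a, b, c, d) with a + b + c + d = 100 without any retries."""
    r = rng.randrange(CUMULATIVE[-1])     # integer in 0..176850
    x = bisect_right(CUMULATIVE, r)       # smallest x with CUMULATIVE[x] > r
    a = rng.randint(0, x)
    b = x - a
    c = rng.randint(0, 100 - x)
    d = 100 - x - c
    return a, b, c, d

print(sample_four())
```

The binary search over the precomputed table replaces the repeated summation, so each sample costs three random draws and one lookup.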
