Why am I getting unevenly-spread results when using $RANDOM?

To expand on the topic of modulo bias, your formula is:

max=$((6*3600))
$(($RANDOM%max/3600))

And in this formula, $RANDOM is a random value in the range 0-32767.

   RANDOM Each time this parameter is referenced, a random integer between
          0 and 32767 is generated.

It helps to visualize how this maps to possible values:

0 = 0-3599
1 = 3600-7199
2 = 7200-10799
3 = 10800-14399
4 = 14400-17999
5 = 18000-21599
0 = 21600-25199
1 = 25200-28799
2 = 28800-32399
3 = 32400-32767

So in your formula, 0, 1 and 2 are each twice as likely as 4 or 5, and 3 is slightly more likely than 4 or 5 as well. Hence your result with 0, 1, 2 as winners and 4, 5 as losers.
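If you want to check the table above rather than take it on trust, you can simply enumerate all 32768 possible values and count where each one lands. This is a minimal sketch; the function name count_outcomes and its two arguments are made up for this example.

```bash
#!/usr/bin/env bash
# count_outcomes MAX DIVISOR
# For every possible $RANDOM value (0..32767), compute value % MAX / DIVISOR
# and print how many source values map to each result.
count_outcomes() {
    local max=$1 divisor=$2 v idx k
    local -a counts=()
    for ((v = 0; v < 32768; v++)); do
        idx=$(( v % max / divisor ))
        counts[idx]=$(( ${counts[idx]:-0} + 1 ))
    done
    for k in "${!counts[@]}"; do
        echo "$k ${counts[k]}"
    done
}

count_outcomes $((6 * 3600)) 3600
```

For 6*3600 this prints 7200 for each of 0, 1 and 2, 3968 for 3, and 3600 for each of 4 and 5, which matches the ranges listed above.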

When changing to 9*3600, it turns out as:

0 = 0-3599
1 = 3600-7199
2 = 7200-10799
3 = 10800-14399
4 = 14400-17999
5 = 18000-21599
6 = 21600-25199
7 = 25200-28799
8 = 28800-32399
0 = 32400-32767

1-8 have the same probability, but 0 still gets a slight extra share, and hence 0 was still the winner in your test with 100000 iterations.

To fix the modulo bias, you should first simplify the formula: if you only want 0-5, then the modulus is 6, not 3600 or some even larger number. This simplification alone reduces the bias by a lot. With modulus 6, only the last two values are left over (32766 maps to 0 and 32767 maps to 1), which gives a tiny bias to those two numbers.
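The size of that remaining bias is easy to work out with integer division. The helper below is only an illustration (bias_summary is a name invented here): it prints how many source values every result gets at minimum, and how many results get one extra.

```bash
#!/usr/bin/env bash
# bias_summary RANGE MODULUS
# Prints "<base count per result> <number of results that get one extra value>".
bias_summary() {
    local range=$1 modulus=$2
    echo "$(( range / modulus )) $(( range % modulus ))"
}

bias_summary 32768 6              # prints: 5461 2
bias_summary 32768 $((6 * 3600))  # prints: 1 11168
```

With modulus 6 the favoured results have 5462 source values instead of 5461, a difference of roughly 0.02%. With modulus 21600 the favoured values have 2 sources instead of 1, which is the doubling seen in the original test.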

To get rid of the bias altogether, you need to re-roll, for example whenever $RANDOM is lower than 32768 % 6. This discards the 2 values that do not fit evenly, so the remaining 32766 values divide equally among the 6 results.

max=6
for f in {1..100000}
do
    r=$RANDOM
    # Re-roll the lowest 32768 % max values so the rest divide evenly by max.
    while [ "$r" -lt $((32768 % max)) ]; do r=$RANDOM; done
    echo $((r % max))
done | sort | uniq -c | sort -n

Test result:

  16425 5
  16515 1
  16720 0
  16769 2
  16776 4
  16795 3

The alternative would be to use a different random source with no noticeable bias, meaning one whose range is orders of magnitude larger than just 32768 possible values. Implementing the re-roll logic anyway doesn't hurt, even if the re-roll is then very unlikely to ever trigger.
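As a rough sketch of that alternative, the function below reads 32 bits from /dev/urandom with od and still applies a re-roll. The name rand_below is invented for this example, and it assumes a system that provides /dev/urandom and an od that supports -An -N4 -tu4.

```bash
#!/usr/bin/env bash
# rand_below N: print a uniform integer in the range 0..N-1 (N from 1 to 2^32).
rand_below() {
    local n=$1 range=4294967296 limit r
    # Largest multiple of n within the 32-bit range. Values at or above it
    # are re-rolled so that every result has the same number of source values.
    limit=$(( range - range % n ))
    while :; do
        # Read 4 bytes as one unsigned decimal number.
        r=$(( $(od -An -N4 -tu4 /dev/urandom) ))
        if [ "$r" -lt "$limit" ]; then
            break
        fi
    done
    echo $(( r % n ))
}

rand_below 6
```

For n=6 only 4 out of 4294967296 values are rejected, so the loop almost always finishes on the first pass. Each call starts an od process, so this is much slower than $RANDOM in a tight loop.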


This is modulo bias. If RANDOM is well constructed, each value between 0 and 32767 is produced with equal probability. When you use modulo, you change the probabilities: the probabilities of all the values at or above the modulus are added to those of the values they map to.

In your example, 6×3600 is approximately two thirds of the range of values. The probabilities of the top third are therefore added to those of the bottom third, which means that values from 0 to 2 (approximately) are twice as likely to be produced as values from 3 to 5. 9×3600 is nearly 32767, so the modulo bias is much smaller and only affects values from 32400 to 32767.

To answer your main question, at least in Bash the random sequence is fully predictable if you know the seed. See intrand32 in variables.c.
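You can see the predictability without reading the source: assigning a value to RANDOM seeds the generator, and the same seed gives the same sequence again. A small demonstration (seeded_sequence is just a name chosen for this example):

```bash
#!/usr/bin/env bash
# seeded_sequence SEED: seed the generator, then print the next three values.
seeded_sequence() {
    RANDOM=$1
    echo "$RANDOM $RANDOM $RANDOM"
}

first=$(seeded_sequence 42)
second=$(seeded_sequence 42)
if [ "$first" = "$second" ]; then
    echo "same seed, same sequence: $first"
fi
```

The actual numbers depend on the Bash version, so none are shown here, but within one Bash version the two runs print identical values.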
