Correct way to get a number from 0-9 from a random byte?

There are two generic ways to produce a "sufficiently unbiased" random digit.

First method is to loop if the byte was not in the right range. I.e.:

  • Get next random byte b.
  • If b is in the 0..249 range, return b mod 10.
  • Loop.

This method may consume an unbounded number of random bytes, but it is perfectly unbiased and it is very unlikely to require looping many times. That's what Java's Random.nextInt(int) standard method applies (albeit with 32-bit words instead of bytes).

Second method is to use as source value not one byte but a big enough word. Say, use 20 bytes, interpret these as an integer x in the 0..2160-1 range, and return x mod 10. This way, the bias is still there, but can be made arbitrarily small, to the point that it no longer matters. This is computationally expensive (more than the first method) but has the advantage of always requiring the same number of input bytes, which can be handy in some specific situations (e.g. side-channel leaks).


Splitting byte, for rendering several numbers

As random generator may be something expensive, the more efficient way is to cut each byte (0-255) in two part (0-15), before dropping out of limit values:

There is a little sample using bash:

unset random cnt
while [ ! "$random" ] ;do
    ((cnt++))
    i=$(dd if=/dev/random bs=1 count=1 2>/dev/null | od -A n -t u1)
    for val in $((i&15)) $((i>>4)) ;do
        [ $val -lt 10 ] && [ ! "$random" ] && random=$val
    done
done
printf "%d bytes read, random value: %d\n" $cnt $random

Making a kind of comparission:

As an aswer to @AndreasKrey's comment, there is a little demo, where I try to obtain 10 numbers between 0 an 9:

I use same pot (of random numbers) in both methods:

  • Splitting byte in two parts and filtering bigger number than 9
  • Dropping numbers bigger than 249 and using mod:

.

#!/bin/bash

myRandom() {
    printf ${1+-v} $1 "%s" $(
        head -c1 /dev/urandom | od -A n -t u1
    )
}

num=${1:-10} byteMod=0 byteSplit=0 potMod=() potSplit=() wholePot=()

while [ ${#potMod[@]} -lt $num ] || [ ${#potSplit[@]} -lt $num ];do
    myRandom rndVal
    wholePot+=($rndVal)

    [ ${#potMod[@]}   -lt $num ] && ((byteMod+=1)) &&
        [ $rndVal -lt 250 ] && potMod+=($[rndVal%10])

    [ ${#potSplit[@]} -lt $num ] && ((byteSplit+=1))

    for val in $[rndVal&15] $[rndVal>>4] ;do
        [ $val -lt 10 ] && [ ${#potSplit[@]} -lt $num ] && potSplit+=($val)
      done

  done

printf "%2d bytes was read for rendering %2d values by split * max 10: %s\n" \
    $byteSplit ${#potSplit[@]} "${potSplit[*]}"
printf "%2d bytes was read for rendering %2d values by max 250 && mod: %s\n" \
    $byteMod ${#potMod[@]} "${potMod[*]}"

echo Whole pot: ${wholePot[@]}

This could be run several times:

./randGen10.sh
 6 bytes was read for rendering 10 values by split * max 10: 8 3 9 7 9 3 1 1 3 4
10 bytes was read for rendering 10 values by max 250 && mod: 6 1 7 0 9 0 3 1 3 9
Whole pot: 56 121 57 30 49 20 183 161 123 239

./randGen10.sh
 7 bytes was read for rendering 10 values by split * max 10: 7 1 5 0 7 4 6 9 4 4
10 bytes was read for rendering 10 values by max 250 && mod: 3 3 6 1 8 3 4 0 4 9
Whole pot: 23 213 176 71 198 73 244 220 154 139

./randGen10.sh
10 bytes was read for rendering 10 values by split * max 10: 0 8 3 9 6 6 8 9 2 3
11 bytes was read for rendering 10 values by max 250 && mod: 1 8 2 5 4 8 9 9 0 7
Whole pot: 221 128 254 62 105 214 168 249 189 50 77

./randGen10.sh
 7 bytes was read for rendering 10 values by split * max 10: 3 1 5 9 5 8 6 9 7 7
10 bytes was read for rendering 10 values by max 250 && mod: 9 1 9 8 8 1 7 4 7 6
Whole pot: 19 181 89 168 198 121 247 54 117 226

./randGen10.sh
 9 bytes was read for rendering 10 values by split * max 10: 5 8 6 3 6 8 5 4 0 1
10 bytes was read for rendering 10 values by max 250 && mod: 4 0 0 9 3 4 8 6 6 9
Whole pot: 234 90 200 109 243 214 88 196 16 199

./randGen10.sh
11 bytes was read for rendering 10 values by split * max 10: 3 1 9 5 0 0 6 9 5 5
10 bytes was read for rendering 10 values by max 250 && mod: 5 9 7 0 5 0 1 7 8 9
Whole pot: 175 19 157 90 235 0 191 107 238 89 117

Of course, there is some cases where splited max 10 method do use more bytes than mod max 250...

Explanation:

Surprisingly, having to drop 6/256 values -> 2.34% seem a lot smaller than having to drop 6/16 values -> 37.5%, but as we could obtain a second chance for each number:

  • by splitting, we have 2 x 62.5% => 125% chances to obtain a correct number
  • by using mod (or divide by 25), we only have 97.65% chances...

So, for rendering 100'000 values:

./randGen10.sh 100000 | sed 's/^\(.\{74\}\).*$/\1.../'
 80086 bytes was read for rendering 100000 values by split * max 10: 9 9 2...
102397 bytes was read for rendering 100000 values by max 250 && mod: 3 7 6...
Whole pot: 233 217 46 36 193 182 9 44 187 48 100 172 127 230 157 194 197 1...

This seem correct: 100'000/97.65% = 102'407 and 100'000/1.25=80'000.