Average number of guesses to guess number between 1 and 1000?

Let us take the more ideal case of guessing integers between $1$ and $2^{10}-1=1111111111_2$ writing in base $2$ from now on without explicitly writing subscript $2$ each time. Then the algorithm for the guesses will be like this:

  1. The initial guess is $1\overline 0$ where $\overline 0$ means fill up with zeros until we have a ten digit number
  2. Write the $n$-th guess as $g_n=x_n1\overline 0$ where $x_n$ is a binary string of length $(n-1)$
  3. If the game responds by "higher", the next guess is $g_{n+1}=x_n11\overline 0$
  4. If the game responds by "lower", the next guess is $g_{n+1}=x_n01\overline 0$
  5. If the game responds by "correct" we are done after $n$ guesses

So we see that the game terminates whenever there are only zeros remaining right of $x_n1$. Then note that there are $2^{n-1}$ binary strings $x_n$ of length $(n-1)$. So $2^{n-1}$ numbers are guessed after exactly $n$ guesses. So for this setup the expected number of guesses will be, returning to writing in base $10$: $$ \frac{1}{2^{10}-1}\sum_{n=1}^{10}n\cdot 2^{n-1}=\frac{9217}{1023}\approx9.00978 $$ So the average case here is very close to the worst case of $10$ guesses. This could seem odd at first, but odd is also the clue here ... Half of the time it takes $10$ guesses to reveal that the randomly picked number $x$ is in fact odd. The first $9$ guesses fills up with zeros which in base $2$ means "divisible by $2$". One quarter of the time $x$ is divisible by $2^2=4$ forcing the algorithm to make $9$ guesses before succes. And so on.


I wrote the following code in Python:

L=[1,1000]
G={}
for n in range(1,11):
    i=0
    while i<len(L)-1:
        guess = int(0.5*(L[i]+L[i+1]+1))
        if guess not in L:
            L = L[:i+1]+[guess]+L[i+1:]
            i+=1
            G[guess] = n
        i+=1
S=0
for g in G:
    S+=G[g]
print (10+S+10)

which responded with $8987$, so the average for the $1$ to $1000$ case, assuming that we always make the guess of the rounded average of the limits of the interval we know that $x$ belongs to, must be $8.987$ guesses.


Let $c$ be the target number somewhere between lower bound $1$ and upper bound $n$, inclusive. We will search at the value $M = \lceil\frac{n}{2}\rceil$. If we're told that the guess is too high, we will set $n=M-1$ and try the search again. Otherwise, we will set the lower bound to $M+1$ and search again. However, what really matters is $c$'s position relative to the bounds, so this is also the same as starting a new search with lower bound $1$ and upper bound $n-M$ with a new target number $c-M$.

This recursive process ends in two possible ways: We either stop when we find the target number, or we stop if the lower and upper bounds are the same (in which case only one search is performed).

Let $F(n)$ represent the expected number of searches needed to find the unknown target number, where $F(0) = 0$ and $F(1) = 1$.

Overall, there is a $\frac{M-1}{n}$ probability that $c < M$, a $\frac{1}{n}$ probability that $c=M$, and a $\frac{n-M}{n}$ probability that $c>M$.

$$F(n) = \left(\frac{M-1}{n}\right)\left(1+F(M-1)\right) + \left(\frac{1}{n}\right)\left(1\right) + \left(\frac{n-M}{n}\right)\left(1+F(n-M)\right)$$

Multiply by $n$, simplify things a bit, and substitute in the full value for $M$:

$$nF(n) = \left(\left\lceil\frac{n}{2}\right\rceil-1\right)F\left(\left\lceil\frac{n}{2}\right\rceil-1\right) + n + \left(n-\left\lceil\frac{n}{2}\right\rceil\right)F\left(n-\left\lceil\frac{n}{2}\right\rceil\right)$$

Let $S(n) = nF(n)$ and $T(n) = S(n) - S(n-1)$, yielding:

$$T(n) = \left(n-\left\lceil\frac{n}{2}\right\rceil\right)F\left(n-\left\lceil\frac{n}{2}\right\rceil\right) - \left(n-\left\lceil\frac{n}{2}\right\rceil-1\right)F\left(n-\left\lceil\frac{n}{2}\right\rceil-1\right) + 1$$

Based on the definitions $S(n)$ and $T(n)$, this is equivalent to the following:

$$T(n) = S\left(n-\left\lceil\frac{n}{2}\right\rceil\right) - S\left(n-\left\lceil\frac{n}{2}\right\rceil-1\right) +1 = T\left(n-\left\lceil\frac{n}{2}\right\rceil\right)+ 1 = T\left(\left\lfloor\frac{n}{2}\right\rfloor\right)+1$$

In other words, $T(n) = \lfloor \log_2(n)\rfloor + 1$. But we want $F(n)$, not just $T(n)$. Thankfully, since $T(n) = S(n) - S(n-1) = nF(n) - (n-1)F(n-1)$, the sum over many $T(k)$ values cancels out intermediate terms, giving us $nF(n)$. After dividing by $n$, we finally isolate $F(n)$:

$$F(n) = \frac{1}{n}\sum_{k=1}^{n} T(k) = 1 + \frac{1}{n}\sum_{k=1}^{n}\lfloor \log_2(k)\rfloor$$

This summation actually has a closed form as well:

$$F(n) = 1 + \frac{1}{n}\left((n+1)\lfloor \log_2(n)\rfloor - 2(2^{\lfloor \log_2(n)\rfloor} - 1)\right)$$

To offer a Python program in the same spirit as String's answer:

from fractions import Fraction
from math import log

def F(n):
    if n==0: return 0
    lg = int(log(n,2))
    return 1 + Fraction(1,n)*((n+1)*lg - 2*(2**lg-1))

For $n=1000$ the result is $\frac{8987}{1000} = 8.987$.