Expected number of consecutive guesses to get a given sequence of numbers

You can approach this as a Markov process whose state is the longest suffix of your guesses so far that is also a prefix of the solution. The state transition table depends on the structure of the correct solution. To take two extremes, if the solution is $1234$ then your states are

  • Suffix: $\varepsilon$ goes to $1$ with probability $\frac1{10}$ and back to $\varepsilon$ with probability $\frac9{10}$
  • Suffix: $1$ goes to $12$ with probability $\frac1{10}$, to $\varepsilon$ with probability $\frac8{10}$, and back to $1$ with probability $\frac1{10}$.
  • Suffix: $12$ goes to $123$ with probability $\frac1{10}$, to $\varepsilon$ with probability $\frac8{10}$, and to $1$ with probability $\frac1{10}$.
  • Suffix: $123$ goes to $1234$ with probability $\frac1{10}$, to $\varepsilon$ with probability $\frac8{10}$, and to $1$ with probability $\frac1{10}$.
  • Suffix: $1234$ is absorbing.

OTOH, if your solution is $1111$ then your states are

  • Suffix: $\varepsilon$ goes to $1$ with probability $\frac1{10}$ and back to $\varepsilon$ with probability $\frac9{10}$
  • Suffix: $1$ goes to $11$ with probability $\frac1{10}$, and to $\varepsilon$ with probability $\frac9{10}$
  • Suffix: $11$ goes to $111$ with probability $\frac1{10}$, and to $\varepsilon$ with probability $\frac9{10}$
  • Suffix: $111$ goes to $1111$ with probability $\frac1{10}$, and to $\varepsilon$ with probability $\frac9{10}$
  • Suffix: $1111$ is absorbing.
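Both tables can be generated mechanically for any code. Below is a minimal Python sketch, assuming each guess is an independent, uniformly random digit. It labels each state by the length of the matched prefix (so $\varepsilon$ is 0, $1$ is 1, $12$ is 2, and so on) and finds the next state by checking which prefix of the code is a suffix of what has just been guessed, the same idea as a KMP string-matching automaton. The names `next_state` and `transition_table` are hypothetical, not from any library.

```python
from fractions import Fraction


def next_state(code, matched, digit):
    """Length of the longest prefix of `code` that is a suffix of
    code[:matched] + digit, i.e. the new state after guessing `digit`."""
    text = code[:matched] + digit
    for length in range(min(len(code), len(text)), 0, -1):
        if text.endswith(code[:length]):
            return length
    return 0


def transition_table(code, alphabet="0123456789"):
    """Map each non-absorbing state 0..n-1 to {next_state: probability},
    assuming every guess is uniform over `alphabet`."""
    p = Fraction(1, len(alphabet))
    table = {}
    for matched in range(len(code)):
        row = {}
        for digit in alphabet:
            nxt = next_state(code, matched, digit)
            row[nxt] = row.get(nxt, 0) + p
        table[matched] = row
    return table


for state, row in transition_table("1234").items():
    print(state, {t: str(q) for t, q in sorted(row.items())})
```

For $1234$ the row for state 1 comes out as probability 1/10 to state 2, 1/10 back to state 1 and 4/5 (that is, 8/10) to state 0, matching the list above; `transition_table("1111")` reproduces the second list.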

Clearly the expected number of guesses should be larger in the second case than in the first: in both cases you need four consecutive successes, but in the first case a guess that breaks one run can be a $1$ and so start the next run, whereas with $1111$ every failure sends you back to $\varepsilon$.


In light of the comment

We tried using this line of reasoning to calculate the average, but it got way too convoluted.

here's how to do it without getting too convoluted. Take $1234$ as an example. Let $E_S$ denote the expected number of steps from suffix $S$ to the capturing suffix $1234$. The transitions convert directly into simultaneous equations $$\begin{eqnarray}E_\varepsilon &=& 1 + \frac{1}{10} E_1 + \frac{9}{10} E_\varepsilon \\ E_1 &=& 1 + \frac{1}{10} E_{12} + \frac{8}{10} E_\varepsilon + \frac{1}{10} E_1 \\ E_{12} &=& 1 + \frac{1}{10} E_{123} + \frac{8}{10} E_\varepsilon + \frac{1}{10} E_1 \\ E_{123} &=& 1 + \frac{1}{10} E_{1234} + \frac{8}{10} E_\varepsilon + \frac{1}{10} E_1 \\ E_{1234} &=& 0 \end{eqnarray}$$
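For $1234$ these equations solve quickly by substitution, expressing each unknown in terms of $E_\varepsilon$ and working down the list:

$$\begin{aligned}
E_1 &= E_\varepsilon - 10 && \text{(from the } E_\varepsilon \text{ equation)}\\
E_{12} &= E_\varepsilon - 100 && \text{(substituting into the } E_1 \text{ equation)}\\
E_{123} &= E_\varepsilon - 1000 && \text{(substituting into the } E_{12} \text{ equation)}\\
E_\varepsilon - 1000 &= 1 + \tfrac{8}{10}E_\varepsilon + \tfrac{1}{10}(E_\varepsilon - 10) = \tfrac{9}{10}E_\varepsilon && \text{(the } E_{123} \text{ equation, using } E_{1234} = 0\text{)}
\end{aligned}$$

so $E_\varepsilon = 10{,}000$ guesses on average, starting from scratch.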


We can prove the following general result:

Given a code $C$ of $n$ digits, for each $1\le i\le n-1$, let $b_i$ be $1$ if the first $i$ digits of $C$ equal the last $i$ digits of $C$, and $0$ otherwise. If each guess is an independent, uniformly random digit, the expected wait time for $C$ is $$10^n+\sum_{i=1}^{n-1}b_i10^i.$$
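The formula is straightforward to evaluate. A minimal Python sketch, assuming the code is given as a string of decimal digits; `expected_wait` is a hypothetical helper name:

```python
def expected_wait(code):
    """Expected number of uniformly random digit guesses until `code` appears:
    10**n plus 10**i for every i where the first i digits equal the last i."""
    n = len(code)
    total = 10 ** n
    for i in range(1, n):
        # b_i = 1 exactly when this prefix of length i is also a suffix
        if code[:i] == code[-i:]:
            total += 10 ** i
    return total


for code in ["1111", "1212", "1231", "1234"]:
    print(code, expected_wait(code))
```

This prints 11110, 10100, 10010 and 10000 for the four codes respectively.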

For example, when $n=4$:

  • The expected wait time for codes like $aaaa$ is $11,110$.
  • The expected wait time for codes like $abab$ (with $a\ne b$) is $10,100$.
  • The expected wait time for other codes whose first and last digits agree, like $abca$ or $abba$, is $10,010$.
  • The expected wait time for everything else is $10,000$.
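As an independent check, these values can be reproduced by solving the Markov-chain equations from the first answer exactly. The sketch below assumes uniformly random digit guesses, labels states by matched-prefix length, and uses Python's `fractions.Fraction` with plain Gauss-Jordan elimination so there is no floating-point rounding. `next_state` and `markov_expected_wait` are hypothetical names chosen for illustration.

```python
from fractions import Fraction


def next_state(code, matched, digit):
    """Longest prefix of `code` that is a suffix of code[:matched] + digit."""
    text = code[:matched] + digit
    for length in range(min(len(code), len(text)), 0, -1):
        if text.endswith(code[:length]):
            return length
    return 0


def markov_expected_wait(code, alphabet="0123456789"):
    """Exact E_0 from E_s = 1 + sum_t P(s, t) E_t, with E_n = 0 (absorbing)."""
    n = len(code)
    p = Fraction(1, len(alphabet))
    # Augmented matrix [I - P | 1] over the transient states 0..n-1.
    rows = []
    for s in range(n):
        row = [Fraction(0)] * n + [Fraction(1)]
        row[s] += 1
        for digit in alphabet:
            t = next_state(code, s, digit)
            if t < n:  # the E_n = 0 term drops out
                row[t] -= p
        rows.append(row)
    # Gauss-Jordan elimination in exact rational arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        lead = rows[col][col]
        rows[col] = [x / lead for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return rows[0][n]


for code in ["1111", "1212", "1231", "1234"]:
    print(code, markov_expected_wait(code))
```

The results agree with the four cases listed above, which is a useful sanity check that the prefix/suffix formula and the Markov-chain approach describe the same quantity.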

To prove this, let us first assume that $b_i=0$ for all $i$, meaning no prefix of $C$ is also a suffix.

Imagine a casino with a ten-digit roulette wheel. It spins this wheel once per minute, except that the casino shuts down once the code $C$ appears over the course of $n$ consecutive spins. Players may place an $\$x$ bet on the outcome of a spin; if they are wrong, they lose $\$x$, and if they are right, they win $\$9x$, so the bet is fair: the expected gain is $\frac1{10}\cdot 9x-\frac9{10}\cdot x=0$.

Imagine that every minute, a new person enters the casino. They first place a $\$1$ bet on the first digit of $C$. If they win, they place a $\$10$ bet on the second digit of $C$; in general, a person who has won $k$ times in a row places a $\$10^k$ bet on the $(k+1)^{st}$ digit of $C$, and a person who loses stops betting. Note that anyone who does not make it to the end of $C$ loses exactly $\$1$; for example, if they win their first two bets and lose the third, their net winnings are $+9+90-100=-1$. Only a person who makes it all the way through to the end of $C$ wins big, a total of $10^n-1$. This can only happen to one person, because we stipulated that the casino shuts down as soon as $C$ appears.
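The bookkeeping behind the $-1$ is a geometric sum. A player who wins their first $j$ bets and then loses bet $j+1$ (for some $0\le j\le n-1$) nets

$$\sum_{k=0}^{j-1}9\cdot 10^k \;-\; 10^j \;=\; (10^j-1)-10^j \;=\; -1,$$

while a player who wins all $n$ bets nets $\sum_{k=0}^{n-1}9\cdot 10^k=10^n-1$.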

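Since all of these bets are fair, the total expected winnings of all the players is $0$ (formally, this is the optional stopping theorem: the total winnings form a martingale whose changes per spin are bounded, and $T$ below has finite expectation). On the other hand, letting $T$ be the total number of spins, the actual winnings are $10^n-T$: the one person who entered at spin $T-n+1$ wins $10^n-1$, and each of the other $T-1$ people loses $\$1$. Equating these two, we get that the expected number of spins is $10^n$.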

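The full result comes from noting that when some of the $b_i$ are nonzero, there are additional winners at the end of the game. Namely, the $i^{th}$ player from the end (the one who entered at spin $T-i+1$) has won all $i$ of their bets, and is up $10^i-1$, exactly when the first $i$ digits of $C$ equal the last $i$ digits. The actual winnings are then $(10^n-1)+\sum_{i=1}^{n-1}b_i(10^i-1)-\left(T-1-\sum_{i=1}^{n-1}b_i\right)=10^n+\sum_{i=1}^{n-1}b_i10^i-T$, and equating this with $0$ as before gives the formula.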
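The casino argument can also be checked numerically. The following Python sketch simulates games using Python's `random` module as the source of fair digits (an assumption; any fair source would do), tracks every player's net winnings, and compares the total against $10^n+\sum_i b_i10^i-T$. Since that identity holds on every single run, not just on average, the comparison is exact; averaging $T$ over many games then estimates the expected wait. `play_casino` and `predicted_total` are hypothetical names.

```python
import random


def play_casino(code, rng):
    """Run one game; return (T, total net winnings of all players)."""
    n = len(code)
    active = []   # [wins_so_far, net] for each player still betting
    settled = 0   # net winnings of players who have stopped betting
    recent = ""   # the last n spins, to detect when the casino closes
    spins = 0
    while True:
        spins += 1
        digit = str(rng.randrange(10))
        active.append([0, 0])  # a new player enters and bets on this spin
        still_betting = []
        for player in active:
            wins, net = player
            stake = 10 ** wins  # $10^k bet on digit k+1 of the code
            if digit == code[wins]:
                player[0] = wins + 1
                player[1] = net + 9 * stake  # fair bet: a win pays $9x on $x
                if player[0] == n:
                    settled += player[1]  # this player completed the code
                else:
                    still_betting.append(player)
            else:
                settled += net - stake  # loses the stake and stops
        active = still_betting
        recent = (recent + digit)[-n:]
        if recent == code:
            # Casino closes; players mid-run keep what they have won so far.
            return spins, settled + sum(net for _, net in active)


def predicted_total(code, spins):
    """10^n + sum of b_i * 10^i - T, the claimed total winnings."""
    n = len(code)
    bonus = sum(10 ** i for i in range(1, n) if code[:i] == code[-i:])
    return 10 ** n + bonus - spins
```

For example, averaging the first component of `play_casino("12", random.Random(0))` over a few thousand games should give a value near $100$, and near $110$ for `"11"`.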