Probability of the outcome of this simple card game

A symmetry-based proof that $E[n'] = n$. Also possibly an approach for approximating, but not calculating exactly, $Var(n')$.


Imagine the original $N$ circle cards are in fact $N$ different colors, $c_1, ..., c_N$. The same process is done to the deck, using as many square cards (of the same $N$ colors) as you need.

When the process is finished (i.e. no more circle cards), let random variable $X_j$ be the number of final cards of color $c_j$. Clearly:

  • $E[X_1] = E[X_2] = \dots = E[X_N]$ by symmetry,

  • $X_1 + X_2 + \dots + X_N = N$ by conservation of the total number of cards,

  • $E[X_1] + E[X_2] + \dots + E[X_N] = N$ combining the above and using linearity of expectation,

  • and therefore: $E[X_j] = 1$ for all $j$.

Now imagine you are partially colorblind and cannot distinguish among $c_1, ..., c_n$, thinking of all of them as "red". The final number of "red" cards (to your eyes) will be $n' \equiv X_1 + \dots + X_n$, and so we have $E[n'] = n$. QED
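For the skeptical, here is a quick Monte Carlo check of $E[n'] = n$. It assumes the rules implied by the casework in the answers below (this reading of the rules is my inference, not spelled out in the thread): in each round, a uniformly chosen circle card is replaced by a square card of its colour, and then a uniformly chosen card of the whole deck is replaced by a square card of that same colour, until no circle cards remain.

```python
import random

def play(N, n, rng):
    # One game: N cards, n of them red; returns n', the final red count.
    red = [k < n for k in range(N)]       # colour of each card position
    circles = set(range(N))               # positions still holding circle cards
    while circles:
        i = rng.choice(tuple(circles))    # first pick: uniform circle card
        circles.discard(i)                # it becomes a square of its colour
        j = rng.randrange(N)              # second pick: uniform deck card
        red[j] = red[i]                   # replaced by a square of i's colour
        circles.discard(j)                # if j was a circle, it is gone too
    return sum(red)

rng = random.Random(0)
N, n, trials = 20, 5, 4000
mean = sum(play(N, n, rng) for _ in range(trials)) / trials
print(mean)   # should be close to n = 5
```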


Further thoughts on $Var(n')$:

Note that each of the $N$ colors can have at most $2$ final cards, i.e. $X_j \in \{0, 1, 2\}$. It might be possible (but difficult/tedious) to calculate $Var(X_j)$ based on careful case analysis, e.g. $X_j = 2$ iff the $c_j$ circle card was picked as the first card of some round $t$, and neither of the $2$ square cards of color $c_j$ added to the deck was subsequently replaced in any round $> t$.

If we know $Var(X_j)$, then based on $n' \equiv X_1 + \dots + X_n,$ a decent approximation might be $Var(n') \approx n \times Var(X_j)$. This would be an exact equality if the $X_j$'s were independent, but of course they are actually dependent. However, the dependence is tame: since $X_1 + \dots + X_N = N$ exactly and the pair covariances are all equal by symmetry, we get $Cov(X_i, X_j) = -Var(X_j)/(N-1)$ and hence, exactly, $Var(n') = n \, Var(X_j) \, \frac{N-n}{N-1}$. In the large $n$ but much larger $N$ limit (i.e. $1 \ll n \ll N$), this reduces to the approximation $n \times Var(X_j)$.

The OP said numerically it seems $Var(n') \approx {3n(N-n) \over 4N}$. This would be consistent with my thoughts above if $Var(X_j) = \frac34$ in the $1 \ll n \ll N$ limit.


It is indeed true that $E[n'] = n$ for any $n \ge 1$.

The idea is to look at the random variable $c_R(t)+s_R(t)$ for each $t$, where $c_R(t)$ is the number of red circles and $s_R(t)$ is the number of red squares after $t$ steps of the process. $c_B(t)$ and $s_B(t)$ are defined analogously.

By doing casework on whether a red circle or blue circle was chosen, we have (conditionally on the state at time $t$) $$E[c_R(t+1)+s_R(t+1)] = $$ $$\frac{c_R(t)}{c_R(t)+c_B(t)}\left[\frac{c_R(t)-1}{N}(c_R(t)-2+s_R(t)+2)+\frac{s_R(t)+1}{N}(c_R(t)-1+s_R(t)+1)+\frac{c_B(t)}{N}(c_R(t)-1+s_R(t)+2)+\frac{s_B(t)}{N}(c_R(t)-1+s_R(t)+2)\right]$$ $$+\frac{c_B(t)}{c_R(t)+c_B(t)}\left[\frac{c_B(t)-1}{N}(c_R(t)+s_R(t))+\frac{s_B(t)+1}{N}(c_R(t)+s_R(t))+\frac{c_R(t)}{N}(c_R(t)+s_R(t)-1)+\frac{s_R(t)}{N}(c_R(t)+s_R(t)-1)\right],$$

which after some computations, involving $c_R(t)+s_R(t)+c_B(t)+s_B(t) \equiv N$, and then taking expectations over the state at time $t$, gives

$$E[c_R(t+1)+s_R(t+1)] = E[c_R(t)+s_R(t)]\left(1-\frac{1}{N}\right)+E\left[\frac{c_R(t)}{c_R(t)+c_B(t)}\right].$$

Similarly (again conditionally on the state at time $t$),

$$E\left[\frac{c_R(t+1)}{c_R(t+1)+c_B(t+1)}\right] = $$ $$\frac{c_R(t)}{c_R(t)+c_B(t)}\left[\frac{c_R(t)-1}{N}\left(\frac{c_R(t)-2}{c_R(t)+c_B(t)-2}\right)+\frac{c_B(t)}{N}\left(\frac{c_R(t)-1}{c_R(t)+c_B(t)-2}\right)+\frac{s_R(t)+1+s_B(t)}{N}\left(\frac{c_R(t)-1}{c_R(t)+c_B(t)-1}\right)\right]$$ $$+ \frac{c_B(t)}{c_R(t)+c_B(t)}\left[\frac{c_B(t)-1}{N}\left(\frac{c_R(t)}{c_R(t)+c_B(t)-2}\right)+\frac{c_R(t)}{N}\left(\frac{c_R(t)-1}{c_R(t)+c_B(t)-2}\right)+\frac{s_R(t)+s_B(t)+1}{N}\left(\frac{c_R(t)}{c_R(t)+c_B(t)-1}\right)\right],$$

which after a very long computation using also $s_R(t)+s_B(t) \equiv N-c_R(t)-c_B(t)$ gives

$$E\left[\frac{c_R(t+1)}{c_R(t+1)+c_B(t+1)}\right] = E\left[\frac{c_R(t)}{c_R(t)+c_B(t)}\right].$$

Since $E\left[\frac{c_R(0)}{c_R(0)+c_B(0)}\right] = \frac{n}{N}$, we see that $$E\left[\frac{c_R(t)}{c_R(t)+c_B(t)}\right] = \frac{n}{N}$$ for all $t \ge 0$. Therefore, $$E[c_R(t+1)+s_R(t+1)] = \left(1-\frac{1}{N}\right)E[c_R(t)+s_R(t)]+\frac{n}{N}.$$ Since $E[c_R(0)+s_R(0)] = n$, iteration/induction shows that $$E[c_R(t)+s_R(t)] = n$$ for all $t \ge 0$. In particular, we have $$E[n']=n,$$ as desired.
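Both one-step identities can be double-checked without redoing the long computations, by enumerating the two picks exactly from a fixed state with rational arithmetic. This is only a sketch: the function name is mine, the state values are arbitrary, and the rules are as in the casework above (a uniform circle card becomes a square of its colour, then a uniform deck card becomes a square of that same colour).

```python
from fractions import Fraction as F

def one_step(cR, cB, sR, sB):
    # Exact conditional expectations after one step from a fixed state:
    # returns (E[c_R' + s_R'], E[c_R' / (c_R' + c_B')]).
    N = cR + cB + sR + sB
    e_red = e_ratio = F(0)
    for fr, p1 in ((1, F(cR, cR + cB)), (0, F(cB, cR + cB))):
        cr1, cb1 = cR - fr, cB - (1 - fr)    # circles left after first pick
        base = cR + sR + fr                  # red cards if 2nd pick hits blue
        # (count of possible j's, new c_R, new c_B, new red-card total):
        cases = [
            (cr1,         cr1 - 1, cb1,     base - 1),  # j: red circle
            (cb1,         cr1,     cb1 - 1, base),      # j: blue circle
            (sR + fr,     cr1,     cb1,     base - 1),  # j: red square
            (sB + 1 - fr, cr1,     cb1,     base),      # j: blue square
        ]
        for cnt, cr2, cb2, red2 in cases:
            p = p1 * F(cnt, N)
            e_red += p * red2
            e_ratio += p * F(cr2, cr2 + cb2)
    return e_red, e_ratio

cR, cB, sR, sB = 3, 2, 1, 4                  # arbitrary state, N = 10
N = cR + cB + sR + sB
e_red, e_ratio = one_step(cR, cB, sR, sB)
assert e_red == F(cR + sR) * (1 - F(1, N)) + F(cR, cR + cB)
assert e_ratio == F(cR, cR + cB)             # the circle ratio is a martingale
print(e_red, e_ratio)
```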


Here’s a derivation of the variance.

Let $r(t)$ denote the proportion of circle cards left after time $t$, where a step takes time $\frac1N$. In each step, we remove one circle card, and then another circle card with probability $r$. Thus, in the limit $N\to\infty$, $r$ satisfies the differential equation

$$ r'=-(1+r)\;. $$

The general solution is

$$ r=c\mathrm e^{-t}-1\;, $$

and with $r(0)=1$ we have $c=2$, so

$$ r=2\mathrm e^{-t}-1\;, $$

and thus

$$ t=-\log\frac{r+1}2\;. $$

The process ends at $r=0$, and thus at $t=\log2$, so we expect it to take $N\log2$ steps.
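The $N\log 2$ prediction is easy to test by simulation (the rules are as elsewhere in the thread; the choice of $N$ and the number of runs are mine and arbitrary):

```python
import math
import random

def steps_to_finish(N, rng):
    # One game with N circle cards; colours are irrelevant here.
    # Each step: a uniform circle card becomes a square, then a uniform
    # deck card becomes a square.  Returns the number of steps taken.
    circles = set(range(N))
    steps = 0
    while circles:
        i = rng.choice(tuple(circles))      # first half-step: a circle
        circles.discard(i)
        circles.discard(rng.randrange(N))   # second half-step: any card
        steps += 1
    return steps

rng = random.Random(1)
N, runs = 1000, 10
mean_steps = sum(steps_to_finish(N, rng) for _ in range(runs)) / runs
print(mean_steps, N * math.log(2))   # the two should be close
```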

Now focus on some particular circle card. It is replaced by a square card at some $r$ uniformly distributed over $[0,1]$: by symmetry, each of the original circle cards is equally likely to be among the $rN$ circle cards still present when the proportion is $r$, so the proportion $R$ at which our card is replaced satisfies $P(R<r)=r$. In the first half-step, it is replaced with probability $\frac1{rN}$, and in the second half-step it is replaced with probability $\frac1N$. Thus, conditional on the circle card being replaced at $r$, the probability that it’s replaced in the first half-step, and thus produces two square cards of its colour, is $\frac1{r+1}$. (The probability that the same card is selected in both half-steps is of order $\frac1N$ and thus negligible in the limit.)

If the circle card is replaced by $2$ square cards, there remain $N\left(\log2+\log\frac{r+1}2\right)=N\log(r+1)$ steps in which these $2$ square cards could be replaced by other square cards. The probability for neither of them to be replaced in any of these steps is

\begin{eqnarray} \left(1-\frac2N\right)^{N\log(r+1)} &\xrightarrow{N\to\infty}& \exp(-2\log(r+1)) \\ &=& (r+1)^{-2}\;. \end{eqnarray}

Integrating over the uniform distribution with respect to $r$, we obtain the probability that the circle card leaves behind $2$ square cards (i.e. produces $2$ square cards, neither of which is replaced) as

$$ \int_0^1(r+1)^{-3}\mathrm dr=\frac38\;. $$
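Spelling this out: the integrand is the first-half-step probability times the survival probability, $(r+1)^{-3}=\frac1{r+1}\cdot(r+1)^{-2}$, and

$$ \int_0^1(r+1)^{-3}\,\mathrm dr=\left[-\frac12(r+1)^{-2}\right]_0^1=\frac12-\frac18=\frac38\;. $$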

As shown in antkam’s answer, the expected number of square cards that a circle card leaves behind is $1$. Since this number takes values in $\{0,1,2\}$ and has mean $1$, the probability that it leaves behind $0$ cards must equal the probability that it leaves behind $2$ cards, i.e. it must also be $\frac38$. For the change $D\in\{-1,0,1\}$ in the number of cards of this type, the variance is easily calculated as

$$\mathsf{Var}[D]=\mathsf E[D^2]-\mathsf E[D]^2=\mathsf E[D^2]=\mathsf P(D\ne0)\;,$$

so the variance in the number of cards left behind ($\mathsf{Var}[X_j]$ in antkam’s answer) is $\frac38+\frac38=\frac34$.
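The limiting probabilities $\left(\frac38,\frac14,\frac38\right)$ for $X_j\in\{0,1,2\}$ can be checked with antkam’s $N$-colour experiment. A sketch under the thread’s rules (the choice of $N$ and the run count are arbitrary; $N$ is finite, so only rough agreement with the limit is expected):

```python
import random
from collections import Counter

def final_colour_counts(N, rng):
    # N circle cards, all of distinct colours 0..N-1.  Each step: a
    # uniform circle card becomes a square of its colour, then a uniform
    # deck card becomes a square of that same colour.  Returns X_j for
    # each colour j, the number of final cards of that colour.
    colour = list(range(N))
    circles = set(range(N))
    while circles:
        i = rng.choice(tuple(circles))
        circles.discard(i)
        j = rng.randrange(N)
        colour[j] = colour[i]
        circles.discard(j)
    counts = Counter(colour)
    return [counts.get(j, 0) for j in range(N)]

rng = random.Random(2)
N, runs = 1000, 6
xs = [x for _ in range(runs) for x in final_colour_counts(N, rng)]
p2 = xs.count(2) / len(xs)                      # should be near 3/8
var = sum((x - 1) ** 2 for x in xs) / len(xs)   # E[X_j] = 1 exactly
print(p2, var)                                  # expect roughly 3/8 and 3/4
```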

Your own answer shows that this implies the result for the total variance that you had obtained numerically. This is the somewhat hand-waving argument I gave before you posted your answer: With probability $\frac38+\frac38=\frac34$, a given circle card changes the final count of cards of its colour by $\pm1$, and that colour is red with probability $\frac nN$ (blue with probability $\frac{N-n}N$). The variance of the sum of these $\frac34N$ approximately independent Bernoulli variables is

$$ \frac34N\cdot\frac nN\left(1-\frac nN\right)=\frac{3n(N-n)}{4N}\;, $$

in agreement with your numerical results.
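A simulation reproduces this as well (the rules and parameters below are my choices; with $N=200$, $n=50$ the formula predicts $Var(n') \approx \frac{3\cdot50\cdot150}{800}=28.125$):

```python
import random

def play(N, n, rng):
    # One game: n of the N circle cards are red; returns n', the final
    # number of red cards.  Each step: a uniform circle card becomes a
    # square of its colour, then a uniform deck card becomes a square
    # of that same colour.
    red = [k < n for k in range(N)]
    circles = set(range(N))
    while circles:
        i = rng.choice(tuple(circles))
        circles.discard(i)
        j = rng.randrange(N)
        red[j] = red[i]
        circles.discard(j)
    return sum(red)

rng = random.Random(3)
N, n, trials = 200, 50, 2000
results = [play(N, n, rng) for _ in range(trials)]
mean = sum(results) / trials
var = sum((x - mean) ** 2 for x in results) / (trials - 1)
print(var, 3 * n * (N - n) / (4 * N))   # empirical vs predicted variance
```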

I haven’t made any attempt to rigorously show that the errors all vanish in the limit $N\to\infty$, but this seems very plausible. For instance, the standard deviation of the number of steps should be $O(\sqrt N)$, and $\left(1-\frac2N\right)^{\sqrt N}\to1$. The correlation among the Bernoulli trials should also go to zero, as it typically does in such scenarios.