probability distribution of coverage of a set after $X$ independently, randomly selected members of the set

If there are $n$ elements of the set then the probability that $M=m$ have been selected after a sample of $x$ (with replacement) is

$$\frac{S_2(x,m) \; n!}{n^x \; (n-m)!} $$

where $S_2(x,m)$ is a Stirling number of the second kind.

The expected value of $M$ is: $n \left(1- \left(1-\dfrac{1}{n}\right)^x \right)$.

The variance is: $n\left(1-\dfrac{1}{n}\right)^x + n^2 \left(1-\dfrac{1}{n}\right)\left(1-\dfrac{2}{n}\right)^x - n^2\left(1-\dfrac{1}{n}\right)^{2x}. $


The expected proportion of elements covered, $E\left(\frac{m}{n}\right)$, has a simple limiting form as $n \rightarrow \infty$ with the sampling rate $ r / n $ fixed. Note that $\lim_{n \rightarrow \infty} \left(1-\frac{1}{n}\right)^n = e^{-1}$, and rewrite:

$$\lim_{n \rightarrow \infty} E\left(\frac{m}{n}\right) = 1 - e^{-\frac{r}{n}}$$

so that for example sampling $r=n$ times is expected to cover about 63% of the set. This is a reasonable approximation even for $n > 100$.


Derivation of $\operatorname E [M]$ with a classic use of indicator variables and total expectation:

We sample from set $X = \{1, \dots, n \}$. Let $X_i$ be $1$ if member $i$ is in our sample, $0$ otherwise.

For a sample of size $x$, the probability that none of the values are $i$ is $\left(\frac{n-1}{n}\right)^x$. Thus the probability that the sample includes $i$, and therefore $X_i = 1$, is $1-\left(\frac{n-1}{n}\right)^x$.

We have $$\operatorname E[M] = \operatorname E[X_1 + \dots + X_n] = \operatorname E[X_1] + \dots + \operatorname E[X_n] = n \left(1-\left(\frac{n-1}{n}\right)^x\right)$$

Variance is calculated similarly, but we have to consider separately $\operatorname E[X_i^2] $ and $\operatorname E[X_i X_j]$ for $ i \ne j$.