Time complexity of variation on Coupon's collector problem

By way of enrichment here is the complexity using Stirling numbers of the second kind. Using the notation from this MSE link we have $n$ different coupons, and ask for the expected number of draws until a multiset containing instances of $j$ different coupons has been drawn.

First let us verify that we indeed have a probability distribution here. Let $T$ be the number of draws needed. The probability that $T$ equals $m$ is

$$P[T=m] = \frac{1}{n^m} \times {n\choose j-1} \times {m-1\brace j-1} \times (j-1)! \times (n+1-j).$$

For a run of $m$ samples to contain instances of $j$ different coupons for the first time at the last sample, it must split into two parts: a prefix of length $m-1$ containing exactly $j-1$ different coupons, and a terminal sample that completes the set. First we choose the $j-1$ values that occur in the prefix from the $n$ possibilities, which gives the binomial coefficient. Next we partition the $m-1$ prefix slots into $j-1$ non-empty sets in an ordered set partition, which gives the Stirling number times the factorial: the smallest value chosen gets the slots listed in the first set, the next one those in the second set, and so on. Finally we get $n-(j-1)$ possibilities for the terminal sample that completes the selection, since $j-1$ values from the prefix have already been used. Dividing by the $n^m$ possible sequences gives the probability.
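The counting argument can be checked by brute force for small parameters. The following is a minimal sketch, not part of the original derivation; the function names `stirling2`, `prob_formula` and `prob_brute_force` are illustrative assumptions. It compares the closed form for $P[T=m]$ with a direct enumeration of all $n^m$ sequences, using exact fractions.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial


def stirling2(n, k):
    """Stirling number of the second kind, via S(n,k) = k*S(n-1,k) + S(n-1,k-1)."""
    row = [1] + [0] * k  # row for n = 0: S(0,0) = 1, S(0,i) = 0 for i > 0
    for _ in range(n):
        row = [0] + [i * row[i] + row[i - 1] for i in range(1, k + 1)]
    return row[k]


def prob_formula(n, j, m):
    """P[T = m] from the closed form, as an exact fraction."""
    count = comb(n, j - 1) * stirling2(m - 1, j - 1) * factorial(j - 1) * (n + 1 - j)
    return Fraction(count, n ** m)


def prob_brute_force(n, j, m):
    """P[T = m] by enumerating all n**m draw sequences (small n and m only)."""
    hits = 0
    for seq in product(range(n), repeat=m):
        # The prefix holds exactly j-1 distinct coupons and the last draw adds the j-th.
        if len(set(seq[:-1])) == j - 1 and len(set(seq)) == j:
            hits += 1
    return Fraction(hits, n ** m)


if __name__ == "__main__":
    for m in range(1, 7):
        print(m, prob_formula(4, 3, m), prob_brute_force(4, 3, m))
```

The enumeration grows as $n^m$, so it is only meant as a sanity check for small cases such as $n=4$, $j=3$.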

Recall the OGF of the Stirling numbers of the second kind, which says that (with $n$ and $k$ as generic indices here, not the number of coupons)

$${n\brace k} = [z^n] \prod_{q=1}^k \frac{z}{1-qz}.$$
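As a quick sanity check of this generating function, the sketch below expands the product as a truncated power series and compares the coefficients with the standard recurrence ${n\brace k} = k{n-1\brace k} + {n-1\brace k-1}$. The function names are illustrative assumptions, not from the original answer.

```python
def stirling2_table(max_n):
    """table[n][k] = S(n, k) for 0 <= n, k <= max_n, from the standard recurrence."""
    table = [[0] * (max_n + 1) for _ in range(max_n + 1)]
    table[0][0] = 1
    for n in range(1, max_n + 1):
        for k in range(1, n + 1):
            table[n][k] = k * table[n - 1][k] + table[n - 1][k - 1]
    return table


def stirling2_column_via_ogf(k, max_n):
    """Coefficients [z^n] prod_{q=1..k} z/(1-qz) for n = 0..max_n."""
    # Start from z^k, the numerator of the product.
    coeffs = [0] * (max_n + 1)
    if k <= max_n:
        coeffs[k] = 1
    # Multiplying a series a(z) by 1/(1-qz) gives c_i = a_i + q*c_{i-1}.
    for q in range(1, k + 1):
        for i in range(1, max_n + 1):
            coeffs[i] += q * coeffs[i - 1]
    return coeffs


if __name__ == "__main__":
    print(stirling2_column_via_ogf(2, 5))  # S(0,2), ..., S(5,2)
```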

This gives for the sum of the probabilities

$$\sum_{m\ge 1} P[T=m] = {n\choose j-1} (j-1)! (n+1-j) \frac{1}{n} \sum_{m\ge 1} \frac{1}{n^{m-1}} {m-1\brace j-1}.$$

Focusing on the sum we obtain

$$\sum_{m\ge 1} \frac{1}{n^{m-1}} [z^{m-1}] \prod_{q=1}^{j-1} \frac{z}{1-qz} = \prod_{q=1}^{j-1} \frac{1/n}{1-q/n} \\ = \prod_{q=1}^{j-1} \frac{1}{n-q} = \frac{(n-j)!}{(n-1)!}.$$

Combining this with the outer factor we get

$${n\choose j-1} (j-1)! (n+1-j) \frac{1}{n} \frac{(n-j)!}{(n-1)!} \\ = {n\choose j-1} (j-1)! \frac{(n+1-j)!}{n!} = 1$$

This confirms that we have a probability distribution.
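The normalization can also be observed numerically. The sketch below sums $P[T=m]$ exactly for $m$ up to a cutoff and shows the partial sums approaching $1$; the helper names and the cutoff are illustrative assumptions, and the truncated sum is of course only a check, not a proof.

```python
from fractions import Fraction
from math import comb, factorial


def stirling2_column(k, max_n):
    """Return [S(0,k), S(1,k), ..., S(max_n,k)] using the standard recurrence."""
    row = [1] + [0] * k  # row for n = 0
    column = [row[k]]
    for _ in range(max_n):
        row = [0] + [i * row[i] + row[i - 1] for i in range(1, k + 1)]
        column.append(row[k])
    return column


def partial_total_probability(n, j, max_m):
    """Sum of P[T = m] for m = 1..max_m, as an exact fraction."""
    column = stirling2_column(j - 1, max_m - 1)
    outer = comb(n, j - 1) * factorial(j - 1) * (n + 1 - j)
    return sum(Fraction(outer * column[m - 1], n ** m) for m in range(1, max_m + 1))


if __name__ == "__main__":
    for cutoff in (5, 10, 20, 40):
        print(cutoff, float(partial_total_probability(6, 4, cutoff)))
```

The partial sum up to $m$ is the probability that $m$ draws already contain $j$ different coupons, so it increases to $1$ geometrically fast.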

We then get for the expectation that

$$\sum_{m\ge 1} m\times P[T=m] \\ = {n\choose j-1} (j-1)! (n+1-j) \frac{1}{n} \sum_{m\ge 1} \frac{m}{n^{m-1}} {m-1\brace j-1}.$$

We once more focus on the sum to get

$$\sum_{m\ge 1} \frac{m}{n^{m-1}} [z^{m-1}] \prod_{q=1}^{j-1} \frac{z}{1-qz} = \sum_{m\ge 1} \frac{m}{n^{m-1}} [z^{m}] z \prod_{q=1}^{j-1} \frac{z}{1-qz} \\ = \left.\left( \prod_{q=0}^{j-1} \frac{z}{1-qz} \right)'\right|_{z=1/n} \\ = \left.\left( \prod_{q=0}^{j-1} \frac{z}{1-qz} \sum_{p=0}^{j-1} \frac{1-pz}{z} \frac{1}{(1-pz)^2} \right)\right|_{z=1/n} \\ = \left.\left( \prod_{q=0}^{j-1} \frac{z}{1-qz} \sum_{p=0}^{j-1} \frac{1}{z} \frac{1}{1-pz} \right)\right|_{z=1/n} \\ = \prod_{q=0}^{j-1} \frac{1/n}{1-q/n} \sum_{p=0}^{j-1} \frac{1}{1/n} \frac{1}{1-p/n} \\ = \prod_{q=0}^{j-1} \frac{1}{n-q} \sum_{p=0}^{j-1} \frac{n^2}{n-p} = n \prod_{q=1}^{j-1} \frac{1}{n-q} \sum_{p=0}^{j-1} \frac{1}{n-p}.$$

Retrieving the outer factor we have

$${n\choose j-1} (j-1)! (n+1-j) \frac{1}{n} \frac{(n-j)!}{(n-1)!} \times n \sum_{p=0}^{j-1} \frac{1}{n-p}.$$

The front simplifies to one as before and we are left with

$$n\sum_{p=0}^{j-1} \frac{1}{n-p} = n \left(\sum_{p=0}^{n-1} \frac{1}{n-p} - \sum_{p=j}^{n-1} \frac{1}{n-p}\right).$$

This is $$\bbox[5px,border:2px solid #00A000]{\Large n \times \left( H_n - H_{n-j} \right)}$$

This yields $n H_n$ when $j = n$ and $1$ when $j=1$, both of which are correct. Using $H_n \sim \log n + \gamma$ we get for $j = n/2$ that the expectation is asymptotic to $n\log 2$, i.e. linear in $n$.
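The closed form is easy to evaluate exactly and to compare with a simulation. The following is a minimal sketch under the assumption of uniform, independent draws; the function names, the number of trials and the seed are illustrative choices, not from the original answer.

```python
import random
from fractions import Fraction


def harmonic(k):
    """Exact harmonic number H_k, with H_0 = 0."""
    return sum(Fraction(1, i) for i in range(1, k + 1))


def expected_draws(n, j):
    """Expected draws until j of n coupons have been seen: n * (H_n - H_{n-j})."""
    return n * (harmonic(n) - harmonic(n - j))


def simulate_draws(n, j, rng):
    """Draw uniformly from n coupons until j different ones have appeared."""
    seen = set()
    draws = 0
    while len(seen) < j:
        seen.add(rng.randrange(n))
        draws += 1
    return draws


def estimate_expected_draws(n, j, trials=20000, seed=0):
    """Monte Carlo estimate of the expectation, seeded for reproducibility."""
    rng = random.Random(seed)
    return sum(simulate_draws(n, j, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    print(float(expected_draws(10, 5)), estimate_expected_draws(10, 5))
    print(float(expected_draws(1000, 500)) / 1000)  # compare with log 2
```

For $j = n/2$ the ratio of the exact value to $n$ should be close to $\log 2 \approx 0.6931$ already for moderate $n$.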

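In a different notation, with $m$ coupons in total, the expected number of trials until your collection has $n$ distinct elements is: $${m\over m}+{m\over m-1}+{m\over m-2}+\cdots +{m\over m-n+1}. $$ This is the same sum as $n\sum_{p=0}^{j-1} \frac{1}{n-p}$ above, with $m$ in place of $n$ and $n$ in place of $j$.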
