Number of trials rolling six 6-sided dice to get 6 unique values

As I outlined in a comment, we denote by $t_k, 1 \leq k \leq 6$, the expected number of further rolls required to get six distinct numbers when there are currently $k$ distinct numbers. So, for example, $t_6 = 0$, because there are already six distinct numbers.

If there are currently $k = 5$ distinct numbers, we pick up the duplicate and reroll it. There are two possibilities:

  • The reroll is also a duplicate, with probability $\frac56$.
  • The reroll is the last unrolled number, with probability $\frac16$.

This allows us to write the recurrence

$$ t_5 = 1 + \frac56 t_5 $$

which can be solved to yield $t_5 = 6$. If there are $k = 4$ distinct numbers, we pick up the two duplicates and reroll them. There are four possibilities:

  • Both rerolls are still duplicates, with probability $\left(\frac23\right)^2 = \frac49$.
  • One reroll is a duplicate, but the other is a new number, with probability $2\left(\frac23\right)\left(\frac13\right) = \frac49$.
  • Both rerolls are the same new number, with probability $2\left(\frac16\right)^2 = \frac{1}{18}$.
  • The rerolls are the last two numbers, also with probability $2\left(\frac16\right)^2 = \frac{1}{18}$.
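
These case probabilities can be checked by brute force. The following is a minimal Python sketch, assuming the held numbers are labelled $1, \dots, k$ (an arbitrary choice) and using a hypothetical helper name `reroll_distribution`; it enumerates every ordered reroll of the duplicate dice and tallies how many distinct numbers result.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def reroll_distribution(k, sides=6):
    """Distribution of the distinct count after rerolling the sides - k duplicates.

    Assumption for illustration: the k numbers already showing are 1..k.
    """
    held = set(range(1, k + 1))
    dice = sides - k
    counts = Counter()
    # Enumerate every ordered outcome of the rerolled dice.
    for reroll in product(range(1, sides + 1), repeat=dice):
        counts[k + len(set(reroll) - held)] += 1
    total = sides ** dice
    return {j: Fraction(c, total) for j, c in sorted(counts.items())}


print(reroll_distribution(4))
```

For $k = 4$ the entry for five distinct numbers merges the second and third cases, so it should come out as $\frac49 + \frac1{18} = \frac12$.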

The second and third cases both yield five distinct numbers, so we can write the recurrence

$$ t_4 = 1 + \frac49 t_4 + \left(\frac49+\frac{1}{18}\right) t_5 $$

Plugging in $t_5 = 6$ reduces this to

$$ t_4 = 4 + \frac49 t_4 $$

which yields $t_4 = \frac{36}{5}$. In general, we may write

$$ t_k = 1 + \sum_{j=k}^5 p_{kj} t_j $$

where $p_{kj}$ is the probability of going from $k$ distinct numbers to $j \geq k$ distinct numbers in a single roll; the $j = 6$ term is omitted because $t_6 = 0$. There's probably an explicit summation form for this, but I'm afraid I'm too lazy to think of it at the present time. At any rate, we can continue along in the same vein to write

\begin{align}
t_3 & = 1 + \frac18 t_3 + \frac{37}{72} t_4 + \frac13 t_5 \\
& = 1 + \frac18 t_3 + \frac{37}{10} + 2 \\
& = \frac{67}{10} + \frac18 t_3
\end{align}

yielding $t_3 = \frac{268}{35}$, then

\begin{align}
t_2 & = 1 + \frac{1}{81} t_2 + \frac{65}{324} t_3 + \frac{55}{108} t_4 + \frac{7}{27} t_5 \\
& = 1 + \frac{1}{81} t_2 + \frac{871}{567} + \frac{11}{3} + \frac{14}{9} \\
& = \frac{4399}{567} + \frac{1}{81} t_2
\end{align}

yielding $t_2 = \frac{4399}{560}$, then

\begin{align}
t_1 & = 1 + \frac{1}{7776} t_1 + \frac{155}{7776} t_2 + \frac{25}{108} t_3 + \frac{325}{648} t_4 + \frac{25}{108} t_5 \\
& = 1 + \frac{1}{7776} t_1 + \frac{136369}{870912} + \frac{335}{189} + \frac{65}{18} + \frac{25}{18} \\
& = \frac{986503}{124416} + \frac{1}{7776} t_1
\end{align}

yielding $t_1 = \frac{986503}{124400}$. (Thanks to @user in the comments for noticing an error in my original computation!) Finally, we observe that if $k = 1$ (that is, if you only have one distinct number), you're essentially right where you started: the opening roll of all six dice is equivalent to holding one die and rolling the other five. So the overall expected number of rolls until you get six distinct numbers is

$$ t = t_1 = \frac{986503}{124400} \approx 7.93009 $$
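
For anyone who wants to check the arithmetic, here is a minimal Python sketch that solves the same recurrence in exact fractions. The names `transition_prob` and `expected_rolls` are illustrative labels only, and the inclusion-exclusion count used for $p_{kj}$ is one possible explicit form, offered as an assumption rather than something derived above.

```python
from fractions import Fraction
from math import comb


def transition_prob(n, k, j):
    """P(k distinct -> j distinct) when the n - k duplicate dice are rerolled."""
    d = n - k   # dice being rerolled; also the number of missing faces
    i = j - k   # how many new distinct numbers appear on this roll
    # Inclusion-exclusion: the rerolls use only the k old numbers plus a
    # chosen set of i new ones, and each of those i new numbers must appear.
    hits = sum((-1) ** r * comb(i, r) * (k + i - r) ** d for r in range(i + 1))
    return Fraction(comb(d, i) * hits, n ** d)


def expected_rolls(n=6):
    """Return {k: t_k} for k = 1..n as exact fractions (assumes n >= 2)."""
    t = {n: Fraction(0)}
    for k in range(n - 1, 0, -1):
        stay = transition_prob(n, k, k)
        rest = sum(transition_prob(n, k, j) * t[j] for j in range(k + 1, n))
        # Solve t_k = 1 + stay * t_k + rest for t_k.
        t[k] = (1 + rest) / (1 - stay)
    return t


t = expected_rolls(6)
for k in sorted(t, reverse=True):
    print(k, t[k])
```

Run as written, this should reproduce the values of $t_5$ through $t_1$ computed by hand above; passing a different `n` treats $n$ dice with $n$ sides each.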

There may be a simpler and cleverer way to reach this answer.


As a back-of-the-envelope calculation, for $n$ dice with $n$ sides each, I expect around $$\frac{\pi^2}6n$$ rolls are needed.
Most rolls are spent tidying up the final few numbers.
It takes $n$ rolls on average to get the final number.
With two numbers to go, success is four times as likely, as two dice each have two successful faces, so around $n/4$ rolls are needed to advance. With three to go, around $n/9$ are needed. The chance of advancing more than one step at a time is relatively small. So my leading-order estimate is $n+(n/4)+(n/9)+\dots$, which, since $\sum_{k\ge1} 1/k^2 = \pi^2/6$, is the number at the top of this answer.
EDIT: A better fit seems to be $$\frac{\pi^2}6n-\frac12\sum_{k=1}^n\frac1k-0.75$$ I got the first correction from a more precise version of the argument above, but the $0.75$ is taken from simulations, with a million trials at each of $n=2$ to $20$.
The following graph shows the difference between the average and $\pi^2n/6$. One curve is from simulations, the other is from the formula above.
[Graph: simulated average minus $\pi^2 n/6$, plotted alongside the formula above]
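
A comparison of this kind can be sketched in a few lines of Python. This is not the code behind the original simulations; the function names, the fixed seed, and the much smaller trial count are all assumptions chosen to keep the example quick.

```python
import math
import random


def rolls_until_all_distinct(n, rng):
    """Roll n n-sided dice, keep one die per distinct face, reroll the rest."""
    seen = set()
    rolls = 0
    while len(seen) < n:
        rolls += 1
        dice = n - len(seen)  # only the duplicate dice are rerolled
        seen.update(rng.randrange(n) for _ in range(dice))
    return rolls


def fitted_estimate(n):
    """The fitted formula: pi^2 n / 6 - H_n / 2 - 0.75."""
    harmonic = sum(1 / k for k in range(1, n + 1))
    return math.pi ** 2 / 6 * n - harmonic / 2 - 0.75


rng = random.Random(0)  # fixed seed, an arbitrary choice for repeatability
trials = 20_000         # far fewer than the million trials mentioned above
for n in (2, 6, 12, 20):
    mean = sum(rolls_until_all_distinct(n, rng) for _ in range(trials)) / trials
    print(n, round(mean, 3), round(fitted_estimate(n), 3))
```

With this few trials, expect some noise in the simulated averages; raising `trials` tightens the comparison at the cost of run time.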

EDIT: I want to record where the first correction term comes from.
I change variables, so that $s_k=t_{n-k}$, where $k$ is the number of dice being rolled. Following the accepted answer, out of $n^k$ possible rolls, most roll no new numbers; sometimes one of the $k$ dice rolls one of the $k$ new numbers; or two roll the same new number; or two roll different new numbers. The remaining outcomes contribute $O(n^{k-3})s_k$ and are negligible for this calculation.
$$\begin{aligned} n^k s_k = {} & n^k+(n-k)^k s_k \\ & + k^2 (n-k)^{k-1} s_{k-1} \\ & + {k\choose2} k (n-k)^{k-2} s_{k-1} \\ & + {k\choose2} (k^2-k) (n-k)^{k-2}s_{k-2}+\dots \end{aligned}$$
To leading order, $k^2s_k=n+k^2s_{k-1}$, which leads to the $n\pi^2/6$.
Now bring in the next order: let $s_k = n a_k + b_k$. We know $a_k=\sum_{i=1}^k 1/i^2$. The $b_{k-2}$ term is negligible, and after combining the known $a_k$, $a_{k-1}$ and $a_{k-2}$, the recurrence simplifies to $b_k=b_{k-1}-1/(2k)$.
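
One way to fill in the algebra, as a sketch for $k \ge 2$ (this is a reconstruction under the stated truncation, not necessarily the route originally taken): expand each power of $(n-k)$ to two orders, divide the recurrence by $n^{k-2}$, and collect the terms of order $n$ after substituting $s_k = n a_k + b_k$.

\begin{align}
k^2 b_k - \binom k2 k^2 a_k & = k^2 b_{k-1} - k^3(k-1)\, a_{k-1} + \binom k2 k\, a_{k-1} + \binom k2 k(k-1)\, a_{k-2} \\
\text{with}\quad a_k & = a_{k-1} + \frac1{k^2}, \qquad a_{k-2} = a_{k-1} - \frac1{(k-1)^2} \\
k^2 b_k & = k^2 b_{k-1} + \left[ 2\binom k2 k^2 - k^3(k-1) \right] a_{k-1} + \binom k2 - \binom k2 \frac{k}{k-1} \\
& = k^2 b_{k-1} - \frac k2
\end{align}

The bracket vanishes because $\binom k2 = k(k-1)/2$, and the remaining constants combine to $-k/2$, which gives $b_k = b_{k-1} - 1/(2k)$.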