How do I determine sample size for a test?

Let there be $N$ independent rolls. Let $N_i$ be the number of times outcome $i$ has occurred, so that $N_1+\cdots+N_n=N.$

The empirical distribution after $N$ rolls is $\hat{p_i}=N_i/N$ for $1\leq i\leq n$; denote the actual distribution by $p_i$ ($p_i=1/n$ in your case). Let $$ \mathbb{p}=(p_1,\ldots,p_n),\quad \hat{\mathbb{p}}=(\hat{p_1},\ldots,\hat{p_n}) $$ and define the $\ell_1$ distance (twice the total variation distance) between the two distributions as $$ \Vert \mathbb{p}-\hat{\mathbb{p}}\Vert_1=\sum_{i=1}^n \vert p_i-\hat{p_i}\vert. $$ One easy-to-apply result in this area was proved by Devroye [1]:

If $\varepsilon \geq \sqrt{20n/N}$ (that is, the number of rolls $N$ is at least $20n/\varepsilon^2$), then $$ \mathbb{P}[\Vert \mathbb{p}-\hat{\mathbb{p}}\Vert_1>\varepsilon]\leq 3\exp(-N\varepsilon^2/25). $$ If you want to directly bound the maximum difference in probabilities, i.e., $$ \Vert \mathbb{p}-\hat{\mathbb{p}}\Vert_\infty=\sup_{1\leq i\leq n} \vert p_i-\hat{p_i}\vert, $$ you can obtain $$ \mathbb{P}[\Vert \mathbb{p}-\hat{\mathbb{p}}\Vert_\infty>\varepsilon]\leq 4 \exp(-N\varepsilon^2/2),\quad \forall \varepsilon>0. $$ Finally, note that $\Vert \cdot\Vert_1\leq n\Vert \cdot\Vert_\infty$, so if the second inequality is used with $\varepsilon_0$, the corresponding $\ell_1$ distance is only guaranteed to satisfy $\varepsilon\leq n\varepsilon_0$.
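To turn these bounds into a number of rolls, one can fix a failure probability $\delta$ and solve each bound for $N$. The following Python sketch does that; the names `rolls_for_sup_norm`, `rolls_for_l1` and the symbol `delta` are introduced here for illustration, and the side condition of the $\ell_1$ bound is taken to be $N \geq 20n/\varepsilon^2$, which is an assumption about the form of Devroye's lemma.

```python
import math


def rolls_for_sup_norm(eps: float, delta: float) -> int:
    """Smallest N with 4*exp(-N*eps**2/2) <= delta."""
    return math.ceil(2.0 * math.log(4.0 / delta) / eps**2)


def rolls_for_l1(eps: float, delta: float, n_outcomes: int) -> int:
    """Smallest N with 3*exp(-N*eps**2/25) <= delta.

    Also enforces the side condition, assumed here to be
    N >= 20*n_outcomes/eps**2.
    """
    from_tail = 25.0 * math.log(3.0 / delta) / eps**2
    from_condition = 20.0 * n_outcomes / eps**2
    return math.ceil(max(from_tail, from_condition))


if __name__ == "__main__":
    # A six-sided die, 95% confidence (delta = 0.05).
    print(rolls_for_sup_norm(eps=0.05, delta=0.05))
    print(rolls_for_l1(eps=0.1, delta=0.05, n_outcomes=6))
```

These are worst-case guarantees that hold for any distribution, so the resulting $N$ is usually larger than what a test designed for one specific alternative needs.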

[1] Luc Devroye. The equivalence of weak, strong and complete convergence in L1 for kernel density estimates. Ann. Statist., 11(3):896–904, 1983.

You need to turn this question around in order to get a useful answer. You will never be able to roll the die enough times to ensure that it is perfectly fair.

What you can do is roll the die enough times that you'll detect it if it is not fair. There is a way to answer that mathematically. First you need to say how far from the probabilities $(1/6, 1/6, 1/6, 1/6, 1/6, 1/6)$ the die must be before you want to detect it. For example, is $(5/36, 7/36, 6/36, 6/36, 6/36, 6/36)$ dishonest enough that you'd want to catch it? That's the size of the effect you want to detect.
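One common way to put a single number on such a departure is Cohen's effect size $w$, which is built from the same terms as the chi-squared goodness-of-fit statistic. It is not part of the answer above and is brought in here only as an illustration, with $q_i$ (a symbol introduced here) denoting the probabilities of the unfair die: $$ w=\sqrt{\sum_{i=1}^{6}\frac{(q_i-1/6)^2}{1/6}}. $$ In the example, only the first two faces differ from $1/6$, each by $1/36$, so $$ w=\sqrt{6\cdot 2\cdot (1/36)^2}=\sqrt{1/108}\approx 0.096, $$ which is about what Cohen's convention calls a small effect ($w=0.1$).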

Next you need to say how likely you want to be to detect an effect of that size. Reasonable goals might be 80%, 90%, or 95%. (100% is not reasonable; in testing, almost nothing is absolutely certain.) That is the desired power of the test.

Finally, you have to state what test you'll be using. For detecting unfairness in a die, that might be a chi-squared goodness-of-fit test at the 5% level. Any test can falsely accuse a fair die of unfairness. Many researchers want to make that mistake rarely, so they choose a small significance level such as 5% or 1%. So you need to specify both the type of test and its significance level.

That's a lot of different things to consider. In many cases, to plan the sample size necessary for a successful study, a researcher will decide (at least tentatively) on an effect size, a type of test, and a significance level, and then do a computation to balance power against sample size.

Many statistical software programs have "Power and Sample-size" procedures to do these computations. Showing the mathematical details for power vs. sample size for rolling dice may require too much mathematical detail for a helpful answer here.
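Without going into the analytic details, the power of the die test can also be estimated by simulation. The Python sketch below is one way to do that, under stated assumptions: it uses the chi-squared goodness-of-fit test at the 5% level, the slightly unfair die $(5/36, 7/36, 6/36, 6/36, 6/36, 6/36)$ mentioned earlier, and the tabulated critical value 11.0705 for 5 degrees of freedom. The function names and the roll counts tried are hypothetical choices made for this illustration.

```python
import random

# Upper 5% point of the chi-squared distribution with 5 degrees of freedom
# (standard table value); 5 = 6 faces - 1.
CHI2_CRIT_DF5_ALPHA05 = 11.0705

FAIR = [1 / 6] * 6
# The slightly unfair die used as an example in the text.
UNFAIR = [5 / 36, 7 / 36, 6 / 36, 6 / 36, 6 / 36, 6 / 36]


def chi2_statistic(counts, n_rolls):
    """Goodness-of-fit statistic against the fair-die hypothesis."""
    expected = n_rolls / 6
    return sum((c - expected) ** 2 / expected for c in counts)


def estimated_power(true_probs, n_rolls, reps=1000, seed=1):
    """Fraction of simulated experiments in which fairness is rejected."""
    rng = random.Random(seed)
    faces = list(range(6))
    rejections = 0
    for _ in range(reps):
        counts = [0] * 6
        for face in rng.choices(faces, weights=true_probs, k=n_rolls):
            counts[face] += 1
        if chi2_statistic(counts, n_rolls) > CHI2_CRIT_DF5_ALPHA05:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    # Power against the unfair die for a few candidate numbers of rolls.
    for n_rolls in (500, 1000, 1400):
        print(n_rolls, estimated_power(UNFAIR, n_rolls))
```

Calling `estimated_power(FAIR, n_rolls)` instead estimates the false-accusation rate, which should come out near the 5% significance level. With 1000 replications each estimate carries Monte Carlo error of a percentage point or two.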

Instead, let's suppose you want to answer a similar question for tossing a coin to see if it is fair. You want to know whether the coin's Heads probability differs from the fair value $1/2$ by more than $\Delta = 0.02.$ You want the significance level to be $\alpha = 0.05$ and the power to be $1-\beta = 1-0.05 = 0.95$ in a test of $H_0: p = 0.5$ against $H_a: p\ne 0.5.$ What sample size is required?

The formula is $$n = \frac{0.25(z_\beta+z_{\alpha/2})^2}{\Delta^2} \approx 8123,$$ where $z_\beta = 1.645$ and $z_{\alpha/2} = 1.96.$ [The factor $0.25 = p(1-p)$ at $p = 1/2$, and the significance level gets split in half for a two-sided test.]

n = .25*(1.645+1.96)^2/.02^2;  n
[1] 8122.516

That's a lot of coin tosses. If it takes that many tosses to detect a $\Delta$ as small as $0.02,$ then maybe you could be satisfied with effect size $\Delta = 0.05.$ Then you'd need only $n = 1300$ tosses of the coin.
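For trying other choices of $\Delta$, $\alpha$ and power, here is a minimal Python sketch of the usual normal-approximation formula $n = 0.25\,(z_\beta+z_{\alpha/2})^2/\Delta^2$, in which the sum of the two quantiles is squared. The function name `coin_sample_size` is hypothetical. It computes the quantiles with the standard library instead of rounding them to three decimals, so the result can differ by a toss or so from a hand computation.

```python
import math
from statistics import NormalDist


def coin_sample_size(delta: float, alpha: float = 0.05, power: float = 0.95) -> int:
    """Tosses needed for a two-sided test of p = 0.5 to detect a shift of delta.

    Uses the normal approximation with variance p*(1-p) = 0.25 at p = 0.5.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_alpha_half = std_normal.inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = std_normal.inv_cdf(power)
    return math.ceil(0.25 * (z_alpha_half + z_beta) ** 2 / delta**2)


if __name__ == "__main__":
    for delta in (0.02, 0.05, 0.10):
        print(delta, coin_sample_size(delta))
```

Because $n$ scales as $1/\Delta^2$, halving the effect size you want to detect roughly quadruples the number of tosses.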

There are lots of free Internet pages with power and sample-size "calculators," with varying degrees of transparency about how to use them (and varying numbers of pop-up ads to look at). Maybe you want to explore. In any case, I hope you now know the main issues you need to decide in order to make such computations usefully.
