Confidence intervals when the number of samples is random

Let $\hat{\mu}=\frac{\sum_i X_i}{N}$, where the sample size $N \sim \mathrm{Binomial}(n,p)$, and let $\mu = \mathbf{E}(X)$. Writing $a$ for the observed sample mean and conditioning throughout on $N \ge 1$ (nothing is observed when $N = 0$),

$$\mathcal{P}(\mu \in [x,y]\mid a) = \sum_{i=1}^{n} \frac{\binom{n}{i} p^i(1-p)^{n-i}}{1-(1-p)^n}\,\mathcal{P}(\mu \in [x,y]\mid a, N=i)$$

If $np$ is reasonably large and $\operatorname{var}(X) = \sigma^2$, you can represent the true mean as drawn from a mixture of Gaussians with mean $a$ and variance $\sigma^2/N$. The result is not Gaussian, but you can compute its pdf.
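As a concrete illustration, here is a minimal sketch of that pdf, assuming $X$ is (approximately) normal with known $\sigma$ and $N \sim \mathrm{Binomial}(n,p)$ truncated to $N \ge 1$; the grid and parameter values are hypothetical:

```python
import numpy as np
from scipy import stats

def mixture_pdf(mu_grid, a, sigma, n, p):
    """pdf of the mixture of N(a, sigma^2/i) components, i = 1..n,
    weighted by Binomial(n, p) probabilities conditioned on N >= 1."""
    i = np.arange(1, n + 1)
    w = stats.binom.pmf(i, n, p)
    w /= w.sum()  # renormalize: condition on N >= 1
    comps = stats.norm.pdf(mu_grid[:, None], loc=a, scale=sigma / np.sqrt(i))
    return comps @ w  # mix the component densities

# hypothetical values: observed mean a = 0, sigma = 1, n = 100, p = 0.25
grid = np.linspace(-1.0, 1.0, 401)
density = mixture_pdf(grid, a=0.0, sigma=1.0, n=100, p=0.25)
```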

While the mixture isn't Gaussian, its variance is available in closed form:

$$\sigma^2\,\frac{np(1-p)^{n-1}\, {}_3F_2\!\left(1,1,1-n;\,2,2;\,\frac{p}{p-1}\right)}{1-(1-p)^n}$$

(N.B. this is exact only when $X$ is normally distributed, and approximately valid when $n \gg (1-p)/p$.) Here ${}_3F_2$ is the generalized hypergeometric function.

For $p = 0.25$ and $n = 100$, your average has about $4.13\%$ of the variance of $X$.
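To sanity-check that figure, the sketch below evaluates the closed form with mpmath's `hyp3f2` and compares it against the direct sum $\mathbf{E}[1/N \mid N \ge 1]$, which is what the mixture variance reduces to; both should come out near $0.0413$ for these parameters:

```python
import numpy as np
from scipy import stats
import mpmath

n, p = 100, 0.25

# closed form: n p (1-p)^(n-1) * 3F2(1,1,1-n; 2,2; p/(p-1)) / (1-(1-p)^n)
hyp = mpmath.hyp3f2(1, 1, 1 - n, 2, 2, p / (p - 1))
ratio_closed = float(n * p * (1 - p) ** (n - 1) * hyp / (1 - (1 - p) ** n))

# direct sum: E[1/N | N >= 1] for N ~ Binomial(n, p)
i = np.arange(1, n + 1)
w = stats.binom.pmf(i, n, p)
ratio_direct = float(np.sum(w / i) / w.sum())

print(ratio_closed, ratio_direct)  # both approximately 0.0413
```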

You can then conservatively apply Chebyshev's inequality.
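As a hedged sketch of that final step (assuming $\sigma$ is known, and plugging in the $\approx 0.0413$ variance ratio from above):

```python
import math

sigma = 1.0                    # assume var(X) = sigma^2 is known
var_ratio = 0.0413             # mixture variance ratio for n = 100, p = 0.25
s = sigma * math.sqrt(var_ratio)

alpha = 0.05                   # Chebyshev: P(|mu - a| >= k s) <= 1/k^2 = alpha
k = 1 / math.sqrt(alpha)
a = 0.0                        # hypothetical observed sample mean
print(a - k * s, a + k * s)    # conservative 95% interval, roughly +/- 0.909
```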


You should speak of the size of the sample, rather than of the number of samples.

You haven't said whether you can actually observe the sample size, nor whether the probability distribution of the sample size depends in any way on the mean you're trying to estimate. If you can observe the sample size, and you know it doesn't depend on what you're trying to estimate, then it is an ancillary statistic (see http://en.wikipedia.org/wiki/Ancillary_statistic), and you can in effect treat it as non-random by conditioning on it.
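To illustrate that last point, here is a small simulation sketch, assuming $N \sim \mathrm{Binomial}(n,p)$ is observable and independent of the $X_i$, which are normal with known $\sigma$; conditioning on the realized $N$, the usual normal interval keeps its nominal coverage:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, mu, sigma = 100, 0.25, 0.0, 1.0
z = 1.96                            # nominal 95% normal quantile

hits = trials = 0
for _ in range(20_000):
    N = rng.binomial(n, p)
    if N == 0:
        continue                    # nothing observed this round
    x = rng.normal(mu, sigma, size=N)
    half = z * sigma / np.sqrt(N)   # treat the observed N as fixed
    hits += abs(x.mean() - mu) <= half
    trials += 1

print(hits / trials)                # close to 0.95
```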