Expected value of $\max\limits_{1\leq i\leq m}X_i-\min\limits_{1\leq i\leq m} X_i$ for balls and bins problem

Not a satifactory answer, but here is a small observation: If $M_m := \mathbb{E}\left[\max_{1\leq i\leq m} W_i \right]$ denotes the expectation of the maximum of $m$ i.i.d. standard normal variables $W_1, \cdots, W_m$, then we have

$$\lim_{N\to\infty} \frac{1}{\sqrt{N}}\mathbb{E}\left[ \max_{1\leq i \leq m} X_i - \min_{1\leq i \leq m} X_i \right] = \frac{2M_m}{\sqrt{m}} . \tag{*}$$

The idea is as follows. Roughly speaking, the main source of the correlation between $X_i$'s is the condition that the total sum $X_1 + \cdots + X_m$ is fixed. That is, we can loosely think of each $X_i$ as the following re-centered version

$$ X_i \quad {``}\approx{\text{''}} \quad \tilde{X}_i + \frac{N}{m} - \frac{\sum_{k=1}^m\tilde{X}_k}{m} $$

of some i.i.d. random variables $\tilde{X}_1, \cdots, \tilde{X}_m$. Since this re-centering does not affect the difference, we expect that $Y \approx \tilde{Y} := \max_i \tilde{X}_i - \min_i \tilde{X}_i$. Taking advantage of the independence, it is easier to investigate $\tilde{Y}$ instead.


Here is a more detailed argument. If we consider

$$ \tilde{Z}_i = \frac{X_i - (N/m)}{\sqrt{N/m}} $$

then the random vector $\tilde{Z} = (\tilde{Z}_1, \cdots, \tilde{Z}_m)$ converges in distribution to the multivariate normal distribution $\mathcal{N}(0, C)$ with the covariance matrix $C$ satisfying

$$ C_{ij} = \delta_{ij} - \frac{1}{m}.$$

So we consider a multivariate normal vector $Z = (Z_1, \cdots, Z_m) \sim \mathcal{N}(0, C)$. We also introduce a normal variable $\bar{Z} \sim \mathcal{N}(0, \frac{1}{m})$ which is independent of $Z$. Finally, consider $W_i = Z_ i + \bar{Z}$. Then we easily compute that

$$ \operatorname{Cov}(W_i, W_j) = C_{ij} + \frac{1}{m} = \delta_{ij}, $$

from which we find that $(W_i)$ is an i.i.d. standard normal variables. Since $Z_i - Z_j = W_i - W_j$, we may study the maximum and minimum of $W$ instead. So

\begin{align*} \mathbb{E}\left[\max_i Z_i - \min_i Z_i\right] = \mathbb{E}\left[\max_i W_i - \min_i W_i\right] = 2\mathbb{E}\left[\max_i W_i\right] = 2M_m \end{align*}

Using this we can prove that $\text{(*)}$ holds.


Have you seen:

  • Greenwood and Glasgow (1950), Distribution of Maximum and Minimum Frequencies in a Sample Drawn from a Multinomial distribution, The Annals of Mathematical Statistics, 21(3), p.416-424 https://projecteuclid.org/download/pdf_1/euclid.aoms/1177729800

  • Freeman, P. R. (1979), Algorithm AS 145: Exact Distribution of the Largest Multinomial Frequency, Applied Statistics, 28(3), p.333-6 http://www.jstor.org/stable/2347220?

  • Carrado , C (2011), The exact distribution of the maximum, minimum and the range of Multinomial/Dirichlet and Multivariate Hypergeometric frequencies, Statistics and Computing 21(3), 349-359 https://link.springer.com/article/10.1007%2Fs11222-010-9174-3