Maximum likelihood estimator of the parameters of a multinomial distribution

Consider a positive integer $n$ and a family of positive real numbers $\mathbf p=(p_x)$ such that $\sum\limits_xp_x=1$. The multinomial distribution with parameters $n$ and $\mathbf p$ is the distribution $f_\mathbf p$ on the set of tuples of nonnegative integers $\mathbf n=(n_x)$ such that $\sum\limits_xn_x=n$, defined by $$ f_\mathbf p(\mathbf n)=n!\cdot\prod_x\frac{p_x^{n_x}}{n_x!}. $$

For a fixed observation $\mathbf n$, the likelihood is $L(\mathbf p)=f_\mathbf p(\mathbf n)$, to be maximized under the constraint $C(\mathbf p)=1$, where $C(\mathbf p)=\sum\limits_xp_x$. To maximize $L$, one requires that the gradients of $L$ and of $C$ be collinear, that is, that there exists $\lambda$ such that, for every $x$, $$ \frac{\partial}{\partial p_x}L(\mathbf p)=\lambda\frac{\partial}{\partial p_x}C(\mathbf p). $$

In the present case, $\frac{\partial}{\partial p_x}L(\mathbf p)=\frac{n_x}{p_x}L(\mathbf p)$ and $\frac{\partial}{\partial p_x}C(\mathbf p)=1$, hence the condition reads $$ \frac{n_x}{p_x}L(\mathbf p)=\lambda, $$ that is, $p_x$ should be proportional to $n_x$. Since $\sum\limits_xp_x=1$, one finally gets $\hat p_x=\dfrac{n_x}n$ for every $x$.
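As a quick numerical sanity check of this result (a minimal sketch assuming NumPy and SciPy; the counts below are made up for illustration), one can maximize the log-likelihood over the simplex and compare the maximizer with $n_x/n$:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical observed counts n_x; any nonnegative integers work.
counts = np.array([3.0, 7.0, 10.0])
n = counts.sum()

# Negative log-likelihood, dropping the constant n!/prod(n_x!):
# since log is increasing and the combinatorial factor does not depend
# on p, maximizing sum_x n_x log p_x is equivalent to maximizing L.
def neg_log_lik(p):
    return -np.sum(counts * np.log(p))

res = minimize(
    neg_log_lik,
    x0=np.full(len(counts), 1.0 / len(counts)),  # start at the uniform distribution
    bounds=[(1e-9, 1.0)] * len(counts),          # keep log(p_x) defined
    constraints=[{"type": "eq", "fun": lambda p: p.sum() - 1.0}],
)

print(res.x)       # numerical maximizer, ~ [0.15, 0.35, 0.50]
print(counts / n)  # closed form n_x / n
```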


Suppose an observation consists of counts $x_1, \ldots, x_m$ over $m$ categories, with $\sum_{i=1}^m x_i = n$. We will show that the maximum likelihood estimates are

$$\begin{align} \hat p_1 = P(X_1) &= \frac{x_1}{n} \\ &\;\;\vdots \\ \hat p_m = P(X_m) &= \frac{x_m}{n}. \end{align}$$

The likelihood, which is the joint probability of the observed counts, is (see https://en.wikipedia.org/wiki/Multinomial_theorem)

$$\begin{align} L(\mathbf{p}) &= {{n}\choose{x_1, ..., x_m}}\prod_{i=1}^m p_i^{x_i} \\ &= n! \prod_{i=1}^m \frac{p_i^{x_i}}{x_i!} \end{align}$$

and the log-likelihood is

$$\begin{align} l(\mathbf{p}) = \log L(\mathbf{p}) &= \log \bigg( n! \prod_{i=1}^m \frac{p_i^{x_i}}{x_i!} \bigg)\\ &= \log n! + \log \prod_{i=1}^m \frac{p_i^{x_i}}{x_i!} \\ &= \log n! + \sum_{i=1}^m \log \frac{p_i^{x_i}}{x_i!} \\ &= \log n! + \sum_{i=1}^m x_i \log p_i - \sum_{i=1}^m \log x_i! \end{align}$$
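As a sanity check on this expansion (a minimal sketch assuming SciPy; the counts and probabilities below are illustrative), one can compare it against `scipy.stats.multinomial`, computing $\log k!$ as `gammaln(k + 1)`:

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import multinomial

# Illustrative observation and parameters: m = 3 categories, n = 10 draws.
x = np.array([2, 3, 5])
p = np.array([0.2, 0.3, 0.5])
n = x.sum()

# Expanded form: log n! + sum_i x_i log p_i - sum_i log x_i!,
# with log k! computed as gammaln(k + 1).
loglik = gammaln(n + 1) + np.sum(x * np.log(p)) - np.sum(gammaln(x + 1))

print(loglik)                       # expanded log-likelihood
print(multinomial.logpmf(x, n, p))  # same value from scipy's built-in pmf
```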

Imposing the constraint $\sum_{i=1}^m p_i = 1$ via a Lagrange multiplier $\lambda$ gives the Lagrangian

$$\begin{align} l'(\mathbf{p},\lambda) &= l(\mathbf{p}) + \lambda\bigg(1 - \sum_{i=1}^m p_i\bigg) \end{align}$$

To find $\arg\max_\mathbf{p} l'(\mathbf{p},\lambda)$, set the partial derivative with respect to each $p_i$ to zero (note the summation index is renamed to $j$ to keep it distinct from the differentiation index $i$):

$$\begin{align} \frac{\partial}{\partial p_i} l'(\mathbf{p},\lambda) = \frac{\partial}{\partial p_i} l(\mathbf{p}) + \frac{\partial}{\partial p_i} \lambda\bigg(1 - \sum_{j=1}^m p_j\bigg) &= 0\\ \frac{\partial}{\partial p_i} \sum_{j=1}^m x_j \log p_j - \lambda \frac{\partial}{\partial p_i} \sum_{j=1}^m p_j &= 0 \\ \frac{x_i}{p_i}- \lambda &= 0 \\ p_i &= \frac{x_i}{\lambda} \\ \end{align}$$

Thus, $$\begin{align} p_i &= \frac{x_i}{n} \end{align}$$

because

$$\begin{align} p_i &= \frac{x_i}{\lambda} \\ \sum_{i=1}^m p_i &= \sum_{i=1}^m \frac{x_i}{\lambda} \\ 1 &= \frac{1}{\lambda} \sum_{i=1}^m x_i \\ \lambda &= \sum_{i=1}^m x_i = n \end{align}$$

Finally, the maximum likelihood estimate of the probability vector is

$$\begin{align} \hat{\mathbf{p}} = \bigg( \frac{x_1}{n}, ..., \frac{x_m}{n} \bigg) \end{align}$$
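To double-check the algebra symbolically (a sketch assuming SymPy, with $m$ fixed to $3$ for concreteness; all names below are illustrative), one can solve the stationarity conditions of $l'$ directly:

```python
import sympy as sp

# Symbolic check of the Lagrange-multiplier solution, with m fixed to 3.
p1, p2, p3, lam = sp.symbols("p1 p2 p3 lambda", positive=True)
x1, x2, x3 = sp.symbols("x1 x2 x3", positive=True)
n = x1 + x2 + x3

# l'(p, lambda) with the p-independent terms log n! and -log x_i! dropped.
lagrangian = (x1*sp.log(p1) + x2*sp.log(p2) + x3*sp.log(p3)
              + lam*(1 - p1 - p2 - p3))

# Stationarity in each p_i and in lambda (the latter recovers the constraint).
sol = sp.solve(
    [sp.diff(lagrangian, v) for v in (p1, p2, p3, lam)],
    (p1, p2, p3, lam),
    dict=True,
)[0]

print(sp.simplify(sol[p1] - x1 / n))  # 0, i.e. p1 = x1/n
print(sp.simplify(sol[lam] - n))      # 0, i.e. lambda = n
```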


Let $\mathbf{X}$ be a random vector following a multinomial distribution. Then, $$\begin{align}P(\mathbf{X} = \mathbf{x};n,\mathbf{p}) &= n!\,\prod_{k=1}^K \frac{p_k^{x_k}}{x_k!} \end{align}$$ where $x_k$ is the number of successes of the $k^{\text{th}}$ category in $n$ random draws and $p_k$ is the probability of success of the $k^{\text{th}}$ category. Note that $$\begin{align}\sum_{k=1}^K x_k &= n\\ \sum_{k=1}^{K} p_k &=1. \end{align}$$

For the estimation problem, we have $N$ samples $\mathbf{X}_1, \ldots,\mathbf{X}_N$ drawn independently from the above multinomial distribution. The log-likelihood is given by $$\mathcal{L}(\mathbf{p},n) = \sum_{i=1}^N \log P(\mathbf{x}_i;n,\mathbf{p})$$ where $$\begin{align}\log P(\mathbf{x}_i;n,\mathbf{p}) &= \log \frac{n!}{\prod_k x_{ik}!} + \sum_{k=1}^{K} x_{ik} \log p_k \\ \sum_{i=1}^N \log P(\mathbf{x}_i;n,\mathbf{p}) &= C + \sum_{k=1}^{K} N_k \log p_k \end{align}$$ where $C = \sum_{i=1}^N \log \frac{n!}{\prod_k x_{ik}!}$ does not depend on $\mathbf{p}$, and $N_k = \sum_{i=1}^{N} x_{ik}$ is the total number of successes of the $k^{\text{th}}$ category across the $N$ samples.

For the MLE of $\mathbf{p}$, assuming $n$ is known, we solve the following optimization problem: $$\begin{align} \max_{\mathbf{p}} &\,\, \mathcal{L}(\mathbf{p},n) \\ \text{s.t.} & \,\, \sum_{k=1}^{K} p_k \,\,=1.\end{align}$$ Using the equality constraint to eliminate one variable, $$p_K\,=\, 1 - \sum_{k=1}^{K-1} p_k,$$ we obtain an unconstrained problem in $K-1$ variables. Setting the gradient to zero to find the stationary point gives, for $k = 1, \ldots, K-1$, $$\begin{align}\frac{\partial\mathcal{L}(\mathbf{p},n)}{\partial p_k} &= \frac{N_k}{p_k} - \frac{N_K}{p_K}\,\,=\,\, 0 \\ p_k &= \frac{N_k\,p_K}{N_K}.\end{align}$$ Solving together with $\sum_{k=1}^{K} p_k\,=\, 1$, and noting that $\sum_{k=1}^K N_k = \sum_{i=1}^N \sum_{k=1}^K x_{ik} = nN$, gives the MLE estimate for $p_k$, $$\hat p_k = \frac{N_k}{nN}.$$
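A quick simulation (a minimal sketch assuming NumPy; the true $\mathbf{p}$, $n$, and $N$ below are made up for illustration) confirms that $\hat p_k = N_k/(nN)$ recovers the true parameters as $N$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative ground truth: K = 4 categories, n = 20 draws per sample.
p_true = np.array([0.1, 0.2, 0.3, 0.4])
n, N = 20, 5000

# N independent multinomial samples; row i is x_i, with each row summing to n.
X = rng.multinomial(n, p_true, size=N)

N_k = X.sum(axis=0)    # N_k: total successes of category k over all samples
p_hat = N_k / (n * N)  # MLE p_k = N_k / (nN)

print(p_hat)   # close to p_true for large N
print(p_true)
```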