Maximum entropy principle for Poisson distribution

I believe the second paper you cited (by Harremoës) is actually the answer you're looking for. The Poisson distribution describes the number of occurrences of an event in a fixed interval, under the assumption that occurrences are independent. In particular, the requirement that the events be independent means that not every discrete distribution is a valid candidate for describing this system, and it motivates restricting attention to sums of independent Bernoulli variables (including limits of infinitely many of them). Harremoës then shows that if you additionally fix the expected value $\lambda$, the maximum entropy distribution within this class is the Poisson distribution.

So, the Poisson distribution is the maximum entropy distribution given constraints of counting independent events and having a known expected value.
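A quick numerical sanity check of this (a sketch, not a proof): $\text{Binomial}(n, \lambda/n)$ is a sum of $n$ independent Bernoulli variables with mean $\lambda$, so by Harremoës' result its entropy should never exceed the Poisson entropy, and it should approach it as $n$ grows. The helper names below are hypothetical, and the Poisson entropy is computed by truncating the support at `kmax`, which I'm assuming is harmless for moderate $\lambda$.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    # ln p(k) = -lam + k ln(lam) - ln k!, exponentiated (log space avoids overflow)
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def binomial_pmf(k: int, n: int, p: float) -> float:
    # ln C(n, k) + k ln p + (n - k) ln(1 - p), evaluated in log space
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

def entropy(probs) -> float:
    # Shannon entropy in nats; zero-probability terms contribute nothing
    return -sum(q * math.log(q) for q in probs if q > 0)

def binomial_entropy(n: int, lam: float) -> float:
    # Sum of n independent Bernoulli(lam / n) variables, so the mean is lam (requires n > lam)
    return entropy(binomial_pmf(k, n, lam / n) for k in range(n + 1))

def poisson_entropy(lam: float, kmax: int = 200) -> float:
    # Support truncated at kmax (assumed large enough that the tail is negligible)
    return entropy(poisson_pmf(k, lam) for k in range(kmax + 1))

lam = 2.0
for n in (3, 5, 20, 100, 1000):
    print(f"Binomial(n={n}, p=lam/n): H = {binomial_entropy(n, lam):.6f}")
print(f"Poisson(lam={lam}): H = {poisson_entropy(lam):.6f}")
```

Everything is done in log space with `math.lgamma` so that large `n` does not overflow the binomial coefficients.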

That said, you can also easily reverse-engineer a (contrived) constraint for which the Poisson distribution would be the maximum entropy distribution.

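Let our unknown constraint be $\mathbb{E}[f(k)] = c$. Maximizing the entropy $-\sum_k p(k)\ln p(k)$ subject to this constraint, normalization, and mean $\lambda$ is equivalent to minimizing $\sum_k p(k)\ln p(k)$ under the same constraints, which leads to the Lagrangian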

$\sum_k p(k) \ln p(k) - \alpha \left( \sum_k p(k) - 1\right) - \beta\left(\sum_k k p(k) - \lambda\right) - \gamma \left( \sum_k p(k)f(k) - c \right)$,

where $\alpha$, $\beta$, and $\gamma$ are Lagrange multipliers. Taking the derivative with respect to $p(k)$ yields

$\ln p(k) = -1 + \alpha + \beta k + \gamma f(k)$.

We already know the Poisson distribution has the form $p(k) = e^{-\lambda}\lambda^k/k!$, so $\ln p(k) = -\lambda + k \ln\lambda - \ln k!$. The constant and linear terms can be matched by $\alpha$ and $\beta$, but the $-\ln k!$ term has no counterpart unless we choose $f(k) = \ln k!$.
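Spelling out the comparison: with $f(k) = \ln k!$, the stationarity condition $\ln p(k) = -1 + \alpha + \beta k + \gamma f(k)$ reproduces the Poisson log-probabilities term by term for the multipliers

$$ -1 + \alpha = -\lambda, \qquad \beta = \ln\lambda, \qquad \gamma = -1, $$

that is, $\alpha = 1 - \lambda$.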

So, the Poisson distribution maximizes entropy among distributions with mean $\lambda$ and $\mathbb{E}[\ln k!] = c(\lambda)$, where $c(\lambda)$ is the value that $\mathbb{E}[\ln k!]$ takes under the Poisson distribution itself.
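The "particular value" is easy to compute numerically. Taking expectations of $\ln p(k) = -\lambda + k\ln\lambda - \ln k!$ also gives $H = \lambda - \lambda\ln\lambda + \mathbb{E}[\ln k!]$, so the sketch below cross-checks the constraint value against a direct entropy computation. The function names and the truncation point `kmax` are my own assumptions.

```python
import math

def poisson_log_pmf(k: int, lam: float) -> float:
    # ln p(k) = -lam + k ln(lam) - ln k!
    return -lam + k * math.log(lam) - math.lgamma(k + 1)

def expected_log_factorial(lam: float, kmax: int = 500) -> float:
    # E[ln k!] under Poisson(lam); the sum is truncated at kmax (assumed large enough)
    return sum(math.exp(poisson_log_pmf(k, lam)) * math.lgamma(k + 1)
               for k in range(kmax + 1))

def poisson_entropy(lam: float, kmax: int = 500) -> float:
    # Direct evaluation of -sum p(k) ln p(k) over the truncated support
    return -sum(math.exp(lp) * lp
                for lp in (poisson_log_pmf(k, lam) for k in range(kmax + 1)))

for lam in (0.5, 2.0, 10.0):
    c = expected_log_factorial(lam)
    # The two entropy columns should agree: H = lam - lam*ln(lam) + E[ln k!]
    print(lam, c, poisson_entropy(lam), lam - lam * math.log(lam) + c)
```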

This approach may not be very satisfying, since it's not clear why we would want a distribution with a specified expectation value of $\ln k!$. The Johnson paper you cited is, in my opinion, similarly unsatisfying, since it essentially proves that the Poisson distribution has maximal entropy among distributions which are "more log-concave than the Poisson distribution" (the ultra-log-concave distributions).

I guess the $k!$ comes from a permutation term.

Suppose we have a composite system of $n_a + n_b$ labeled elements: subsystem A has $n_a$ elements, while subsystem B has $n_b$ elements. There are ${{n_a+n_b}\choose{n_a}}$ ways to choose which elements go to A. Suppose further that each element in A may be in any of $m_a$ states, while each element in B may be in any of $m_b$ states; you may think of each subsystem as a bag with a given number of pockets. Given the partition of the elements, there are

$$ {{n_a+n_b}\choose{n_a}} m_a^{n_a} m_b^{n_b} $$

different states of the composite system. Assuming all states are equally likely, the probability that A holds exactly $n_a$ elements (with $n_a + n_b$ fixed) is proportional to this count. Summing the count over $n_a$ with the binomial theorem gives the normalization $(m_a+m_b)^{n_a+n_b}$, so the probability of a given partition is

$ P(n_a)={{n_a+n_b}\choose{n_a}} m_a^{n_a} m_b^{n_b}/(m_a+m_b)^{(n_a+n_b)} $
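A brute-force check of this counting argument for a tiny system (the sizes $N = n_a + n_b = 4$, $m_a = 2$, $m_b = 3$ are arbitrary choices of mine): enumerate every assignment of the labeled elements to the $m_a + m_b$ pockets, treat each assignment as equally likely, and compare the exact frequencies with the formula. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb

def partition_probability(n_a: int, n_total: int, m_a: int, m_b: int) -> Fraction:
    # P(n_a) = C(N, n_a) m_a^n_a m_b^(N - n_a) / (m_a + m_b)^N, as an exact rational
    n_b = n_total - n_a
    return Fraction(comb(n_total, n_a) * m_a**n_a * m_b**n_b, (m_a + m_b) ** n_total)

def brute_force_probability(n_a: int, n_total: int, m_a: int, m_b: int) -> Fraction:
    # Enumerate every assignment of the N labeled elements to the m_a + m_b pockets;
    # pockets 0..m_a-1 belong to A, the rest to B. All assignments are equally likely.
    hits = 0
    total = 0
    for assignment in product(range(m_a + m_b), repeat=n_total):
        total += 1
        if sum(1 for pocket in assignment if pocket < m_a) == n_a:
            hits += 1
    return Fraction(hits, total)

N, m_a, m_b = 4, 2, 3
for n_a in range(N + 1):
    print(n_a, partition_probability(n_a, N, m_a, m_b), brute_force_probability(n_a, N, m_a, m_b))
```

Using `Fraction` keeps the comparison exact, so agreement is equality rather than agreement up to rounding.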

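Assuming that $m_b$ grows to infinity while $\mu = (n_a+n_b)/m_b$ remains constant (so the total number of elements grows in proportion to $m_b$), $P(n_a)$ tends to a Poisson distribution with mean $m_a\mu$:

$$ P(n_a)=e^{-m_a\mu}(m_a\mu)^{n_a}/n_a! $$

which reduces to $e^{-\mu}\mu^{n_a}/n_a!$ when $m_a = 1$.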
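A numerical sketch of this limit. $P(n_a)$ above is a binomial distribution with $N = n_a + n_b$ trials and success probability $m_a/(m_a+m_b)$, whose mean $N m_a/(m_a+m_b)$ tends to $m_a\mu$ when $N/m_b = \mu$ is held fixed. The code evaluates $P(n_a)$ in log space and measures the total variation distance to the Poisson distribution with mean $m_a\mu$; with $m_a = 1$ that mean is $\mu$ itself. The parameter values ($\mu = 2$, $m_a = 1$), the helper names, and the truncation at `kmax` are my own assumptions.

```python
import math

def partition_pmf(n_a: int, n_total: int, m_a: int, m_b: int) -> float:
    # P(n_a) = C(N, n_a) p^n_a (1 - p)^(N - n_a) with p = m_a / (m_a + m_b), in log space
    p = m_a / (m_a + m_b)
    log_pmf = (math.lgamma(n_total + 1) - math.lgamma(n_a + 1) - math.lgamma(n_total - n_a + 1)
               + n_a * math.log(p) + (n_total - n_a) * math.log1p(-p))
    return math.exp(log_pmf)

def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def tv_to_poisson(mu: float, m_a: int, m_b: int, kmax: int = 60) -> float:
    # Total variation distance to Poisson(m_a * mu), truncated at k <= kmax
    # (the neglected tail is tiny when m_a * mu is small compared with kmax)
    n_total = round(mu * m_b)  # N = n_a + n_b grows with m_b so that N / m_b = mu
    lam = m_a * mu             # limit of the mean N * m_a / (m_a + m_b)
    return 0.5 * sum(abs(partition_pmf(k, n_total, m_a, m_b) - poisson_pmf(k, lam))
                     for k in range(min(kmax, n_total) + 1))

for m_b in (10, 100, 1_000, 10_000):
    print(m_b, tv_to_poisson(mu=2.0, m_a=1, m_b=m_b))
```

The distance should shrink roughly in proportion to $1/m_b$, consistent with the usual binomial-to-Poisson approximation.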