Is a spontaneous decrease in entropy *impossible* or just extremely unlikely?

The appropriate mathematical tool for this kind of question, and in particular for understanding Dale's and buddy's answers, is large deviation theory. To quote Wikipedia, "large deviations theory concerns itself with the exponential decline of the probability measures of certain kinds of extreme or tail events". In this context, "exponential decline" means a probability that decreases exponentially fast as the number of particles increases.
TL;DR: using a statistical mechanics of "trajectories" based on large deviation theory, it can be shown that the probability of observing an evolution path along which the entropy of a system decreases is non-zero, and that this probability decreases exponentially fast with the number of particles.

Equilibrium statistics

In equilibrium statistical mechanics, working in the appropriate thermodynamic ensemble (here, the microcanonical ensemble), one can relate the probability of observing a macrostate $M_N$ of the $N$ particles in the system to the entropy of that macrostate, $\mathcal{S}[M_N]$: $\mathbf{P}_{eq}\left(M_N\right)\propto\text{e}^{N\frac{\mathcal{S}[M_N]}{k_{B}}}$ (with the explicit factor $N$ in the exponent, $\mathcal{S}$ is to be read as an entropy per particle). Naturally, the most probable macrostate is the equilibrium state, the one that maximizes the entropy. The probability of observing any macrostate other than the equilibrium state decreases exponentially fast as the number of particles goes to infinity, which is why this can be seen as a large deviation result in the limit of large particle number.
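As a rough numerical illustration of this exponential scaling, here is a minimal sketch for a toy model that is my own assumption, not something taken from the discussion: $N$ independent particles, each equally likely to sit in the left or right half of a box, with the macrostate being the fraction $x$ found on the left. The function names are hypothetical. For this toy model the exact binomial probability satisfies $\frac{1}{N}\ln P \to s(x) - \ln 2$, where $s(x)$ is the mixing entropy per particle in units of $k_B$.

```python
import math


def log_prob_left(n_particles: int, n_left: int) -> float:
    """Exact ln P(exactly n_left of n_particles are in the left half)."""
    log_multiplicity = (
        math.lgamma(n_particles + 1)
        - math.lgamma(n_left + 1)
        - math.lgamma(n_particles - n_left + 1)
    )
    return log_multiplicity - n_particles * math.log(2.0)


def entropy_per_particle(x: float) -> float:
    """Mixing entropy per particle, in units of k_B, for a fraction x on the left."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)


def rate_function(x: float) -> float:
    """Entropy deficit per particle relative to equilibrium (x = 1/2)."""
    return math.log(2.0) - entropy_per_particle(x)


if __name__ == "__main__":
    x = 0.4  # a macrostate away from equilibrium (assumed value)
    for n in (100, 1_000, 10_000, 100_000):
        exact = log_prob_left(n, round(x * n)) / n
        print(f"N={n:>6}  ln(P)/N={exact:.5f}  -I(x)={-rate_function(x):.5f}")
```

As $N$ grows, the printed `ln(P)/N` approaches `-I(x)`, which is the statement that the probability of the non-equilibrium macrostate decays like $\text{e}^{-N I(x)}$.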

Dynamical fluctuations

Using large deviation theory, we can extend this equilibrium point of view, based on the statistics of the macrostates, to a dynamical perspective based on the statistics of the trajectories. Let me explain.

In your case, you would expect to observe the macrostate of your system, $(M_N(t))_{0\leq t\leq T}$, evolving over a time interval $[0,T]$ from an initial configuration $M_N(0)$ with entropy $S_0$ to a final configuration $M_N(T)$ with entropy $S_T$ such that $S_0 \leq S_T$, where $S_T$ is the maximal entropy characterizing the equilibrium distribution and $S_t$, the entropy of the macrostate at time $t$, is a monotonically increasing function (as in the H-theorem for the kinetic theory of a dilute gas, for instance).

However, as long as the number of particles is finite (even if it is very large), it is possible to observe different evolutions, particularly if you wait for a very long time, assuming for instance that your system is ergodic. By long, I mean large with respect to the number of particles. In particular, it has recently been established that one can formulate a dynamical large deviation result that characterizes the probability of any evolution path of the macrostate of the system. This result makes it possible to evaluate, for a large but finite number of particles, the probability of observing any evolution path of the macrostate $(M_N(t))_{0\leq t\leq T}$, including evolution paths for which $S_t$, the entropy of the system at time $t$, is non-monotonic. This probability becomes exponentially small as the number of particles grows, and the most probable evolution, the one that increases entropy, has an exponentially overwhelming probability as the number of particles goes to infinity.
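To make the finite-$N$ picture concrete, here is a small simulation sketch. It does not reproduce the dynamical large deviation result itself; it swaps in the Ehrenfest urn model (my choice of toy dynamics, with hypothetical function names), where at each step one of $N$ particles is picked at random and moved to the other half of a box. The entropy of the macrostate is taken as $S/k_B = \ln\binom{N}{n_{\text{left}}}$, and the code records how far the entropy per particle ever falls below its running maximum along one path.

```python
import math
import random


def log_multiplicity(n_particles: int, n_left: int) -> float:
    """Boltzmann entropy S/k_B = ln C(N, n_left) of the macrostate n_left."""
    return (
        math.lgamma(n_particles + 1)
        - math.lgamma(n_left + 1)
        - math.lgamma(n_particles - n_left + 1)
    )


def largest_entropy_drop(n_particles: int, n_steps: int, seed: int = 0) -> float:
    """Run the urn model from the all-left state and return the largest drop of
    the entropy per particle below its running maximum along the path."""
    rng = random.Random(seed)
    n_left = n_particles
    running_max = log_multiplicity(n_particles, n_left)
    largest_drop = 0.0
    for _ in range(n_steps):
        # Pick one particle uniformly at random and move it to the other half.
        if rng.random() < n_left / n_particles:
            n_left -= 1
        else:
            n_left += 1
        entropy = log_multiplicity(n_particles, n_left)
        running_max = max(running_max, entropy)
        largest_drop = max(largest_drop, (running_max - entropy) / n_particles)
    return largest_drop


if __name__ == "__main__":
    for n in (10, 100, 1_000):
        print(f"N={n:>5}  largest entropy drop per particle: "
              f"{largest_entropy_drop(n, 5_000):.4f}")
```

In this toy model the entropy path is never strictly monotonic, but the size of the backward excursions per particle shrinks as $N$ grows, in line with the qualitative statement above.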

Obviously, for a classical gas, $N$ is so large that evolution paths that do not increase entropy will not be observed: you would have to wait longer than the age of the universe to see your system do this. But one can imagine systems to which statistical mechanics applies and where $N$ is large but not large enough to "erase" dynamical fluctuations, such as biological or astrophysical systems, in which it is crucial to quantify fluctuations away from the entropic fate.
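For an order-of-magnitude feel of the "age of the universe" remark, here is a back-of-the-envelope sketch under loudly stated assumptions of mine: the particles are independent, the fluctuation of interest is "all $N$ particles in one given half of the box" (probability $2^{-N}$), and the system effectively draws a fresh configuration every nanosecond. The resampling time and the function names are hypothetical.

```python
import math

AGE_OF_UNIVERSE_S = 4.35e17  # roughly 13.8 billion years, in seconds


def log10_waiting_time(n_particles: int, resample_time_s: float) -> float:
    """log10 of the expected wait (in seconds) to see all particles in one given
    half, assuming an independent configuration every resample_time_s seconds."""
    # Expected number of draws is 2**N; work in log10 to avoid overflow.
    return math.log10(resample_time_s) + n_particles * math.log10(2.0)


def smallest_n_exceeding_age(resample_time_s: float) -> int:
    """Smallest N whose expected waiting time exceeds the age of the universe."""
    n = 1
    while log10_waiting_time(n, resample_time_s) <= math.log10(AGE_OF_UNIVERSE_S):
        n += 1
    return n


if __name__ == "__main__":
    tau = 1e-9  # assumed resampling time: one nanosecond
    print("crossover N:", smallest_n_exceeding_age(tau))
    print("log10(wait / s) for N = 1000:", log10_waiting_time(1000, tau))
```

With these assumed numbers the crossover already happens at about 89 particles, which is tiny compared with the particle number of any macroscopic gas.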

What you are interested in is the Crooks fluctuation theorem. It gives the probability of going "backwards" thermodynamically, relative to the probability of going forwards. Specifically, the theorem says:

$$\frac{P(A\rightarrow B)}{P(A\leftarrow B)}=\exp \left( \frac{1}{k_B T}(W_{A\rightarrow B}-\Delta F) \right)$$

In the case of the box, $W_{A\rightarrow B}=0$ so the probability is purely driven by the change in Helmholtz free energy, $\Delta F$.
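To get a feel for the numbers, here is a minimal sketch that assumes (my assumption, not stated in the answer) that the box holds an ideal gas, that A is the gas confined to one half, and that B is the gas filling the whole box at the same temperature, so that $\Delta F = -N k_B T \ln 2$. The function names are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def crooks_log_ratio(work_j: float, delta_f_j: float, temperature_k: float) -> float:
    """ln[P(A->B) / P(A<-B)] = (W - dF) / (k_B T), following the formula above."""
    return (work_j - delta_f_j) / (K_B * temperature_k)


def delta_f_doubling(n_particles: float, temperature_k: float) -> float:
    """Free energy change of an ideal gas whose volume doubles at fixed T
    (assumed model for the box)."""
    return -n_particles * K_B * temperature_k * math.log(2.0)


if __name__ == "__main__":
    temperature = 300.0  # assumed temperature in kelvin
    for n in (10, 100, 6.022e23):
        log_ratio = crooks_log_ratio(0.0, delta_f_doubling(n, temperature), temperature)
        # Report in powers of ten, since the ratio itself overflows for large N.
        print(f"N={n:.3g}: P(forward)/P(backward) = 10^{log_ratio / math.log(10.0):.3g}")
```

Under these assumptions the ratio is simply $2^N$: the backward process is allowed, but its relative probability shrinks exponentially with the number of particles.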

Note that the Shannon information entropy $H$ is related to the thermodynamic entropy $S$ like this:

$$ S = k_B H $$

One can express the quantum entropic uncertainty principle for thermodynamic entropies:

$$ S_a + S_b\geq k_B\log\left(\frac e2\right) $$

Where $S_a$ and $S_b$ are the temporal and spectral thermodynamic entropies. This shows that entropies can fluctuate in time and in spectrum. A backwards entropy fluctuation is not forbidden, but it will likely occur on short time scales and within small partitions of the whole system. Backwards entropy fluctuations will also probably be canceled some time later by fluctuations that follow the standard arrow of time. So not much useful information can be extracted from backwards fluctuations, because in principle they are uncontrollable.
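As a quick sanity check of the bound itself, here is a small sketch under assumptions of mine: a Gaussian pulse, the ordinary-frequency Fourier convention (kernel $\text{e}^{-2\pi i \nu t}$), and differential Shannon entropies in nats (multiply by $k_B$ to match the inequality above). For a Gaussian with temporal standard deviation $\sigma_t$, the spectral density is Gaussian with $\sigma_\nu = 1/(4\pi\sigma_t)$, and the bound is saturated. The function names are hypothetical.

```python
import math


def gaussian_entropy(sigma: float) -> float:
    """Differential Shannon entropy (nats) of a Gaussian density of width sigma."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma**2)


def entropy_sum(sigma_t: float) -> float:
    """H_a + H_b for a Gaussian pulse and its Fourier transform."""
    sigma_nu = 1.0 / (4.0 * math.pi * sigma_t)  # ordinary-frequency convention
    return gaussian_entropy(sigma_t) + gaussian_entropy(sigma_nu)


if __name__ == "__main__":
    bound = math.log(math.e / 2.0)
    for sigma_t in (0.01, 1.0, 100.0):
        print(f"sigma_t={sigma_t:>6}: H_a + H_b = {entropy_sum(sigma_t):.6f}"
              f"  (bound {bound:.6f})")
```

Making the pulse narrower in time lowers the temporal entropy but raises the spectral one by the same amount, so the sum stays at the bound for every width.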

Bohr also suggested a thermodynamic uncertainty relation: $$ {\mathrm{\Delta }}\beta \ge \frac{1}{{{\mathrm{\Delta }}U}} $$

Where $\beta = (k_BT)^{-1}$ is the inverse temperature. This relation means that if you know the internal energy of the system very precisely, then you know nothing about its temperature, and vice versa. Now imagine that, after the molecules have diffused, you measure the temperature exactly in part A and the internal energy exactly in part B. Then, according to the uncertainty principle, this measurement may result in the formation of a half-hot / half-cold partition of the molecules. But this implies that the measurement has performed some kind of thermodynamic work, so it has nothing to do with a spontaneous backwards entropy change and thus falls outside the question formulated by the OP. Still, I think this kind of possibility is interesting to think about, because the act of measurement is vaguely defined and may happen without human intervention.