Why does the Boltzmann factor $e^{-E/kT}$ seem to imply that lower energies are more likely?

The Maxwell-Boltzmann distribution assigns $N$ particles to energy levels $E_i$ such that the entropy is maximized for a fixed total energy $E=\sum_i E_i N_i$.

The probability that a particle is in the energy level $E_i$ is proportional to the number of particles in the energy level $E_i$ in this particular arrangement of particles in which entropy is maximized (the Maxwell-Boltzmann distribution), which is $N_i$. It so happens that when we distribute the particles such that entropy is maximized, more particles populate the lower energy levels.

I don't follow the above argument @Ben Crowell - we want to show that more particles are distributed in the lowest energy level. In the above, we write $\sum E_i = N_0 E_0 + E_R$ and conclude that the probability is maximized if the energy of the particles in the lowest energy state, $N_0 E_0$, is minimized, which occurs for $N_0 = 0$ - the opposite of what was desired.

I'm not sure how to intuitively explain the solution to distributing the particles such that entropy is maximized. If we agree that by distributing the particles as evenly as possible in the energy levels, we will maximize the entropy, we can try:

Suppose we require $\sum N_i E_i \equiv \hat{E}$ and that we have plenty of particles $N$, and that $E_i$ increases with $i$:

  1. Starting from $E_0$, put one particle in each energy level whilst $\sum N_i E_i < \hat{E}$.
  2. We need to distribute the remaining particles. Starting from $E_0$, again put one particle in each energy level whilst $\sum N_i E_i < \hat{E}$. The lowest energy level, $E_0$, now has 2 particles.
  3. Repeat 2., until all particles are distributed.

We can see that the lowest energy level will be most populated, and that $N_i$, and hence the probability that a particle is in state $E_i$, decreases with $i$. This algorithm won't exactly reproduce the Maxwell-Boltzmann distribution of particles in the energy levels, but it might help with an intuitive feel of why the lower energy levels are more probable.
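The filling procedure above can be sketched in a few lines of Python. This is only one possible reading of the steps: the function name `fill_levels`, the example level energies, and the rule "stop a pass as soon as the next particle would exceed the energy budget" are assumptions made for illustration, not part of the original argument.

```python
def fill_levels(energies, n_particles, energy_budget):
    """Greedy filling: repeated passes from the lowest level upward,
    adding one particle per level per pass while the budget allows."""
    occupation = [0] * len(energies)
    total_energy = 0
    remaining = n_particles
    while remaining > 0:
        placed_this_pass = False
        for i, e in enumerate(energies):
            # Assumption: a pass ends once the next particle would overshoot
            # the energy budget (or no particles are left).
            if remaining == 0 or total_energy + e > energy_budget:
                break
            occupation[i] += 1
            total_energy += e
            remaining -= 1
            placed_this_pass = True
        if not placed_this_pass:
            break  # nothing fits any more, so stop
    return occupation, total_energy


# Hypothetical levels E_i = 1, 2, ..., 6 in arbitrary units
levels = [1, 2, 3, 4, 5, 6]
occupation, used = fill_levels(levels, n_particles=10, energy_budget=20)
print(occupation, used)
```

With these example numbers the occupation never increases with the level index, which is the qualitative behaviour the argument is after. It is not the Maxwell-Boltzmann distribution itself.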


Nice question. Part of this is basic stuff, but the question about why it's at a maximum for low energies is a nice conceptual question that wasn't immediately obvious to me.

The Boltzmann factor is the (unnormalized) probability that a specific degree of freedom will be in a specific state. For instance, say we have some helium gas, and we choose one particular atom. One of this atom's degrees of freedom is its momentum $p_y$ along the $y$ axis. This momentum carries with it a certain energy $E=p_y^2/2m$. The Boltzmann factor tells us the probability of that specific value of $E$ compared to other possible values of $E$. The whole thing is actually simplest in the quantum-mechanical case, where the states are discrete. WP's article on the Boltzmann factor explains the classical case as well.

When we talk about phase space, we usually mean the phase space of the whole system, which includes all its degrees of freedom. In that sense, no, the Boltzmann factor is not a probability distribution over phase space.

BTW, note that the energy scale does not have to start at zero. The lowest possible energy could be negative or positive.

The reason that the lowest energy is most probable is that by taking as much energy as possible out of that degree of freedom, we can give it to the rest of the system. Let's say the rest of the system has some energy $E_R=E_{tot}-E$. The rest of the system then has some number of states $\Omega(E_R)$, which grows as a function of $E_R$. The probability that our chosen degree of freedom has a given energy $E$ is proportional to the number $\Omega(E_R)$ of ways that the rest of the system can accommodate that $E$. (This is assuming that energy level $E$ is non-degenerate, so that specifying $E$ completely specifies the state of this degree of freedom.) Since this probability increases with $E_R$, it decreases with $E$.
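To see this counting argument with actual numbers, here is a minimal sketch using a toy model that is not in the original answer: a set of identical harmonic oscillators sharing a fixed number of energy quanta (an Einstein solid). One oscillator plays the role of the chosen degree of freedom, and the other `n_rest` oscillators are "the rest of the system". The function names and the parameter values are assumptions chosen for illustration.

```python
from math import comb


def multiplicity(n_oscillators, quanta):
    """Number of ways to distribute `quanta` identical quanta
    among `n_oscillators` distinguishable oscillators."""
    return comb(quanta + n_oscillators - 1, n_oscillators - 1)


def single_oscillator_probabilities(n_rest, total_quanta):
    """P(chosen oscillator holds q quanta), proportional to the number of
    states of the rest of the system with the remaining total_quanta - q."""
    weights = [multiplicity(n_rest, total_quanta - q)
               for q in range(total_quanta + 1)]
    norm = sum(weights)
    return [w / norm for w in weights]


probs = single_oscillator_probabilities(n_rest=50, total_quanta=20)
for q, p in enumerate(probs[:5]):
    print(q, p)
```

Each quantum taken by the chosen oscillator leaves fewer quanta, and hence fewer states, for the rest, so the printed probabilities fall monotonically with `q`.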


  • First of all the probability density for a system in a canonical ensemble to be at a given energy (i.e. with respect to a measure $dE$) is not the formula you gave but rather:

$$\rho(E) = \frac{\Omega(E)e^{-\beta E}}{\int_0^{+\infty}dE\:\Omega(E)e^{-\beta E}} = \frac{e^{S(E)/k_B}e^{-\beta E}}{Q}=\frac{e^{-\beta F(E)}}{Q}$$

We thus see that the weight associated with a given energy is governed by a free energy $F(E)=E-TS(E)$ rather than simply by the energy itself.

Now, the probability density to be in a given microstate $\mu$ (with respect to some phase space measure $d\mu$) for a system in a contact with a thermostat is:

$$f(\mu) = \frac{e^{-\beta E(\mu)}}{\int d\mu\:e^{-\beta E(\mu)}}$$
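The difference between the two formulas is easy to check numerically. The sketch below uses a hypothetical discrete spectrum, with energies $l(l+1)$ and degeneracies $2l+1$ in arbitrary units, and a made-up value of $\beta$. These numbers are assumptions chosen only to make the effect visible.

```python
from math import exp


def level_probabilities(energies, degeneracies, beta):
    """Probability of each energy level: degeneracy times Boltzmann factor,
    normalized by the partition function."""
    weights = [g * exp(-beta * e) for e, g in zip(energies, degeneracies)]
    z = sum(weights)
    return [w / z for w in weights]


# Hypothetical rotor-like spectrum (arbitrary units)
energies = [l * (l + 1) for l in range(6)]
degeneracies = [2 * l + 1 for l in range(6)]
beta = 0.1

p_level = level_probabilities(energies, degeneracies, beta)
# Probability of one single microstate within each level
p_microstate = [p / g for p, g in zip(p_level, degeneracies)]

most_probable_level = max(range(len(p_level)), key=p_level.__getitem__)
print(most_probable_level)
```

For these values the most probable energy level is not the lowest one, even though the probability of any individual microstate still decreases with energy.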

  • Secondly, I will try to summarize what the others have said, but in a general setting and working with probabilities instead of probability densities (there is no big difference).

It is important to realize that when a system is in contact with a thermostat at temperature $T$, you in fact have a small system, say $1$, that interacts with a big one, say $2$. The whole thing can be considered as an isolated system of energy $E$. In thermodynamic equilibrium all microstates of the isolated system are equi-probable and have a probability:

$$p(\mu) \equiv \frac{1}{\sum_{E_1}\Omega_1(E_1)\Omega_2(E-E_1)} \tag{1}$$

Here it is assumed that $E=E_1+E_2$. This is only true if there is no volume interaction between the system $1$ and the system $2$.

Now, the probability for the system $1$ to be in some microstate $\mu_1$ with energy $E_1(\mu_1)$ is nothing but the sum of $(1)$ over all the possible microstates of the system $2$ that ensure that $E_1+E_2=E$ i.e.:

$$p_1(\mu_1) \equiv \frac{\Omega_2(E-E_1(\mu_1))}{\sum_{E_1}\Omega_1(E_1)\Omega_2(E-E_1)} \tag{2}$$

What we see already is that the weight of a microstate of system $1$ is the total number of microstates of system $2$ compatible with the constraint $E=E_1+E_2$. It follows that the bigger $E_1(\mu_1)$, the smaller the weight associated with the microstate $\mu_1$ (if we assume that the number of microstates of system $2$ is an increasing function of its energy).

This general result can be made quantitative in the case of small $E_1$ and it gives the Boltzmann weight:

$$p_1(\mu_1) \equiv \frac{e^{-\beta_2 E_1(\mu_1)}}{\sum_{E_1}\Omega_1(E_1)e^{-\beta_2 E_1}} \tag{3}$$

where $\beta_2 = 1/k_B T_2$ tells us how the degeneracy $\Omega_2(E-E_1(\mu_1))$ decreases with $E_1$ for small $E_1$.
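For completeness, here is a sketch of the step from $(2)$ to $(3)$, assuming that $\ln\Omega_2$ varies smoothly and that $E_1 \ll E$, so that a first-order expansion is enough:

$$\ln\Omega_2(E-E_1) \approx \ln\Omega_2(E) - E_1\left.\frac{\partial \ln\Omega_2}{\partial E_2}\right|_{E_2=E} = \ln\Omega_2(E) - \beta_2 E_1, \qquad \beta_2 \equiv \left.\frac{\partial \ln\Omega_2}{\partial E_2}\right|_{E_2=E} = \frac{1}{k_B T_2}$$

Exponentiating gives $\Omega_2(E-E_1) \approx \Omega_2(E)\,e^{-\beta_2 E_1}$. The constant factor $\Omega_2(E)$ appears in both the numerator and the denominator of $(2)$ and cancels, which leaves $(3)$.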