What is the meaning of Boltzmann’s definition of entropy?

Two obvious desirable features of this definition are:

  • When you put two systems next to each other and consider them as one system, the total number of possible microstates $\Omega_t$ is the product of the $\Omega$s of the two systems, $\Omega_t=\Omega_1\times \Omega_2$. But the entropy of the combined system must be the sum of the two entropies, which forces a logarithmic definition (a short derivation is sketched just after this list).
  • The $\ln$ function has the property that the entropy of a system with one microstate $(\Omega=1)$ is zero, which is desirable.
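
To spell out that first point (a sketch added here for completeness, assuming only additivity and continuity): if $S=f(\Omega)$ is to be additive over independent subsystems, we need $$f(\Omega_1\Omega_2)=f(\Omega_1)+f(\Omega_2)\qquad\text{for all }\Omega_1,\Omega_2\geq 1.$$ Writing $\Omega=e^x$ and $g(x)=f(e^x)$ turns this into Cauchy's functional equation $g(x+y)=g(x)+g(y)$, whose only continuous solutions are $g(x)=kx$, so that $$f(\Omega)=k\ln\Omega.$$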

This relation can be obtained from the postulate of equal a priori probabilities, i.e. from the assumption that equilibrium corresponds to the macrostate with the maximum number of microstates:

Consider two isolated systems, separately in equilibrium, each in a macrostate specified by $(N_i, V_i, E_i^{(0)})$ (number of particles, volume, energy). Each of them has a total of $\Omega_i(N_i,V_i,E_i^{(0)})$ possible microstates.

Now we bring them into thermal contact so that they can exchange energy. After this point the individual energies, which we now denote $E_1$ and $E_2$, can vary subject to $E_t=E_1+E_2=\text{constant}$, while $N$ and $V$ of each system remain unchanged. The number of possible microstates of each system is then $\Omega_i(N_i,V_i,E_i)$, and for the composite system $$\Omega(E_t,E_1)=\Omega_1(N_1,V_1,E_1)\times \Omega_2(N_2,V_2,E_2)=\Omega_1(E_1)\,\Omega_2(E_2).$$

Assuming that equilibrium occurs where $\Omega$ is maximum, we look for the value $E_1^*$ (and hence $E_2^*=E_t-E_1^*$) that maximizes $\Omega(E_t,E_1)$: $$\mathrm d\Omega=0\;\Rightarrow\;\left (\frac{\partial\Omega_1(E_1)}{\partial E_1}\right )_{E_1=E_1^*} \Omega_2(E_2^*) +\Omega_1(E_1^*)\left (\frac{\partial\Omega_2(E_2)}{\partial E_2}\right )_{E_2=E_2^*}\frac{\partial E_2}{\partial E_1}=0\tag{1}$$ Since $E_2=E_t-E_1$, we have $\partial E_2/\partial E_1=-1$; dividing $(1)$ by $\Omega_1(E_1^*)\,\Omega_2(E_2^*)$ then gives $$\beta_1\equiv\left (\frac{\partial \ln \Omega_1(E_1)}{\partial E_1}\right )_{E_1=E_1^*}=\left (\frac{\partial \ln \Omega_2(E_2)}{\partial E_2}\right )_{E_2=E_2^*}\equiv\beta_2\tag{2}$$

Naturally we expect the quantities $\beta_1$ and $\beta_2$ to be related to the temperatures of the two systems. From thermodynamics we know that $$\left(\frac{\partial S}{\partial E}\right)_{N,V}=\frac{1}{T}\tag{3}$$ Comparing $(2)$ and $(3)$, we conclude that $$\frac{\partial S}{\partial (\ln \Omega)}=k,$$ i.e. $$S=k\ln \Omega$$ up to an additive constant (conventionally set to zero), where $k$ is a constant, later identified as Boltzmann's constant.
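
As a concrete numerical check of this result (my own illustration, not part of the argument above), one can model the two systems as small Einstein solids sharing energy quanta: the most probable split of the energy is the one at which the two $\beta$s agree. A minimal Python sketch, with arbitrary solid sizes and quanta count:

```python
from math import comb, log

def omega(N, q):
    """Microstate count of an Einstein solid with N oscillators and q energy quanta."""
    return comb(q + N - 1, q)

N1, N2 = 300, 200      # oscillators in each solid (arbitrary choices)
q_total = 100          # total number of shared energy quanta (arbitrary)

# Multiplicity of the composite system for every split q1 + q2 = q_total
splits = [(q1, omega(N1, q1) * omega(N2, q_total - q1)) for q1 in range(q_total + 1)]
q1_star, _ = max(splits, key=lambda s: s[1])
q2_star = q_total - q1_star

# Discrete estimates of beta_i = d(ln Omega_i)/dE_i (in units of one quantum) at the maximum
beta1 = log(omega(N1, q1_star + 1)) - log(omega(N1, q1_star))
beta2 = log(omega(N2, q2_star + 1)) - log(omega(N2, q2_star))

print(f"most probable split: q1* = {q1_star}, q2* = {q2_star}")
print(f"beta1 ≈ {beta1:.3f}, beta2 ≈ {beta2:.3f}")  # nearly equal, as equation (2) predicts
```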


Entropy was first met in classical thermodynamics and was defined as

$$\mathrm dS= \frac{\delta Q_{\text{rev}}}{T},$$ where $\delta Q_{\text{rev}}$ is the heat exchanged reversibly; the heat $Q$ appears in the first law of thermodynamics

$$\Delta U= Q- W$$

and $T$ is the temperature, $W$ the work done by the system.
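
As a quick worked example of this definition (added here for concreteness): reversibly heating a body of constant heat capacity $C$ from $T_1$ to $T_2$ gives $$\Delta S=\int\frac{\delta Q}{T}=\int_{T_1}^{T_2}\frac{C\,\mathrm dT}{T}=C\ln\frac{T_2}{T_1},$$ while a reversible transfer of heat $Q$ at a fixed temperature $T$ gives simply $\Delta S=Q/T$.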

Once it was established experimentally that matter at the micro level is discrete, i.e. composed of molecules, the statistical behavior of matter became the underlying framework from which classical thermodynamics emerges.

The first law is conservation of energy, which also holds as a strict law for the ensembles of the underlying microscopic systems.

It was established in statistical mechanics that the average kinetic energy of the particles is proportional to the temperature.
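
For a classical monatomic ideal gas, for instance, this connection is the equipartition result $$\left\langle \tfrac{1}{2}mv^2\right\rangle=\tfrac{3}{2}k_BT,$$ so the temperature measures the mean kinetic energy per particle.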

The way the classical entropy emerges and is identified with the statistical mechanics derived entropy is not simple.

The statistical definition was developed by Ludwig Boltzmann in the 1870s by analyzing the statistical behavior of the microscopic components of the system. Boltzmann showed that this definition of entropy is equivalent to the thermodynamic entropy to within a multiplicative constant, which has since been known as the Boltzmann constant. In summary, the thermodynamic definition of entropy provides the experimental definition of entropy, while the statistical definition extends the concept, providing an explanation and a deeper understanding of its nature.

This paper, for example, proves (equation 42) that the statistical mechanics entropy is identified with the entropy of classical thermodynamics. The logarithmic dependence comes from the mathematics of the proof of equivalence.


Mustafa's answer gives one important reason for the logarithmic dependence: microstates multiply, whereas we'd like an extensive property of a system to be additive. So we need an isomorphism that turns multiplication into addition, and the only continuous one is the "slide rule isomorphism", a.k.a. the logarithm. The base $e$ is arbitrary, as you can see from Mustafa's answer: you can use any positive base (aside from 1!), and as you change base you simply adjust the Boltzmann constant $k_B$ to absorb the multiplicative change-of-base factor.

But an information-theoretic look at the number of possible microstates reveals other deep reasons besides the one above. The proof of the Shannon noiseless coding theorem gives the informational entropy (also logarithmic) its working meaning: it is the minimum number of bits, i.e. the number of "yes/no" questions we need answered, to uniquely identify a particular microstate, assuming all are equally likely. Imagine all the possible microstates arranged in some lexicographic order and then imagine keeping them "on file" in a database arranged as a binary tree. You work your way down the binary tree to find a particular microstate, and the number of branchings you take on the way (proportional to your seek-and-retrieve time) is $\log_2\Omega$. Or, intuitively, entropy is the length of the shortest book you'd need to write to describe a particular microstate given a system's macroscopic properties. That's some book: if we add merely one joule of heat to a system at one kelvin (colder than the cosmic microwave background radiation in deep space), we would need a book bigger than the whole World Wide Web at the end of 2013 to describe the system's microstate!
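
To make the "yes/no questions" picture concrete, here is a small sketch of my own (the number of microstates is a toy value): bisecting a sorted list of $\Omega$ equally likely microstates singles out any one of them in at most $\lceil\log_2\Omega\rceil$ questions.

```python
from math import ceil, log2

def questions_to_identify(target, num_states):
    """Count the yes/no questions a bisection needs to single out one of
    num_states equally likely microstates, labelled 0 .. num_states - 1."""
    lo, hi = 0, num_states - 1
    questions = 0
    while lo < hi:
        mid = (lo + hi) // 2
        questions += 1          # one "is it above the midpoint?" question
        if target > mid:
            lo = mid + 1
        else:
            hi = mid
    return questions

omega = 10**6                           # tiny by thermodynamic standards
print(questions_to_identify(0, omega))  # microstate 0 follows the worst-case path: 20 questions
print(ceil(log2(omega)))                # ceil(log2(omega)) is also 20
```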

As I said, you can use $\log_e$ instead of $\log_2$ as long as you keep track of the multiplicative change-of-base factor in your physical constants (the definition of $k_B$ and $T$).
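
To put a number on the "one joule at one kelvin" remark and on the change-of-base point, here is a back-of-the-envelope sketch (my own, using only $\Delta S=Q/T$ and $S=k_B\ln\Omega$):

```python
from math import log

k_B = 1.380649e-23   # Boltzmann constant, J/K
Q, T = 1.0, 1.0      # add 1 J of heat reversibly at 1 K

delta_S = Q / T                             # entropy change in J/K
delta_ln_omega = delta_S / k_B              # increase of ln(Omega), i.e. in nats
delta_log2_omega = delta_ln_omega / log(2)  # the same increase expressed in bits

print(f"increase of ln(Omega)   ≈ {delta_ln_omega:.3e} nats")
print(f"increase of log2(Omega) ≈ {delta_log2_omega:.3e} bits")  # ~1.0e23 bits of description

# Changing the base of the logarithm only rescales the prefactor:
# S = k_B * ln(Omega) = (k_B * ln 2) * log2(Omega), so the entropy itself is unchanged.
k_bits = k_B * log(2)
print(k_B * delta_ln_omega, k_bits * delta_log2_omega)  # equal up to floating-point rounding
```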

Section 2 (read it carefully) and the appendix of this paper:

E. T. Jaynes, "Information Theory and Statistical Mechanics".

also give some solid motivation for the logarithm and the entropy formula, showing it to be, up to a constant factor, the unique dependence on the probabilities with all the following properties:

  1. It is a continuous function of the probabilities $p_i$ of the microstates;
  2. If all microstates are equally likely, it is a monotonically increasing function of $\Omega$;
  3. If we partition the set of microstates arbitrarily into subsets and then think of these subsets as single events in a new "state space" - a "coarse-grained" version of the first - where the new events themselves have entropies $H_j$ and probabilities $p_j$ calculated from the original state space, and we then work out the entropy of the total state space as the coarse-grained entropy plus the weighted sum $\sum_j p_j\,H_j$, then we shall get the same answer for the entropy, no matter how we may partition the microstates.

If you think about it, the last point (3) is a powerful generalisation of the "multiplication becomes addition" idea expressed in Mustafa's answer.
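
Point (3) is also easy to check numerically. Here is a small sketch of my own (the probabilities and the partition are arbitrary) verifying the grouping property: the fine-grained entropy equals the entropy of the coarse-grained distribution plus the probability-weighted entropies of the conditional distributions inside each subset.

```python
from math import log2

def H(probs):
    """Shannon entropy in bits, ignoring zero-probability events."""
    return -sum(p * log2(p) for p in probs if p > 0)

# An arbitrary distribution over six "microstates" ...
p = [0.30, 0.20, 0.15, 0.15, 0.12, 0.08]
# ... and an arbitrary partition of them into coarse-grained events
partition = [[0, 1], [2, 3, 4], [5]]

w = [sum(p[i] for i in group) for group in partition]   # coarse-grained probabilities
H_within = [H([p[i] / wj for i in group])               # conditional entropy inside each event
            for group, wj in zip(partition, w)]

fine_grained = H(p)
coarse_plus_weighted = H(w) + sum(wj * Hj for wj, Hj in zip(w, H_within))
print(fine_grained, coarse_plus_weighted)   # identical up to floating-point rounding
```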