Why does Benford's Law (or Zipf's Law) hold?

As a rough/somewhat-intuitive explanation of why Benford's Law makes sense, consider it with respect to amounts of money. The amount of time(/effort/work) needed to get from \$1000 to \$2000 (100% increase) is a lot greater than the amount of time needed to get from \$8000 to \$9000 (12.5% increase)--increasing money is usually done in proportion to the money one has. Thought about in the other direction, it should take a fixed amount of time to, say, double one's money, so going from \$1000 to \$2000 takes as long as from \$2000 to \$4000 and \$4000 to \$8000, so the leading digit spends more time at lower values than at higher values. Because the value growth is exponential, the time spent at each leading digit is roughly logarithmic.


Benford's Law

To examine the distribution of the mantissas of a dataset, we can examine the fractional parts of the common logarithms of the data. That's because the fractional part of the common logarithm is the common logarithm of the mantissa.

For example, consider numbers with mantissa $2.5$: $$ \begin{align} \log(.025) &= .39794000867203760957 - 2\\ \log(.25) &= .39794000867203760957 - 1\\ \log(2.5) &= .39794000867203760957 + 0\\ \log(25) &= .39794000867203760957 + 1\\ \log(250) &= .39794000867203760957 + 2\\ \log(2500) &= .39794000867203760957 + 3 \end{align} $$ Thus, if the mantissa of a number is $2.5$, the fractional part of its common logarithm is $.39794000867203760957$.

If the data spans several decades (powers of $10$, not years; see Decade (log scale), when we combine the data from all of the decades, it tends to even out. Thus, the fractional parts of the common logarithms of the data should be evenly distributed.

For example, suppose the common logarithm of the data is distributed across $4$ decades as shown below:

$\hspace{7mm}$enter image description here

Summing the distributions of the fractional parts of the logarithms, that is, moving the decades on top of each other and adding curves, we get the black curve at the top of the image below, which is close to evenly distributed:

$\hspace{64mm}$enter image description here

Thus, we arrive at the principal assumption of Benford's Law: the logarithm of the mantissa of data which spans several decades is typically distributed evenly.

Note the hash marks on the bottoms of the graphs above. These marks separate where the different leading digits of the mantissa live. On the line segment below, we expand these hash marks and align the leading digit of the mantissa with the fractional part of the common logarithm. The leading digit of the mantissa is $1$ if the fractional part of the common logarithm is between $0$ and $.30103$; the leading digit is $2$ if the fractional part is between $.30103$ and $.47712$; and so on.

$\hspace{7mm}$enter image description here

If the principal assumption of Benford's Law holds, the fractional part of the common logarithm is evenly distributed. In view of the previous diagram, it is obvious that the probability of $1$ being the leading digit is greater than that of $2$ being the leading digit; the probability of $2$ is greater than that of $3$; and so on. This is made precise below.

Data that has a mantissa starting with the digit $1$ has a common logarithm whose fractional part ranges from $\log(1)$ to $\log(2)$. If the fractional part of the common logarithm of the data is evenly distributed, then the portion of the data that starts with $1$ would be $$ \frac{\log(2)-\log(1)}{\log(10)-\log(1)}=.30102999566398119521 $$ Similarly, data that has a mantissa starting with the digit $2$ has a common logarithm whose fractional part ranges from $\log(2)$ to $\log(3)$. Thus, the portion of the data starting with $2$ would be $$ \frac{\log(3)-\log(2)}{\log(10)-\log(1)}=.17609125905568124208 $$ In the same manner, data that has a mantissa starting with the digit $d$ has a common logarithm whose fractional part ranges from $\log(d)$ to $\log(d+1)$. Thus, the portion of the data starting with $d$ would be $$ \frac{\log(d+1)-\log(d)}{\log(10)-\log(1)}\tag{1} $$ Using $(1)$, we can compute the probability that such data will start with the digit $d$: $$ \begin{array}{} d&P(d)\\ \hline\\ 1&.30102999566398119521\\ 2&.17609125905568124208\\ 3&.12493873660829995313\\ 4&.096910013008056414359\\ 5&.079181246047624827723\\ 6&.066946789630613198203\\ 7&.057991946977686754929\\ 8&.051152522447381288949\\ 9&.045757490560675125410 \end{array} $$ This distribution of leading digits is called Benford's Law.


Further Digits

The probability that the first two digits are $10$ is $$ \frac{\log(11)-\log(10)}{\log(100)-\log(10)}=.041392685158225040750 $$ The probability that the first two digits are $20$ is $$ \frac{\log(21)-\log(20)}{\log(100)-\log(10)}=.021189299069938072794 $$ Adding the probabilities for all first digits, we can compute the probability that the second digit is $0$ to be $.11967926859688076667$.

In this manner, we can compute the probability that the second digit is $d$: $$ \begin{array}{} d&P(d)\\ \hline\\ 0&.11967926859688076667\\ 1&.11389010340755643889\\ 2&.10882149900550836859\\ 3&.10432956023095946693\\ 4&.10030820226757934031\\ 5&.096677235802322528359\\ 6&.093374735783036121570\\ 7&.090351989269603369600\\ 8&.087570053578861399175\\ 9&.084997352057692199898 \end{array} $$ For reference, here are the probabilities that the third digit is $d$: $$ \begin{array}{} d&P(d)\\ \hline\\ 0&.10178436464421710175\\ 1&.10137597744780144287\\ 2&.10097219813704129959\\ 3&.10057293211092617495\\ 4&.10017808762794737592\\ 5&.099787575692177452606\\ 6&.099401309944962084127\\ 7&.099019206561896092170\\ 8&.098641184154777437875\\ 9&.098267163678253538152 \end{array} $$ Here are the probabilities that the fourth digit is $d$: $$ \begin{array}{} d&P(d)\\ \hline\\ 0&.10017614693993552632\\ 1&.10013688811757926504\\ 2&.10009767259461432585\\ 3&.10005850028348653742\\ 4&.10001937109690488020\\ 5&.099980284947840433784\\ 6&.099941241749525329518\\ 7&.099902241415451708313\\ 8&.099863283859370683672\\ 9&.099824368995291309873 \end{array} $$


For the case of Bendford's Law, of course scale invariance is a necessary condition; it the law must be true either if we measure things in meters or in feet or in furlongs, thus multiplying given data for a constant, the only distribution which allows this is the logarithmic one. But its being necessary does not mean that it is the answer, of course.

Scale invariance is not relevant for Zipf's Law, however, since we have an absolute rank.