Benford's law and voting in Georgia

Using his data for first digits, neither the Biden nor the Trump first digits are significantly different from what would be predicted by Benford's law. Using R:

```r
countyBiden <- c(47,40,16,21,7,9,9,5,5)
countyTrump <- c(43,36,12,15,9,17,9,9,9)
benford <- log10(2:10) - log10(1:9)
chisq.test(countyBiden, p=benford)
#
#         Chi-squared test for given probabilities
#
# data:  countyBiden
# X-squared = 12.601, df = 8, p-value = 0.1263
#
chisq.test(countyTrump, p=benford)
#
#         Chi-squared test for given probabilities
#
# data:  countyTrump
# X-squared = 11.231, df = 8, p-value = 0.189
```

That is even before considering whether Benford's law should apply here.

Benford's law only applies when the numbers in question span several orders of magnitude. I'm not going to waste my time watching a propaganda / disinfo video; I'll just note that voting precincts are of relatively uniform size, so Benford's law is inappropriate in this setting.

As I said in an answer on Skeptics SE, you can get a "Benford plot" by starting with a simple histogram of the values on a $\log_{10}$ scale, then "wrapping" the histogram by merging buckets that have the same fractional part, then redistributing the values into nine buckets of unequal size. Here's an example of doing this on about 2000 random log-normally distributed values with a standard deviation of 0.5:

The buckets on the right follow Benford's law to exactly the same extent that the buckets in the middle are filled uniformly. The buckets in the middle will generally be filled uniformly if the original distribution on the left is "wide enough"; of course they could also end up uniformly filled for other reasons, but that's the most common reason.
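The wrapping procedure described above can be sketched in R roughly as follows. This is an illustration, not the code behind the original charts: the seed, the location of 3 on the $\log_{10}$ scale, the ten equal-width buckets, and the reading of "standard deviation of 0.5" as the standard deviation of $\log_{10}$ of the values are all assumptions.

```r
set.seed(1)   # arbitrary seed, for reproducibility only
n <- 2000

# log10 of each simulated value; mean 3 and sd 0.5 are assumptions
logx <- rnorm(n, mean = 3, sd = 0.5)

# "Wrap" the log-scale histogram: keep only the fractional part of log10(x)
frac <- logx - floor(logx)

# Middle view: ten equal-width buckets on [0, 1)
uniform_counts <- table(cut(frac, breaks = seq(0, 1, by = 0.1), right = FALSE))

# Right view: nine unequal buckets with edges log10(1), ..., log10(10)
benford_counts <- table(cut(frac, breaks = log10(1:10), right = FALSE,
                            labels = 1:9))

# The nine unequal buckets are just the leading digits of the values
first_digit <- floor(10^frac)

# Observed leading-digit shares next to the Benford probabilities
benford <- log10(2:10) - log10(1:9)
print(round(rbind(observed = as.integer(benford_counts) / n,
                  benford  = benford), 3))
```

The only difference between the middle and right views is the choice of bucket edges on the same wrapped values, which is why one follows Benford's law to the extent that the other is uniform.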

Here's what happens if you use the same number of random values with the same mean, but a standard deviation of 0.2:

Benford's law is violated not because the distribution is any less realistic, but just because it's narrower. The fact that the mean is 10.0 and not, say, 10.5 makes a big difference here; the Benford plot would look very different if the mean were shifted, though the shape of the original histogram wouldn't change.
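To see the effect of width and location numerically, here is a rough simulation sketch. It reads the mean and standard deviation as those of $\log_{10}$ of the values, which is an assumption, and the seed and the helper names `digit_counts` and `stat` are made up for illustration.

```r
set.seed(42)  # arbitrary seed, for reproducibility only
benford <- log10(2:10) - log10(1:9)

# Leading-digit counts for n values whose log10 is normal(mean, sd)
digit_counts <- function(n, mean, sd) {
  logx <- rnorm(n, mean = mean, sd = sd)
  frac <- logx - floor(logx)           # fractional part of log10(x)
  tabulate(floor(10^frac), nbins = 9)  # leading digit is floor(10^frac)
}

wide    <- digit_counts(2000, mean = 10,   sd = 0.5)
narrow  <- digit_counts(2000, mean = 10,   sd = 0.2)
shifted <- digit_counts(2000, mean = 10.5, sd = 0.2)

# Chi-squared statistic against the Benford probabilities
stat <- function(counts) unname(chisq.test(counts, p = benford)$statistic)

print(c(wide = stat(wide), narrow = stat(narrow), shifted = stat(shifted)))

# Leading-digit shares for the two narrow cases: same width, different location
print(round(rbind(narrow = narrow, shifted = shifted) / 2000, 3))
```

Under these assumptions, the narrow distributions should sit much further from Benford's law than the wide one, and shifting the mean by half a decade moves the bulk of the leading digits to a different place.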

In my other answer you can see similar charts using real data from Chicago precincts.

This video used data scraped from here. Here are my histograms of that data for Trump (top) and Biden (bottom):

The charts on the right don't quite match the ones in the question, presumably because the numbers have been updated, but they're close.

I didn't do any goodness-of-fit analysis, but what I see here is a whole lot of nothing. The distributions for both Trump and Biden look like what I'd expect from a random simulation, keeping in mind that there are only 159 counties in Georgia, compared with the 2069 precincts in Chicago. The Benford diagrams on the right are not so obviously random, but that's simply because they use uneven buckets, which makes them harder to eyeball. There is no advantage to bucketing the data that way; it only obfuscates the data. The rationale for checking Benford's law in base 10 specifically is that numbers made up by human beings don't have the correct distribution of leading digits, but that would still show up in the histograms on the left as a pattern with a period of 1. I'm not convinced that there's any advantage to doing a Benford's law analysis over just analyzing the original histogram.
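For anyone who does want a number, one possible way to look for a period-1 pattern without the uneven Benford buckets is to test the fractional part of $\log_{10}$ of the counts for uniformity over equal-width buckets. The sketch below is only that: `votes` is a hypothetical vector filled with simulated placeholder values (not real results), and the choice of ten buckets is an assumption.

```r
set.seed(7)   # arbitrary seed, for reproducibility only

# Placeholder data only: 159 simulated "county totals", NOT real results
votes <- round(10^rnorm(159, mean = 4, sd = 0.6))
votes <- votes[votes > 0]   # guard: log10 needs positive counts

# Position of each count within its decade, in [0, 1)
frac <- log10(votes) %% 1

# Ten equal-width buckets on [0, 1)
counts <- tabulate(findInterval(frac, seq(0, 1, by = 0.1)), nbins = 10)

# Null hypothesis: all ten buckets are equally likely
fit <- chisq.test(counts)
print(fit)
```

With only 159 counties each bucket expects about 16 values, so a test like this has limited power either way.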
