Dantzig's unsolved homework problems

I think the two problems appear in these papers:

Dantzig, George B. "On the Non-Existence of Tests of 'Student's' Hypothesis Having Power Functions Independent of Sigma." Annals of Mathematical Statistics. Vol. 11; 1940 (pp. 186-192).

Dantzig, George B. and Abraham Wald. "On the Fundamental Lemma of Neyman and Pearson." Annals of Mathematical Statistics. Vol. 22; 1951 (pp. 87-93).

Read more at http://www.snopes.com/college/homework/unsolvable.asp#6oJOtz9WKFQUHhbw.99

EDIT: In case snopes ever goes belly up, the story can be found in Albers, Reid, and Dantzig, An Interview with George B. Dantzig: The Father of Linear Programming, College Math J 17 (1986) 292-314. The interview has also been reprinted in Albers, Alexanderson, and Reid, More Mathematical People, page 67.


I'm a few years late to the party, but in fact the problem in the first, solo paper is easy to state with only elementary background, and the arguments in it are entirely reasonable for a talented young grad student to come up with. I have not taken the time to read the second paper. This topic comes up from time to time with interest from a very broad array of people, and nobody seems to have written a straightforward description of either problem, so I'll provide such a description for the first one.

For those with some background: Dantzig showed that in the situation of Student's t-test, the only way to get a hypothesis test whose power for any given alternative is independent of the standard deviation is to use a silly test which always has an equal probability of rejecting or failing to reject, which is obviously not useful.

In an unusual amount of detail, aimed at those with no statistical knowledge:

Lots of data is approximately normally distributed ("bell-shaped"), like IQ scores, birthweights, or people's heights. The classical Central Limit Theorem gives one explanation for this phenomenon: complicated traits like birthweight can often be thought of as the result of adding up a large number of competing effects, like the presence or absence of specific genes. It is a statistical fact that under very general hypotheses, adding up many such effects tends to result in a normal distribution. For such data, you'll "usually" get something close to the average value, and with enough observations, you can predict with high accuracy just how likely it is to get a certain amount above or below that average.
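To make the "adding up many effects" claim concrete, here is a minimal simulation sketch. Everything specific in it is my own assumption for illustration: 100 effects per individual, each drawn uniformly from [-1, 1], and the function names. It just checks how much of the simulated data falls within one and two standard deviations of the mean, which for a normal distribution is about 68% and 95%.

```python
import random
import statistics

def simulated_trait(num_effects, rng):
    """One individual's trait: the sum of many small independent effects."""
    # Assumption: each effect pushes the trait up or down by a random
    # amount in [-1, 1]; real genetic effects are of course messier.
    return sum(rng.uniform(-1.0, 1.0) for _ in range(num_effects))

def fraction_within(values, num_sds):
    """Fraction of values within num_sds standard deviations of their mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    inside = sum(1 for v in values if abs(v - mean) <= num_sds * sd)
    return inside / len(values)

rng = random.Random(0)  # fixed seed so the run is repeatable
traits = [simulated_trait(100, rng) for _ in range(10_000)]

# A normal distribution puts about 68% of its mass within 1 SD of the mean
# and about 95% within 2 SDs; the simulated sums should land close to that.
print(f"within 1 SD: {fraction_within(traits, 1):.3f}")
print(f"within 2 SD: {fraction_within(traits, 2):.3f}")
```

No single effect here is bell-shaped (each one is flat on [-1, 1]), yet the sums should behave very much like normal data.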

A century ago, William Gosset was Head Experimental Brewer at Guinness. He came up against something like the following problem. Certain strains of barley have approximately normally distributed yields. Using only a few data points, how could he tell which type of barley is better, and more importantly, how could he quantify his certainty that his conclusion wasn't simply due to random chance?

A little more formally, say our current strain of barley has an average yield of 100 units, and we're only interested in switching to the new strain if its yield is at least 105 units. So, we have two specific hypotheses:

  • ("Null hypothesis.") The new strain's average yield is 100 units.
  • ("Alternative hypothesis.") The new strain's average yield is 105 units.

At the end of the day, we're going to need to pick one strain of barley or the other. There are hence four probabilities of interest:

  1. In a world where the new strain's average yield is actually 100 units...
    • A. ...the probability that we correctly keep using the old strain.
    • B. ...the probability that we mistakenly switch to the new strain.
  2. In a world where the new strain's average yield is actually 105 units...
    • C. ...the probability that we mistakenly keep using the old strain.
    • D. ...the probability that we correctly switch to the new strain.

We want to somehow minimize the probability of the two types of mistakes, B and C, but doing so requires a trade-off.

Gosset developed a clever test where you can specify in advance the probability of making mistake B (often it's set at 5%). This is called the significance level of the test. Gosset published it under the pseudonym "Student", and it is now called Student's t-test. One excellent thing about his procedure is that you don't need to know in advance how variable the yield actually is: whatever the variability, the probability of mistake B is always your pre-set value.
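As a rough check of that last claim, here is a simulation sketch under assumptions of my own: a one-sided, one-sample t-test against the known old average of 100, with 5 hypothetical test plots of the new strain. The cutoff 2.132 is the standard table value for a one-sided 5% test with 4 degrees of freedom. The yields are simulated in a world where the new strain really averages 100, so every "switch" is mistake B.

```python
import math
import random

N = 5                # assumption: number of test plots of the new strain
T_CRITICAL = 2.132   # one-sided 5% critical value, t distribution, N - 1 = 4 d.f.

def t_test_says_switch(yields, null_mean=100.0):
    """One-sided one-sample t-test: True means 'switch to the new strain'."""
    n = len(yields)
    mean = sum(yields) / n
    sample_var = sum((y - mean) ** 2 for y in yields) / (n - 1)
    t_statistic = (mean - null_mean) / math.sqrt(sample_var / n)
    return t_statistic > T_CRITICAL

def switch_rate(true_mean, sigma, trials, rng):
    """Fraction of simulated experiments in which the test says to switch."""
    switches = 0
    for _ in range(trials):
        yields = [rng.gauss(true_mean, sigma) for _ in range(N)]
        if t_test_says_switch(yields):
            switches += 1
    return switches / trials

rng = random.Random(1)  # fixed seed so the run is repeatable
for sigma in (1.0, 10.0, 50.0):
    rate_b = switch_rate(100.0, sigma, 20_000, rng)
    print(f"sigma = {sigma:5.1f}: estimated probability of mistake B = {rate_b:.3f}")
```

The estimated rates should all sit near 0.05 even though the standard deviation varies by a factor of 50, and the test never gets to see the true standard deviation.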

If you use his procedure, you can also compute the probability of making mistake C. The power of the test is probability D (namely 1-C), which is thought of as the ability of the test to correctly tell us to switch to the new method. Unlike the significance level, the power of Gosset's procedure does depend on the true variability of the yield.

This dependence makes some intuitive sense, too. Suppose the new strain does have an average yield of 105 units. If that yield had almost no variation, you would expect it to be much easier to correctly switch to the new strain than if the yield had enormous variation which "muddies your data". Of course, expecting something and proving it are two different things! As mentioned above, in the world where the average yield is 100 units, the error probability of Student's t-test is independent of the variation of the yield, so there is certainly something interesting going on.
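The intuition can be checked numerically. The sketch below makes the same kind of assumptions as before, all of them mine: 5 hypothetical test plots, a one-sided one-sample t-test against the old average of 100, and the table cutoff 2.132 for 4 degrees of freedom. This time the yields are simulated in the world where the new strain really averages 105, so the fraction of "switch" decisions estimates the power, probability D.

```python
import math
import random

N = 5                # assumption: number of test plots of the new strain
T_CRITICAL = 2.132   # one-sided 5% critical value, t distribution, 4 d.f.

def estimate_power(sigma, trials=20_000, seed=0):
    """Estimate probability D: the test says 'switch' when the true mean is 105."""
    rng = random.Random(seed)
    switches = 0
    for _ in range(trials):
        yields = [rng.gauss(105.0, sigma) for _ in range(N)]
        mean = sum(yields) / N
        sample_var = sum((y - mean) ** 2 for y in yields) / (N - 1)
        t_statistic = (mean - 100.0) / math.sqrt(sample_var / N)
        if t_statistic > T_CRITICAL:
            switches += 1
    return switches / trials

for sigma in (2.0, 10.0, 50.0):
    print(f"sigma = {sigma:5.1f}: estimated power D = {estimate_power(sigma):.3f}")
```

With a small standard deviation the test should almost always switch; as the standard deviation grows, the estimated power should fall towards the 5% significance level, which is the "muddied data" effect described above.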

Here's where Dantzig came in. We could ask if there is any test whatsoever which has the property that, for every fixed alternative, the power does not depend on the true variability of the yield. Dantzig showed that, while such tests technically exist, they are uninteresting in that probabilities A, B, C, and D are all 50%.

Closing remarks:

Finally, I wanted to comment on the tendency towards hyperbole. In Dantzig's 1986 College Mathematics Journal interview, Dantzig is quoted as calling the problems "two famous unsolved problems in statistics". In Dantzig's obituary (repeated on Wikipedia currently), this turned into "two of the most famous unsolved problems in statistics". While this is not my field and I am not old, I'm extremely dubious about the "most famous" claim. For instance, there seems to have been no rush to publish the second solution (it waited for Dantzig's thesis and an accident of someone else solving it). MathSciNet has only 5 citations for the first paper, three historical, and 7 citations for the second, again three historical. These are not the citation counts I would expect from solutions to a field's "most famous unsolved problems", even accounting for recent citation bias.

These exaggerations are frankly not necessary. Dantzig's reputation is enormous already, and the true story of a talented young grad student cleverly finding a few pages of brilliant argument that had eluded his teacher (something he never would have looked for had he known that what he was working on was unsolved) is enough.