How to argue against questionable research practices such as p-hacking and HARKing?

This sort of thing happens in both the social sciences and the physical sciences. For instance, a scientist will often collect data to test a theory but will also collect lots of extraneous data. Analyses of these extraneous data should usually be considered exploratory and labeled as such, because significant results could be due to the multiple tests. (As another example, you don't want to know how often chemists repeat an experiment until they get a good yield, then stop and report that yield without mentioning that it was the best of 20 attempts!)

The fastest solution is to agree to do the multiple analyses, but then report what you did in the methodology section. If you say that you analyzed the data several ways and one way showed significance, readers can decide whether or not to believe the result. Just tell your co-authors that failing to mention that you did multiple analyses leaves the research improperly described.

However, you can (occasionally) save the day. If, for instance, you did 10 different analyses and picked the best one, you'll be OK if the result would still hold under a Bonferroni correction (i.e. instead of requiring significance at the 0.05 level, you require significance at the 0.05/#tests level). So if the selected test shows a p-value such as 0.000001, you are probably on safe ground.
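A minimal sketch of that check in Python (the p-values below are hypothetical, chosen only to illustrate the arithmetic):

    # Bonferroni check: divide the significance level by the number of tests.
    # The p-values are made up, purely to illustrate the idea above.
    alpha = 0.05
    p_values = [0.21, 0.04, 0.33, 0.000001, 0.12, 0.08, 0.45, 0.27, 0.19, 0.06]

    corrected_alpha = alpha / len(p_values)  # 0.005 for 10 analyses

    for i, p in enumerate(p_values, start=1):
        verdict = "significant" if p < corrected_alpha else "not significant"
        print(f"Analysis {i}: p = {p:g} -> {verdict} after correction")

Note that the analysis with p = 0.04 no longer counts as significant once the correction is applied, while p = 0.000001 survives easily.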

Another approach is to decide a priori that some tests are obvious (confirmatory) and some are just searching around the data (exploratory). Then you can report the confirmatory results, while labeling anything interesting among the 'exploratory' results as 'needs further research'. That is, you can mix well-founded tests with 'data dredging' as long as you acknowledge the difference between the two sets of tests.

But if it isn't possible to rescue the result, I'd go with insisting that they describe what they did, with the comment that if they are embarrassed to describe it, they shouldn't have done it. :)

You might also add that it is often obvious (at least to statisticians) that a researcher has pulled this trick. When we see a test that, taken in isolation, would not strike us as the obvious approach, or a hypothesis that we would not choose a priori, it looks suspicious. For instance, I recently read a paper that claimed that a certain group of people tend to commit suicide more often if they were BORN in the Spring. It was clear that testing ONLY the effect of birth in Springtime, without testing the effect of birth in the other seasons, was not something that would occur to anyone. So they probably had a spurious result due to multiple comparisons.
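To see why the multiple comparisons matter here, a small back-of-the-envelope sketch (assuming independent tests, which is a simplification): at a 0.05 threshold, the chance of at least one false positive grows quickly with the number of tests.

    # Family-wise error rate for k independent tests, each at alpha = 0.05.
    # Purely illustrative; real tests (e.g. one per season) are rarely independent.
    alpha = 0.05
    for k in (1, 4, 10, 20):
        fwer = 1 - (1 - alpha) ** k
        print(f"{k:2d} tests: P(at least one false positive) = {fwer:.2f}")

Even four tests (one per season) push the chance of a spurious hit from 5% to roughly 19%.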


This is an excellent question. I do think you (and others in similar situations) should speak up, but I realize this is very difficult to do. Two things I'd suggest:

  1. Try to figure out whether the people you're dealing with understand that the methods they're proposing (p-hacking, etc.) are dodgy -- i.e. whether it's an issue of ethics or of ignorance. This is harder than it may seem, since I think many people genuinely don't understand how easy it is to find patterns in noise, and how "researcher degrees of freedom" make spurious patterns easy to generate. Asking people, non-confrontationally, to explain how testing "every possible specification of a dependent variable" and selecting those with "p<0.05" corresponds to fewer than 5% of "random" datasets having a feature of interest would make this clearer, and would perhaps give you insight into the question of ethics versus ignorance. I'd bet that a good fraction of people aren't deliberately unethical, but their cloudy grasp of quantitative data obscures ethical thinking.

  2. Something I've found helpful in related contexts is to generate simulated data and actually demonstrate the principle you're arguing. For example, generate datasets of featureless noise and show that, with enough variables to compare, one can always find a "significant" relationship (obviously, without correcting for multiple comparisons). It may seem strange, but seeing this in simulated data seems to help; a sketch of such a simulation follows below.
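Here is one way that simulation might look in Python (a sketch only; it assumes NumPy and SciPy are available, and the sample size and number of variables are arbitrary):

    # Pure-noise "outcome" and 40 pure-noise "predictors": with no correction
    # for multiple comparisons, some predictors will usually look significant.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_subjects, n_predictors = 50, 40

    y = rng.normal(size=n_subjects)                  # featureless outcome
    X = rng.normal(size=(n_subjects, n_predictors))  # featureless predictors

    p_values = [stats.pearsonr(X[:, j], y)[1] for j in range(n_predictors)]
    hits = sum(p < 0.05 for p in p_values)

    print(f"Smallest p-value: {min(p_values):.4f}")
    print(f"{hits} of {n_predictors} pure-noise predictors are 'significant' at p < 0.05")

Typically one or more of the 40 noise variables clears the 0.05 bar, which is exactly the point: the "discovery" is an artifact of how many places you looked.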

Good luck!


Kenji, for the last few years I have given a continuing education course called Common Mistakes in Using Statistics: Spotting Them and Avoiding Them. I hope that some of the approaches I have taken might be helpful to you in convincing your colleagues that changes are needed.

First, I don't start out saying that things are unethical (although I might get to that eventually). I talk instead about mistakes, misunderstandings, and confusions. I also at some point introduce the idea that "That's the way we've always done things" doesn't make that way correct.

I also use the metaphor of "the game of telephone" that many people have played as children: people sit in a circle; one person whispers something into the ear of the person next to them; that person whispers what they hear to the next person, and so on around the circle. The last person says what they heard out loud, and the first person reveals the original phrase. Usually the two are so different that it's funny. Applying the metaphor to statistics teaching: someone genuinely tries to understand the complex ideas of frequentist statistics; they finally believe they get it, and pass their perceived (but somewhat flawed) understanding on to others; some of the recipients (with good intentions) make further oversimplifications or misinterpretations and pass them on to more people -- and so on down the line. Eventually a seriously flawed version appears in textbooks and becomes standard practice.

The notes for my continuing ed course are freely available at http://www.ma.utexas.edu/users/mks/CommonMistakes2015/commonmistakeshome2015.html. Feel free to use them in any way -- e.g., having an informal discussion seminar using them (or some of them) as background reading might help communicate the ideas. You will note that the first "common mistake" discussed is "Expecting too much certainty." Indeed, that is a fundamental mistake that underlies a lot of what has gone wrong in using statistics. The recommendations given there are a good starting point for helping colleagues begin to see the point of all the other mistakes.

The course website also has links to some online demos that some people find helpful for understanding problems that are often glossed over.

I've also done some blogging on the general theme at http://www.ma.utexas.edu/blogs/mks/. Some of the June 2014 entries are especially relevant.

I hope these suggestions and resources are helpful. Feel free to contact me if you have any questions.