In statistics, why do you reject the null hypothesis when the p-value is less than the alpha value (the level of significance)?

Here's the idea: you have a hypothesis you want to test about a given population. How do you test it? You take data from a random sample, and then you determine how likely it is that a population satisfying that hypothesis, with an assumed distribution, would produce data at least as extreme as what you observed. That probability is the $p$-value. You decide in advance: if this probability is less than, say, $5$%, then you reject the hypothesis. That $5$% is your significance level $\alpha$, and $95$% is the corresponding confidence level. How do you compute the probability? You use the assumed distribution of the data, together with any parameters of the population that you may know.

A concrete example: you want to test the claim that the average adult male weight is $170$ lbs. Suppose you know that adult weight is normally distributed, with standard deviation, say, $10$ pounds. You say: I will reject this hypothesis if data as extreme as my sample would arise from this population with probability less than $5$%. How do you decide how likely the sample data is? You use the fact that the data is normally distributed with (population) standard deviation $10$, and you assume the mean is $170$. You then compute the $z$-value of the sample (since this is a normally distributed variable), and a table allows you to convert it into a probability.

So, say the average of the random sample of adult male weights is $188$ lbs. Do you accept the claim that the population mean is $170$? Well, the decision comes down to: how likely is it that a normally distributed variable with mean $170$ and standard deviation $10$ would produce a sample value at least as far from $170$ as $188$ lbs? Since you have the necessary values for the distribution, you can test how likely this value of $188$ is in a population $N(170,10)$ by finding its $z$-value (for the mean of $n$ observations, divide the difference by the standard error $10/\sqrt{n}$ rather than by $10$). If the absolute value of this $z$-value is greater than the critical value (about $1.96$ for a two-sided test at the $5$% level), then the value you obtained is less likely than you're willing to accept, and you reject. Otherwise, you fail to reject the claim.
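
The calculation for this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `z_test` is made up for this post, the test is taken to be two-sided at $\alpha = 0.05$, and the sample size $n$ is not given above, so both values of `n` tried below are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided z-test for a mean when the population standard deviation is known."""
    standard_error = sigma / sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # p-value: probability, if the hypothesized mean is correct,
    # of a z-value at least this far from zero
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha


# n is not stated in the example; n=1 and n=25 are assumptions for illustration
for n in (1, 25):
    z, p, reject = z_test(sample_mean=188, mu0=170, sigma=10, n=n)
    print(f"n={n}: z={z:.2f}, p={p:.4f}, reject={reject}")
```

Note how much the conclusion depends on $n$: a single observation of $188$ gives $z = 1.8$, which is not beyond the two-sided $5$% cutoff, while a mean of $188$ over $25$ observations gives $z = 9$, which is far beyond it.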


You can reject whatever you want. Sometimes you will be wrong to do so, and some other times you will be wrong when you fail to reject.

But if your aim is to make Type I errors (rejecting the null hypothesis when it is true) no more than a certain proportion of the time, then you need something like an $\alpha$. Given that approach, if you also want to minimise Type II errors (failing to reject the null hypothesis when it is false), then you need to reject when the test statistic takes extreme values that are suggestive of the alternative hypothesis, and a small $p$-value is what indicates such values.
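
To see the link between $\alpha$ and the Type I error rate, here is a rough simulation sketch. Everything in it is an assumption chosen for illustration: the function name `type_i_error_rate`, the population $N(170, 10)$, the sample size of $20$, and the number of trials. The null hypothesis is true in every simulated sample, so every rejection is a Type I error.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def type_i_error_rate(alpha=0.05, n=20, trials=20000, seed=0):
    """Fraction of simulated samples in which a true null hypothesis is rejected."""
    rng = random.Random(seed)
    mu0, sigma = 170.0, 10.0  # assumed population; the null hypothesis is true here
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))
        if p_value < alpha:  # the rule "reject when p < alpha"
            rejections += 1
    return rejections / trials


print(type_i_error_rate(alpha=0.05))
print(type_i_error_rate(alpha=0.01))
```

The printed fractions should come out close to the chosen $\alpha$ in each case, which is the sense in which rejecting when $p < \alpha$ keeps the Type I error rate at about $\alpha$.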

As you say, $0.05$ is an arbitrary number. It comes from R. A. Fisher, who initially thought that two standard deviations was a reasonable threshold, then noted that for a two-sided test with a normal distribution this gave $\alpha \approx 0.0455$, and decided to round it to $0.05$.
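
The $0.0455$ figure is easy to check numerically. A quick sketch using the standard normal distribution from Python's standard library (the variable names are just for this illustration):

```python
from statistics import NormalDist

# Two-sided tail probability beyond 2 standard deviations of a standard normal
alpha_two_sd = 2 * (1 - NormalDist().cdf(2))
print(round(alpha_two_sd, 4))  # 0.0455

# Going the other way: the two-sided cutoff that gives exactly alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)
print(round(z_crit, 2))  # 1.96
```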


Think of it this way: in hypothesis testing, the question we always want to answer is "Is this phenomenon that I've measured really there, or is the data suggesting it just a coincidence?" Of course it's never possible to completely rule out coincidence; the best you can ever hope for is to say "This is probably not a coincidence, because the probability of something like this happening just by chance is less than ______."

When you choose a significance level, you're filling in the blank space in the previous sentence. You are deciding just how unlikely a coincidence needs to be before you are willing to decide that there is really something going on.