When someone runs an experiment, they need a way to verify whether their results are meaningful. The null hypothesis is one of the tools used in research psychology. The null hypothesis (H0) assumes that the two possibilities are the same, i.e., that the observed difference is due to chance alone. We then use a statistical test to determine how consistent the observed data are with the null hypothesis.
In a statistical test, the null hypothesis (H0) is compared with the alternative hypothesis (H1), and on this basis, we reject or fail to reject the null hypothesis. Both the null and alternative hypotheses are conjectures about statistical models of the population being studied. The statistical model, in turn, is built from a sample of the population. These tests are used everywhere in science, from particle physics to drug development. They separate genuine effects from noise; without them, it would be much harder to do science properly.
In a statistical significance test, the statement being tested, the null hypothesis, is tested against the alternative hypothesis. The test is designed to assess the strength of the evidence against the null hypothesis. Usually, the null hypothesis assumes no difference. For example, if we compare the heights of women from two regions, say India and the Netherlands, the null hypothesis assumes that the average height of women is the same in both regions. In a test of statistical significance, we take a random sample of the population being studied, assume that the null hypothesis is true, calculate what the result would look like if this were the case, and then compare this with the actual result. We reject the null hypothesis if the difference between the observed and expected data is statistically significant.
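The procedure above can be sketched as a simple permutation test. The height figures below are purely hypothetical, invented for illustration: under H0 the group labels are interchangeable, so we shuffle them many times and ask how often chance alone produces a difference as large as the one observed.

```python
import random
from statistics import mean

# Hypothetical height samples in cm (illustrative numbers, not real data).
india = [152.0, 155.5, 150.2, 158.1, 153.7, 151.9, 156.4, 154.3]
netherlands = [168.2, 171.5, 169.9, 166.3, 170.8, 167.4, 172.1, 169.0]

observed = mean(netherlands) - mean(india)

# Under H0 the labels are exchangeable: shuffle the pooled data and
# count how often a random split yields a difference this extreme.
random.seed(0)
pooled = india + netherlands
n = len(india)
trials = 10_000
count = 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = mean(pooled[n:]) - mean(pooled[:n])
    if abs(diff) >= abs(observed):
        count += 1

p_value = count / trials
print(f"observed difference: {observed:.1f} cm, p = {p_value:.4f}")
```

Because the two invented samples barely overlap, almost no random relabeling reproduces the observed gap, so the estimated p-value is close to zero and we would reject H0.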
The p-value is the probability of obtaining a result at least as extreme as the one observed in the sample, assuming the null hypothesis is true. Finding the p-value is crucial in testing the null hypothesis. If the p-value is low, the observed result would be unlikely if the null hypothesis were true, and vice versa.
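The "at least as extreme" tail-area definition can be made concrete with a hypothetical coin-flip experiment: testing whether a coin is fair after observing 60 heads in 100 flips (made-up numbers), the p-value sums the probabilities of all outcomes at least that far from the expected 50 heads.

```python
from math import comb

# Toy example: H0 says the coin is fair, p(heads) = 0.5.
# Hypothetical observation: 60 heads in 100 flips.
n, k = 100, 60

def binom_pmf(n, k, p=0.5):
    """Probability of exactly k heads in n fair flips under H0."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# One tail: probability, under H0, of 60 or more heads.
tail = sum(binom_pmf(n, i) for i in range(k, n + 1))
# Two-sided p-value: the distribution is symmetric, so double the tail.
p_value = 2 * tail
print(f"p = {p_value:.4f}")
```

The result is roughly 0.057, slightly above the conventional 0.05 threshold, so under that convention we would not reject H0 despite the seemingly lopsided count.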
Even when we fail to reject the null hypothesis, that does not mean it is true; the measurement may have been faulty or the sample biased. Failing to reject means only that there is not enough evidence against the null hypothesis, and better data might yet lead to its rejection.
The null hypothesis significance test is a fusion of two powerful but opposed schools of thinking in modern statistics. Fisher devised a mechanism for generating significance levels from data, while Neyman and Pearson proposed a strict decision process for accepting or rejecting hypotheses defined a priori. The third main intellectual stream of the period, Bayesianism, influenced the null hypothesis significance testing process only as something it reacted against.
Before they were entangled in modern-day NHST, the scientific utility of error rates and the presumed evidentiary significance of p-values were contentious issues. Fisher and, notably, Neyman debated passionately, often acrimoniously, and never reconciled their differing viewpoints. The Neyman-Pearson model is considered theoretically consistent and is widely accepted in mathematical statistics as "frequentist orthodoxy." However, this theoretical clarity appears to come at the expense of limited value for practical scientific work. The emphasis on decision criteria with guaranteed error rates in indefinitely repeated trials may be suitable for quality control in industrial settings, but it appears less relevant to scientific hypothesis evaluation, as Fisher mockingly observed (Fisher 1955).
Although Fisher first proposed a 5% "significance threshold" for his "significance tests," he eventually objected to Neyman and Pearson's dogmatic binary decision rules based on a predefined level, emphasizing that they were naive for scientific purposes. As a result, in later articles, he proposed that exact p-values should be reported as evidence against H0 rather than making all-or-nothing rejection decisions (Fisher 1956). On the other hand, the supposedly 'objective' evidential nature of p-values was questioned early on. Fisher's attempt to refute H0 through 'inductive inference' is generally considered logically flawed, particularly because p-values test only one hypothesis and are based on tail-area probabilities, which was regarded as a serious deficiency from the start.
Statistical significance does not mean practical significance. A result might be statistically significant yet of no practical use. For example, a new, expensive medicine may work better than a placebo even though cheaper therapies offering similar benefits already exist; such a result is statistically significant but of no practical significance. We also cannot prove a null hypothesis suggested by the same data, as this is circular reasoning and cannot prove anything.
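A small simulation with made-up numbers can illustrate the gap between statistical and practical significance: with a very large sample, even a trivially small effect produces a tiny p-value, while a standardized effect size such as Cohen's d reveals how negligible the effect actually is.

```python
import math
import random
from statistics import mean, stdev

random.seed(1)
# Hypothetical trial: the "drug" shifts a score by only 0.2 points
# against a standard deviation of 10 (a negligible effect).
n = 400_000
control = [random.gauss(50.0, 10.0) for _ in range(n)]
treated = [random.gauss(50.2, 10.0) for _ in range(n)]

diff = mean(treated) - mean(control)
se = math.sqrt(stdev(control)**2 / n + stdev(treated)**2 / n)
z = diff / se
# Two-sided p-value via the normal CDF; with n this large it may
# underflow all the way to 0.
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Cohen's d: the difference measured in pooled standard deviations.
pooled_sd = math.sqrt((stdev(control)**2 + stdev(treated)**2) / 2)
d = diff / pooled_sd
print(f"p = {p_value:.2g}, Cohen's d = {d:.3f}")
```

The p-value clears any conventional significance threshold, yet d is around 0.02, far below even the usual "small effect" benchmark of 0.2, so the finding would matter little in practice.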
Almost all experimental studies, if not all of them, involve a null hypothesis. One alternative, gradually gaining ground in sciences that rely heavily on null hypothesis significance testing and already common in the natural sciences, is the use of confidence intervals, which directly evaluate how good a sample mean is as an estimate of the corresponding population mean.
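A minimal sketch of such a confidence interval for a sample mean, using the normal approximation and purely illustrative data:

```python
import math
from statistics import mean, stdev

# Hypothetical sample of heights in cm (illustrative numbers only).
sample = [154.2, 149.8, 158.1, 152.5, 155.9, 151.3, 157.0, 153.6,
          150.7, 156.2, 152.9, 154.8]

n = len(sample)
m = mean(sample)
se = stdev(sample) / math.sqrt(n)  # standard error of the mean

# 95% confidence interval using the normal critical value 1.96;
# for a sample this small a t critical value would be more appropriate.
low, high = m - 1.96 * se, m + 1.96 * se
print(f"mean = {m:.1f} cm, 95% CI = ({low:.1f}, {high:.1f})")
```

Unlike a bare reject/fail-to-reject verdict, the interval conveys both the estimate and its precision: a narrow interval indicates the sample mean pins down the population mean well, while a wide one signals that the estimate is uncertain.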