Close ## Multiple Statistical Testing

In the advent of big data such as genomics, running numerous statistical tests is unavoidable. But long comes strange statistical problems. This post investigates issues with multiple statistical testing and its solutions along with simulated data.

In a standard statistical test, one assumes a null hypothesis, performs a statistical test and computes a p-value. The estimated p-value is compared to a predetermined threshold (usually 0.05). If the estimated p-value is greater than 0.05 (say 0.2), it means that there is a 20% chance of obtaining the current result if the null hypothesis is true. Since we decided our threshold as 5%, the 20% is too high to reject the null hypothesis and we accept the null hypothesis. Now, if the estimated p-value was less than 0.05 (say 0.02), there is a 2% probability of obtaining the observed result if the null hypothesis is true. Since 2% is a very low probability and it is below our threshold of 5%, we reject the null hypothesis and accept an alternative hypothesis.

The 5% threshold, although giving us high confidence, is an arbitrary value and does not absolutely guarantee an outcome. There is still the possibility that we are wrong 5% of the time. This is known as the probability of a Type I error. A Type I error occurs when a researcher falsely concludes that an observed difference is real, when in fact, there is no difference.

That was the story of a single statistical test. With large data, it is common for data analysts to do multiple statistical tests on the same data. Similar to a single test, each test in a multiple test has the 5% Type 1 error rate. And this accumulates for the number of tests.