In a standard statistical test, one assumes a null hypothesis, performs a statistical test and computes a p-value. The estimated p-value is compared to a predetermined threshold (usually 0.05). If the estimated p-value is greater than 0.05 (say 0.2), it means that there is a 20% chance of obtaining the current result if the null hypothesis is true. Since we decided our threshold as 5%, the 20% is too high to reject the null hypothesis and we accept the null hypothesis. Now, if the estimated p-value was less than 0.05 (say 0.02), there is a 2% probability of obtaining the observed result if the null hypothesis is true. Since 2% is a very low probability and it is below our threshold of 5%, we reject the null hypothesis and accept an alternative hypothesis.
The 5% threshold, although giving us high confidence, is an arbitrary value and does not absolutely guarantee an outcome. There is still the possibility that we are wrong 5% of the time. This is known as the probability of a Type I error. A Type I error occurs when a researcher falsely concludes that an observed difference is real, when in fact, there is no difference.
That was the story of a single statistical test. With large data, it is common for data analysts to do multiple statistical tests on the same data. Similar to a single test, each test in a multiple test has the 5% Type 1 error rate. And this accumulates for the number of tests.