The term statistical significance is used in market research to describe the probability that a measured difference between two statistics reflects a real difference between the tested variations and is not the result of chance. In other words, a significant result didn't appear randomly; it occurred because of the specific change that was tested, so it can be attributed to a specific cause.
Statistical significance gets a lot of attention from marketers and testing professionals. Unfortunately, for many marketers, statistical significance has become the sole indicator of a completed test. I don't blame marketers at all; most testing tools make it very clear that once you've reached a 95% confidence level, your test is significant and you're good to go. This is simply wrong. So many factors impact statistical significance that you will undoubtedly draw false conclusions if you focus only on this metric. Hopefully, by the end of this article, you'll have a better understanding of what statistical significance is, and most importantly what it isn't. If you can get this concept down, you will be light-years ahead of many marketers today.
What Statistical Significance Is & What It Isn’t
While statistical significance is an important indicator of test validity, significance in itself is NOT validity.
Statistical Significance Does Not Necessitate Validity
Validity Does Necessitate Statistical Significance
The relationship between the two is asymmetrical. In other words, if you have a statistically significant test, you might not have a valid test. However, if you have a valid test, you definitely have a significant test. Make sense? Now that we know what statistical significance isn't, let me tell you what it actually is. Significance is measured by a confidence level and a confidence interval.
The confidence level indicates how unlikely it is that your test results are a type 1 error, the false positive. A false positive occurs when you see a change in your results, but that change is due to randomness (or other noise) rather than the change in variations. At a 95% confidence level, you are accepting a 5% chance that your test results are the result of a type 1 error. 95% has become the industry standard and should be the minimum confidence level for your tests. Remember, even tests that reach 95% confidence may be erroneous, so you should regularly revisit old tests or retest surprising results to verify your findings.

The confidence interval is indicated with a '±'. When you see a lift of 15%, that lift is not a fixed number. The confidence interval gives you the range you should expect: for example, a 15% lift with a ±5% interval means your actual lift could be as high as 20% but as low as 10%.
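To make the confidence level and confidence interval concrete, here is a minimal sketch in Python of the standard two-proportion z-test many testing tools run under the hood. The function name `two_proportion_test` and the example conversion numbers are my own illustration, not from any particular tool; it uses only the standard library.

```python
import math

def norm_cdf(x):
    # Standard normal CDF via the error function (stdlib only).
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def two_proportion_test(conv_a, n_a, conv_b, n_b):
    """Illustrative helper: two-sided z-test comparing two conversion
    rates, plus a ~95% confidence interval on the lift of B over A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis (no real difference).
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se_pool
    # Two-sided p-value: the chance of seeing a difference this large
    # from randomness alone (the type 1 error risk).
    p_value = 2 * (1 - norm_cdf(abs(z)))
    # Unpooled standard error for the interval around the lift itself.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = 1.96  # two-sided critical value for ~95% confidence
    diff = p_b - p_a
    return p_value, (diff - z_crit * se, diff + z_crit * se)

# Hypothetical test: 100/1000 conversions on A vs 150/1000 on B.
p_value, (lo, hi) = two_proportion_test(100, 1000, 150, 1000)
print(f"p-value: {p_value:.4f}, lift interval: {lo:+.3f} to {hi:+.3f}")
```

A p-value below 0.05 corresponds to clearing the 95% confidence level, and the interval is the '±' range around the measured lift: if it spans zero, the test can't rule out "no real difference" even when the headline lift looks impressive.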
Why You Should Care
Statistical significance is a critical factor in your testing campaigns, but it is not the only factor. Remember during your testing campaign to contextualize your numbers and consider things like test length, traffic source, and conversion lift in addition to your confidence level. Happy Testing!