How would you like to be a modern-day Indiana Jones?
Let’s play the imagination game for a second.
Armed with a fedora, a whip, and an insatiable curiosity, you’re on a quest to unearth treasures from the depths of long-forgotten ruins. There lies a crucial artifact—a relic that holds the key to your success: the concept of statistical significance.
In the world of Conversion Rate Optimization (CRO), the spirit of Indiana Jones lives on.
Much like Indy, brands invest their resources in unearthing the hidden potential of their web pages.
They meticulously test variations, seeking the elusive formula to unlock the treasure trove of conversions and business growth.
In this daring adventure, statistical significance serves as your trusty compass, guiding you through the maze of data, distinguishing between genuine artifacts and mere mirages.
Now, back to the reality of our day-to-day jobs: how do you apply statistical significance in your CRO experiments?
Today’s article sets out on a daring adventure of its own: explaining the concept of statistical significance, its importance, its practical application, and its role in hypothesis testing.
Let us journey into the depths of statistical significance, unearthing the priceless gems concealed within your data.
What Is Statistical Significance?
Statistical significance is a concept used by CRO professionals to determine whether the effect observed after a website update or a change in a marketing campaign reflects a real shift in user behavior or conversion rates rather than random variation.
In simpler terms, statistical significance answers an important question: “Is our result due to chance, a random occurrence, or is it because of an actual change we made?”
Now, for the more technical explanation of the concept.
When you conduct an experiment (for the website or for other processes), you typically test two versions:
- the control version (the original)
- the variant version (the one with changes).
After establishing the two versions, the next step is directing a percentage of your target audience to each one, then collecting data against a specific metric (such as the conversion rate).
The experiment will end with a result – declaring either the control version or the variant as winning.
Statistical significance tells you whether that result reflects a real difference or could have occurred by chance.
The bottom line is that statistical significance will help you determine if the observed result is reliable and not just a fluke.
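The traffic split described above can be sketched in a few lines. This is only an illustration, assuming visitors carry a stable ID; the function name `assign_version`, the experiment label `checkout-test`, and the 50/50 split are hypothetical choices, not something a specific testing tool prescribes.

```python
import hashlib

def assign_version(user_id: str, experiment: str = "checkout-test",
                   variant_share: float = 0.5) -> str:
    """Deterministically assign a visitor to 'control' or 'variant'."""
    # Hash the experiment name together with the visitor ID so the same
    # visitor always lands in the same version of this experiment.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # Map the first 8 hex digits to a number in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "variant" if bucket < variant_share else "control"

# Hypothetical visitor IDs, used only to show that the split is roughly even.
assignments = [assign_version(f"user-{i}") for i in range(10_000)]
variant_fraction = assignments.count("variant") / len(assignments)
print(f"Share of visitors in the variant: {variant_fraction:.1%}")
```

Hashing instead of drawing a fresh random number on every visit keeps returning visitors in the same version, so their behavior is counted against one version only.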
What Is a Significance Level?
In A/B testing, the significance level is a predetermined threshold used to determine statistical significance. Think of it as the bar a test result has to clear before you call it significant.
Essentially, the significance level is the criterion you use to decide whether to reject the null hypothesis.
The significance level is also known as the alpha level, typically denoted by the Greek letter alpha (α), and is expressed as a probability value. It is the probability of rejecting the null hypothesis when it is actually true (a false positive).
The most common significance level used in A/B testing is 0.05 (or 5%).
Suppose the p-value calculated from the test is below 0.05. In that case, the result is considered statistically significant, and you can implement the change with more confidence.
Why Is Statistical Significance Important?
The tl;dr version is that statistical significance plays a vital role in providing reliable and meaningful insights, reducing the risk of acting on results that are merely noise.
We can’t even explain statistical significance’s importance without mentioning its most common benefits, including:
- More confidence in your decision-making process. It’s easier to make a tough decision when it is backed by data.
- Reducing the possibility of false positives or false negatives obscuring your A/B test results.
- A more effective resource allocation toward optimizing your processes and obtaining significant results.
- More efficient reporting and communication, sprinkled with the certainty that your initiatives are backed by rigorous research and experimentation.
The conversation wouldn’t be complete without looking at the consequences of acting on test results that don’t pass the significance threshold.
One of the primary dangers is the increased likelihood of making incorrect decisions based on test results.
Since the sole purpose of A/B testing is creating effective processes with high conversion rates, acting on flawed test results leads you to implement changes that may not actually improve your rates.
In other words, you’re wasting resources, time, and effort on updates that do not deliver the expected results.
Another prevalent issue of skipping significance testing is an increased potential for false positive and false negative errors. This means you may confuse a result influenced by chance with a genuine effect (or miss a genuine effect altogether), leading, again, to flawed decisions.
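To see how chance alone can produce a "winner", here is a small simulation sketch of A/A tests, where both groups share the same true conversion rate. Everything in it is an assumption made for illustration: the 10% true rate, the 500 visitors per group, the 300 repetitions, and the two-proportion z-test used to get the p-value.

```python
import random
from math import sqrt
from statistics import NormalDist

def p_value_two_proportions(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # identical all-or-nothing outcomes: no evidence of a difference
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def simulate_aa_tests(runs=300, visitors=500, true_rate=0.10, alpha=0.05, seed=42):
    """Share of A/A tests that look 'significant' although nothing changed."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(runs):
        # Both groups convert at the same true rate, so any difference is noise.
        conv_a = sum(rng.random() < true_rate for _ in range(visitors))
        conv_b = sum(rng.random() < true_rate for _ in range(visitors))
        if p_value_two_proportions(conv_a, visitors, conv_b, visitors) < alpha:
            false_positives += 1
    return false_positives / runs

rate = simulate_aa_tests()
print(f"Share of A/A tests flagged as significant: {rate:.1%}")
```

With a 0.05 threshold, you should expect a share in the neighborhood of 5%: that is the false positive rate the significance level lets through by design.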
And the problems don’t stop here.
Statistical significance also gives you grounds to believe that the effect you observed applies to your broader audience, not only to your sample.
(In statistics, the uncertainty in generalizing from a sample to the broader audience is expressed with confidence intervals.
For example, a 95% confidence interval means that if you repeated the experiment many times, approximately 95% of the resulting intervals would contain the true effect size.)
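As a rough sketch, a confidence interval for the difference between two conversion rates can be computed with the normal approximation (a Wald interval). The visitor and conversion counts below are hypothetical, and the function name `diff_confidence_interval` is ours.

```python
from math import sqrt
from statistics import NormalDist

def diff_confidence_interval(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Wald confidence interval for the difference in conversion rates (B - A)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Unpooled standard error of the difference between two proportions.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p_b - p_a
    return diff - z * se, diff + z * se

# Hypothetical counts: 200 of 5,000 visitors convert on the control,
# 250 of 5,000 convert on the variant.
low, high = diff_confidence_interval(200, 5000, 250, 5000)
print(f"95% CI for the difference: [{low:.4f}, {high:.4f}]")
```

For these made-up numbers the interval lies entirely above zero, which is consistent with a statistically significant lift at the 5% level.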
On the other hand, neglecting statistical significance leads to a failure to identify changes more likely to have a genuine impact.
In the long term, you can only optimize effectively and achieve desired outcomes with statistical significance.
Without it, you will, unfortunately, make incorrect decisions, inefficiently distribute your resources, and even lose the respect of your peers.
Indeed, that’s not an outcome you’re chasing.
Embrace the confidence of having statistically relevant results.
Make more informed decisions. Innovate successfully.
How Is Statistical Significance Determined?
So, how can you ensure the results of your experiments are accurate and aren’t influenced by external factors?
Here’s the step-by-step process used to calculate statistical significance:
- Formulate your hypotheses.
Start by defining the null hypothesis (H0), which assumes no significant difference or effect between the control and variant groups.
The alternative hypothesis (H1) asserts that there is a significant difference or effect.
- Choose a significance level.
Select a predetermined significance level – the threshold at which you consider the results statistically significant.
The most common significance level is 0.05 (or 5%). However, you can change it to match your specific context and desired balance between avoiding false positives and detecting real effects.
- Select an appropriate statistical test.
Choose the statistical test that suits your data and research question.
For example, a two-proportion z-test or a chi-square test might be appropriate when comparing conversion rates between two groups.
- Collect and analyze data.
Gather data from the control and variant groups, including relevant metrics or outcomes. Perform the statistical analysis using the chosen test.
- Calculate the test statistic.
Compute the test statistic based on the collected data.
(We’ll return to the test statistic later in the article.)
- Determine the p-value.
Calculate the p-value using the test statistic.
- Compare the p-value to the significance level.
Let’s say you run an A/B test on a website to compare the performance of two different landing page versions.
After analyzing the data, you calculate a p-value of 0.03. Let’s assume you had set a significance level of 0.05 before conducting the test.
In this case, comparing the calculated p-value (0.03) with the chosen significance level (0.05), you find that the p-value is lower than the significance level.
This means that the difference in conversion rates between the two versions is statistically significant, so you reject the null hypothesis.
If the situation were different and the p-value had been higher than 0.05, you would fail to reject the null hypothesis and conclude that the observed difference could plausibly be due to chance alone.
- Interpret the results
Interpret the results based on statistical significance.
As you saw in our previous example, if your results are statistically significant, then the difference between the two versions is likely a result of the actual changes you made in the variation.
If the results are not statistically significant, it implies that the observed difference could reasonably occur due to random chance.
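The steps above can be sketched with a two-proportion z-test, one of the tests that fits a comparison of conversion rates. This is a minimal illustration under assumptions: the counts are hypothetical, the helper name `two_proportion_z_test` is ours, and the normal approximation is reasonable only for fairly large samples.

```python
from math import sqrt
from statistics import NormalDist

ALPHA = 0.05  # significance level, chosen before looking at the data

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return the z statistic and two-sided p-value for H0: both rates are equal."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Under H0 both groups share one rate, so pool the data to estimate it.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Probability of a difference at least this extreme if H0 were true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: control converts 200 of 5,000, variant converts 250 of 5,000.
z, p_value = two_proportion_z_test(200, 5000, 250, 5000)
significant = p_value < ALPHA
print(f"z = {z:.2f}, p-value = {p_value:.4f}, significant: {significant}")
```

Fixing `ALPHA` before the test runs matters: picking the threshold after seeing the p-value defeats the purpose of having one.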
Types of Significance in Research
Regarding the research process, you should be aware of different types of significance for which you need to double-check your test results.
However, this classification reflects the external factors that affect your tests and experiments rather than deeper technical differences.
- Statistical Significance
The most fundamental and widely used in A/B testing, statistical significance determines whether the observed difference in metrics between the control and variant reflects a real effect or is likely due to random chance.
It helps to establish the reliability and validity of the results. Statistical significance is typically assessed using hypothesis testing and p-values.
- Practical Significance
Unlike statistical significance, which focuses on ruling out random chance, practical significance examines whether the observed difference is meaningful or relevant in the real world.
For example, suppose your A/B test revealed that the variant of the checkout page shows a 15% increase in the conversion rate versus the control version.
In this case, the practical significance would be that implementing the variant is likely to increase your overall revenue and profitability due to a higher conversion rate.
This finding directly impacts your business and guides your decision to adopt the variation on your checkout page.
- Business Significance
This last type goes beyond statistical and practical considerations, focusing on the business as a whole.
Analysts must evaluate the financial, strategic, or operational implications of the observed difference in A/B testing to determine the business significance.
For instance, imagine you’re conducting an A/B test for your creatives: testing version A and version B regarding click-through rates.
After running the ads for a month, you look at the results: version B resulted in a 2% higher click-through rate compared to version A. Should you continue to run version B for other campaigns?
The business significance for this specific scenario lies in the potential impact of version B.
While a 2% increase in click-through rate may seem small, when you consider the scale of your advertising efforts (such as targeting a larger audience or running the campaign for an extended period), it can translate into a substantial impact.
So, the natural next step is to allocate more resources to the ad version that performs better, optimize your campaigns, and potentially achieve a higher ROI.
Business significance gives you the context you need to prioritize and allocate resources based on the potential business impact of your test results.
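The practical and business checks above boil down to simple arithmetic, sketched here. All figures are hypothetical: the 2.0% and 2.3% checkout rates, the 5% minimum lift, the 10 million impressions, and the 1% baseline click-through rate. The sketch also assumes the 2% ad uplift is a relative increase.

```python
def relative_lift(rate_control: float, rate_variant: float) -> float:
    """Relative change of the variant's rate versus the control's rate."""
    return (rate_variant - rate_control) / rate_control

def is_practically_significant(rate_control, rate_variant, min_relative_lift):
    """True when the lift clears a threshold agreed on before the test."""
    return relative_lift(rate_control, rate_variant) >= min_relative_lift

def projected_extra_clicks(impressions, baseline_ctr, relative_uplift):
    """Additional clicks expected at scale if the measured uplift holds."""
    return impressions * baseline_ctr * relative_uplift

# Practical significance: checkout conversion rate moves from 2.0% to 2.3%.
checkout_lift = relative_lift(0.020, 0.023)
worth_shipping = is_practically_significant(0.020, 0.023, min_relative_lift=0.05)
print(f"Checkout lift: {checkout_lift:.0%}, clears the 5% bar: {worth_shipping}")

# Business significance: a 2% relative CTR uplift across 10 million impressions.
extra_clicks = projected_extra_clicks(
    impressions=10_000_000, baseline_ctr=0.01, relative_uplift=0.02
)
print(f"Projected extra clicks: {extra_clicks:,.0f}")
```

Neither check replaces the significance test; they only tell you whether an effect that passed it is large enough, at your scale, to act on.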
Statistical Significance in Hypothesis Testing
Statistical significance is crucial in evaluating the evidence against the null hypothesis (H0) and determining whether to reject it or fail to reject it.
Here’s how to use statistical significance when hypothesis testing:
- Formulate the Hypotheses
In hypothesis testing, you must formulate two hypotheses: null and alternative.
The null hypothesis assumes no significant effect or difference. In other words, in a strict CRO scenario, you hypothesize that the changes in the variant won’t impact user behavior.
The alternative hypothesis suggests a significant effect or difference in the variant.
In other words, you hypothesize that a change in the variant of the page you’re testing will influence user behavior and impact your metrics.
- Choose a Significance Level
As we discussed earlier, the significance level is the threshold at which you consider the results statistically significant.
The most common significance level is 0.05 (or 5%).
This means your result is statistically significant if the probability of observing data at least as extreme as yours, assuming the null hypothesis is true, is less than 5%.
- Collect and Analyze Data
Gather data relevant to the hypothesis being tested.
This means letting your test run its planned course and ensuring both versions of the page you’re testing get a comparable amount of traffic.
- Calculate the Test Statistic
A test statistic quantifies the difference between observed data and what is expected under the null hypothesis.
To assess statistical significance, the test statistic is compared to a critical value derived from the sampling distribution of the test statistic under the null hypothesis.
You can often simplify the calculation by assuming that the test statistic follows a normal distribution.
For instance, when evaluating the effectiveness of a new survey template, descriptive statistics such as means and standard deviations can be computed to summarize the data.
- Determine the p-value
The p-value quantifies the strength of evidence against the null hypothesis.
A lower p-value indicates stronger evidence against H0.
- Compare the p-value to the Significance Level
The observed difference is statistically significant if the p-value is lower than the significance level (e.g., p < 0.05).
When this happens, the null hypothesis is rejected since evidence favors the alternative hypothesis.
- Interpret the Results
If the results are statistically significant, the observed difference is unlikely due to random chance alone, supporting the alternative hypothesis.
If the results are not statistically significant, then the observed difference could reasonably occur due to chance, and there is insufficient evidence to reject the null hypothesis.
In this case, you could repeat the experiment with different parameters (for example, a larger sample), or keep the control version.
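Here is one way the whole procedure might look with a chi-square test on a 2x2 table of conversions, comparing the test statistic to a critical value and also computing the p-value. It is a sketch under assumptions: the counts are hypothetical, no continuity correction is applied, and the shortcut used for the p-value holds only for one degree of freedom.

```python
from math import erfc, sqrt
from statistics import NormalDist

def chi_square_2x2(conv_a, n_a, conv_b, n_b):
    """Chi-square statistic and p-value for a 2x2 table (1 degree of freedom)."""
    observed = [[conv_a, n_a - conv_a], [conv_b, n_b - conv_b]]
    row_totals = [n_a, n_b]
    col_totals = [conv_a + conv_b, (n_a - conv_a) + (n_b - conv_b)]
    total = n_a + n_b
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if H0 (no difference) were true.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

ALPHA = 0.05
# Critical value for 1 degree of freedom: the square of the two-sided z value.
critical_value = NormalDist().inv_cdf(1 - ALPHA / 2) ** 2

# Hypothetical data: control converts 200 of 5,000, variant converts 250 of 5,000.
chi2, p_value = chi_square_2x2(200, 5000, 250, 5000)
reject_null = chi2 > critical_value  # same decision as p_value < ALPHA
print(f"chi2 = {chi2:.2f}, critical value = {critical_value:.2f}, "
      f"p-value = {p_value:.4f}, reject H0: {reject_null}")
```

Comparing the statistic to the critical value and comparing the p-value to the significance level are two views of the same decision, so they always agree.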
You can safely tuck away your whip and fedora – the adventure is over. What did we learn?
Statistical significance plays a pivotal role in hypothesis testing and data analysis, and it helps CRO professionals make accurate and impactful data-driven decisions.
Statistical significance also gives you the confidence that your test results aren’t caused by chance but instead reflect real and meaningful patterns.
Understanding the concept helps you make data-driven decisions, optimize processes, and drive business growth.
Embracing statistical significance means enhancing the quality of your research, fostering innovation, and paving the way for new insights and discoveries about your business.
Happy testing, and may the odds be ever in your favor!
Frequently Asked Questions about Statistical Significance
What is statistical significance?
Statistical significance refers to the likelihood that the observed difference between variables in an A/B test or experiment is not due to random chance.
It provides a measure of confidence that the observed findings are meaningful and not a result of random fluctuations in the data.
What is a p-value?
A p-value is a statistical measure that quantifies the strength of evidence against the null hypothesis.
It represents the probability of obtaining results as extreme as or more extreme than the observed data, assuming the null hypothesis is true.
A smaller p-value suggests stronger evidence against the null hypothesis.
What does it mean if a result is statistically significant?
If a result is statistically significant, it means that the observed difference or relationship in the data is unlikely to have occurred by chance alone.
It indicates that there is evidence supporting the presence of a real effect or relationship between variables, rather than a random fluctuation or coincidence.
What is the difference between statistical significance and practical significance?
Statistical significance focuses on the presence or absence of a statistically meaningful difference or relationship in the data, whereas practical significance considers the real-world importance or magnitude of the observed effect.
A result can be statistically significant but not practically significant if the observed difference, while statistically meaningful, is too small to have a substantial impact or practical value in a specific context or application.
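A quick sketch makes that last point concrete: with a very large sample, even a tiny lift can pass the significance test. The numbers are hypothetical (2 million visitors per group, 10.0% versus 10.1% conversion), and so is the 5% minimum relative lift used as the practical bar.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

ALPHA = 0.05              # significance level
MIN_RELATIVE_LIFT = 0.05  # hypothetical bar for a change worth shipping

visitors = 2_000_000
conv_control, conv_variant = 200_000, 202_000  # 10.0% vs 10.1%

p_value = two_sided_p_value(conv_control, visitors, conv_variant, visitors)
lift = (conv_variant - conv_control) / conv_control  # 1% relative lift

statistically_significant = p_value < ALPHA
practically_significant = lift >= MIN_RELATIVE_LIFT
print(f"p-value = {p_value:.5f}, relative lift = {lift:.1%}")
print(f"statistically significant: {statistically_significant}, "
      f"practically significant: {practically_significant}")
```

Here the test flags the difference as real, yet a 1% relative lift falls short of the bar set for acting on it.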