Understanding A/B Testing Report

Topics

Define the key testing metric
Understand Uplift and Probability to Be Best
When will a winner be declared?
The estimated key testing metrics
Breakdown analysis
What if there’s no winner of my testing?

An A/B testing experience splits your traffic among different creative versions to find out which version performs better against your goal. To view the A/B testing report, click “A/B Testing Analysis”.

Define the key testing metric

You may have multiple goals for evaluating your test. First select a key metric at the top left of the report; the whole report is then generated based on the key metric you selected.

Understand Uplift and Probability to Be Best

Before analyzing a report, it is important to understand Uplift and Probability to Be Best.

Uplift
Uplift is the relative difference between the result of a version and the result of the baseline version. The baseline version is usually the control group, or the first version if you don’t have a control group.
For example, if your key goal is purchase, the purchase rate of one version is 5%, and the purchase rate of the control group is 4%, the uplift is (5% - 4%) / 4% = 25%.
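
The uplift arithmetic in the example above can be written as a short Python sketch. The function name `relative_uplift` is made up for illustration and is not part of the product.

```python
def relative_uplift(version_rate: float, baseline_rate: float) -> float:
    """Return the uplift of a version over the baseline as a fraction (0.25 = 25%)."""
    if baseline_rate == 0:
        raise ValueError("baseline rate must be non-zero to compute a relative uplift")
    return (version_rate - baseline_rate) / baseline_rate


# Purchase rate of 5% for the version versus 4% for the control group.
uplift = relative_uplift(0.05, 0.04)
print(f"Uplift: {uplift:.0%}")
```

A negative result means the version performs worse than the baseline.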

Probability to Be Best
Probability to Be Best is the chance that a version will outperform all other versions in the long term. It is the most actionable metric in the report and is the key to determining the winner of the test. Probability to Be Best is calculated with a Bayesian approach and takes sample size into account, so that the result is reliable.
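
To give an intuition for how such a score can be computed, here is a minimal Python sketch. It is not the exact calculation used in the report: it assumes each goal rate follows a Beta posterior with a uniform Beta(1, 1) prior, and it estimates the probability by random sampling. The function name and the sample numbers are hypothetical.

```python
import random


def probability_to_be_best(results, draws=20000, seed=0):
    """Estimate each version's chance of having the highest goal rate.

    results maps a version name to (goal_reached_users, viewed_users).
    Assumption: a Beta(1, 1) prior on every goal rate (illustrative only).
    """
    rng = random.Random(seed)
    wins = {name: 0 for name in results}
    for _ in range(draws):
        # Draw one plausible goal rate per version from its posterior.
        sampled = {
            name: rng.betavariate(1 + goals, 1 + views - goals)
            for name, (goals, views) in results.items()
        }
        wins[max(sampled, key=sampled.get)] += 1
    return {name: count / draws for name, count in wins.items()}


# Same observed rates (4% vs 5%), but ten times more users in the second case.
small = probability_to_be_best({"control": (4, 100), "version_a": (5, 100)})
large = probability_to_be_best({"control": (40, 1000), "version_a": (50, 1000)})
print(small)
print(large)
```

With the same observed rates, the larger sample gives version_a a higher score, which is how sample size enters the result.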

When will a winner be declared?

A winner will be declared at the top of the report if the following conditions are met:

1. A version has a Probability to Be Best score above the threshold, which is 95% by default. You can change this threshold with the Winner Probability to Be Best Score setting in your experience settings.
2. The minimum testing duration has passed. This makes sure the conclusion is not based on time-related fluctuations in traffic. You can change the minimum duration with the Minimum Testing Duration setting in your experience settings (default is 7 days).
3. Each version has at least 100 viewed users and 30 goal-reached users. If the key testing metric is Avg. Visit Duration, Avg. Pages per Visit, or Bounce rate, only the 100 viewed users per version are required.
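
The three conditions above can be summarized as a small Python sketch. This only illustrates the rules as described here, not the product's actual implementation; the class, function, and parameter names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VersionStats:
    name: str
    viewed_users: int
    goal_reached_users: int
    probability_to_be_best: float  # between 0 and 1


# Metrics that only require the viewed-users minimum.
ENGAGEMENT_METRICS = {"Avg. Visit Duration", "Avg. Pages per Visit", "Bounce rate"}


def declare_winner(versions, days_running, key_metric, threshold=0.95, min_days=7):
    """Return the winning version's name, or None if no winner can be declared yet."""
    # Condition 2: the minimum testing duration has passed.
    if days_running < min_days:
        return None
    # Condition 3: every version has enough users.
    needs_goal_users = key_metric not in ENGAGEMENT_METRICS
    for v in versions:
        if v.viewed_users < 100:
            return None
        if needs_goal_users and v.goal_reached_users < 30:
            return None
    # Condition 1: the best version scores above the threshold.
    best = max(versions, key=lambda v: v.probability_to_be_best)
    return best.name if best.probability_to_be_best > threshold else None


control = VersionStats("control", 1200, 48, 0.03)
version_a = VersionStats("version_a", 1180, 71, 0.97)
print(declare_winner([control, version_a], days_running=8, key_metric="Purchase"))
```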

The estimated key testing metrics

The estimated goal rate is the goal-reached rate you can expect to see over the long run, based on current data. For each version, we provide a 95% probability range, a 50% probability range, and the best estimate of the goal rate. In the example shown in the image, there is a 95% probability that the goal rate of this version is between 1% and 3%, a 50% probability that it is between 1.5% and 2.5%, and our best estimate for this version is 1.9%.
The chart below shows the probability distribution of the goal rate. The blue line represents the selected version, and the black line represents the baseline version. Hover over a line to see how the goal rate is distributed for each version. The peak of each curve is the goal rate with the highest probability.

How to view the probability distribution chart?

1. A curve further to the right is better than a curve further to the left, and the smaller the overlap, the larger the gap between the two versions. The exception is bounce rate: because a lower bounce rate is better, the curve on the left is better than the curve on the right.
2. The sharper the curve, the more certain the estimate. In general, the more sample data you have, the sharper the curve.
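
The probability ranges and the "sharper curve" effect described in this section can be reproduced with a rough Python sketch. It assumes a Beta posterior with a uniform prior and uses the posterior median as the best estimate; both are illustrative assumptions and may differ from how the report is calculated. The function name and the numbers are hypothetical.

```python
import random


def estimate_goal_rate(goals, views, draws=20000, seed=0):
    """Summarize a goal rate as a best estimate plus 50% and 95% probability ranges."""
    rng = random.Random(seed)
    # Sample plausible goal rates from an assumed Beta posterior, then sort them.
    samples = sorted(
        rng.betavariate(1 + goals, 1 + views - goals) for _ in range(draws)
    )

    def quantile(q):
        return samples[min(draws - 1, int(q * draws))]

    return {
        "best_estimate": quantile(0.5),  # posterior median (an assumption)
        "range_50": (quantile(0.25), quantile(0.75)),
        "range_95": (quantile(0.025), quantile(0.975)),
    }


# Same observed rate (1.9%), with 1,000 users and with 10,000 users.
fewer_users = estimate_goal_rate(19, 1000)
more_users = estimate_goal_rate(190, 10000)
for label, est in (("1,000 users", fewer_users), ("10,000 users", more_users)):
    low, high = est["range_95"]
    print(f"{label}: best {est['best_estimate']:.2%}, 95% range {low:.2%} to {high:.2%}")
```

The 95% range for the larger sample is narrower, which corresponds to a sharper curve in the chart.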

Breakdown analysis

A good way to dig deeper is to break down the results by different properties. This may lead to interesting insights.

Depending on your business, you can check the testing results for the different pages that display the testing experience, or for users with different sources, devices, user types, or locations. If you upload user data with Identify Functions, you can also check the testing results by the user properties you uploaded, such as age, membership level, or industry.

You may find different insights for different breakdown properties. For example, users from campaign X may perform better on version A, while users from campaign Y perform better on version B. This can be a good personalization opportunity: instead of sending all traffic to one version, you can show different versions to different users, which can lead to a higher overall conversion rate.
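
If you export per-user results, the kind of breakdown described above can be approximated with a short Python sketch. The record fields (`source`, `version`, `reached_goal`) and the numbers are made up for illustration and do not reflect an actual export format.

```python
from collections import defaultdict


def goal_rate_by_segment(users, prop):
    """Return the goal rate for every (segment value, version) pair."""
    counts = defaultdict(lambda: [0, 0])  # (segment, version) -> [goals, views]
    for user in users:
        key = (user[prop], user["version"])
        counts[key][0] += 1 if user["reached_goal"] else 0
        counts[key][1] += 1
    return {key: goals / views for key, (goals, views) in counts.items()}


# Made-up data: 100 users per campaign and version.
users = (
    [{"source": "campaign_x", "version": "A", "reached_goal": i < 6} for i in range(100)]
    + [{"source": "campaign_x", "version": "B", "reached_goal": i < 3} for i in range(100)]
    + [{"source": "campaign_y", "version": "A", "reached_goal": i < 2} for i in range(100)]
    + [{"source": "campaign_y", "version": "B", "reached_goal": i < 5} for i in range(100)]
)

rates = goal_rate_by_segment(users, "source")
for (segment, version), rate in sorted(rates.items()):
    print(f"{segment} / version {version}: {rate:.1%}")
```

In this made-up data, version A does better for campaign X and version B does better for campaign Y, the pattern that suggests a personalization opportunity. Segments this small are noisy, so treat such differences as hints rather than conclusions.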

What if there’s no winner of my testing?

If no winner has been declared after the test has been running for a long time, you can try the following:

1. Explore breakdown results
Since different people may prefer different versions, breakdown analysis can help you find out whether certain users perform better on a specific version. For example, you might see that one version is better for users from the Facebook campaign and the other is better for users from the YouTube campaign, while the overall result has no clear winner. When you find insights like this, you can show different versions to different users to get a higher conversion rate.
2. View heatmaps of each version
You can view heatmaps of each version to see how users interact with the elements related to your test. Even if there’s no winner based on the key metric you selected, you can still find insights, such as whether users pay more attention to, or interact more with, a specific version.
3. Set a goal closer to where you test
You may have set the ultimate goal of your website as the goal of your campaign, which can be far from where you test. For example, on an e-commerce (EC) site, you add a small banner to a product category page to help users understand the category’s advantages. If you set “purchase successful” as the goal of your test, many other factors affect a user’s journey from the category page to the purchase, so you may not see a winner. Try adding goals such as “click to product details” or “add to cart”, which are closer to the position you test, and see whether any version performs significantly better.