FAQ collections for A/B testing

Q: What conditions must be met for an A/B test?

The following conditions must be met to declare a winning version (a quick check of these defaults is sketched after the list):

  • Sample Size: Each version must have at least 100 impressions and 30 goal completions.
  • Test Duration: The test must run for at least 7 days (the default), so that the conclusion is not skewed by time-related traffic fluctuations such as day-of-week effects.
  • Default Winning Condition: By default, a version's win rate (probability of being superior) must be higher than 95%. (You can change this threshold using the "Winner Superiority Rate" in the experience settings.)
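
As a quick illustration, here is a minimal sketch of these checks in Python. The data structure, field names, and the `meets_winning_conditions` helper are hypothetical and not part of Ptengine; the thresholds simply mirror the defaults listed in this answer.

```python
# Minimal sketch of the default winning conditions described above.
# The `versions` structure and field names are hypothetical, not Ptengine's API.

MIN_IMPRESSIONS = 100      # per version
MIN_CONVERSIONS = 30       # goal completions per version
MIN_DURATION_DAYS = 7      # default test period
WIN_RATE_THRESHOLD = 0.95  # default "Winner Superiority Rate"

def meets_winning_conditions(versions, elapsed_days, threshold=WIN_RATE_THRESHOLD):
    """Return the name of the winning version, or None if no winner can be declared."""
    if elapsed_days < MIN_DURATION_DAYS:
        return None
    for v in versions:
        # Every version must reach the minimum sample size.
        if v["impressions"] < MIN_IMPRESSIONS or v["conversions"] < MIN_CONVERSIONS:
            return None
    # A version wins only if its win rate (probability of being superior)
    # is higher than the configured threshold.
    best = max(versions, key=lambda v: v["win_rate"])
    return best["name"] if best["win_rate"] > threshold else None

# Example with made-up numbers:
versions = [
    {"name": "A", "impressions": 1200, "conversions": 48, "win_rate": 0.03},
    {"name": "B", "impressions": 1180, "conversions": 71, "win_rate": 0.97},
]
print(meets_winning_conditions(versions, elapsed_days=10))  # -> "B"
```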

Specific scenario example questions:

  • If the key test metric is "successful purchase," and after running the A/B versions for a week each version has only three to five goal completions, is this data sufficient to draw a valid conclusion?

👉🏻 For a test to draw conclusions, the minimum sample size must be met (at least 100 impressions and 30 goal completions per version). If the sample size falls this far short of the minimum, a valid conclusion cannot be drawn. Suggestions:

  1. Extend the test duration, switch to a page with more traffic, or increase traffic volume before testing.
  2. If there are multiple test goals, refer first to the goals whose metrics have already reached the required sample size.

  • Regarding the percentage increase of a goal: can an increase of 1% or 2% be considered a valid improvement, or does the increase need to be larger?

👉🏻 Whether the test results are valid depends first on meeting the winning conditions above. Once both the sample size and test duration requirements are met, the deciding factor is the win rate: by default, a version wins if its win rate is above 95%, and you can adjust this threshold to suit your business needs. When the win rate meets the threshold, any percentage increase, however small, counts as a valid improvement.
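
Ptengine reports the win rate for you, and the exact calculation is not documented in this article; for intuition only, the sketch below estimates a "probability of being superior" from raw impression and conversion counts using Beta posteriors and Monte Carlo sampling. The function name and the numbers are made up for illustration.

```python
# Illustrative estimate of a "win rate" (probability that B's conversion rate
# beats A's) from raw counts, using Beta posteriors and Monte Carlo sampling.
# Not necessarily the exact formula Ptengine uses.
import numpy as np

def probability_b_beats_a(conv_a, imp_a, conv_b, imp_b, n_samples=200_000, seed=0):
    rng = np.random.default_rng(seed)
    # Beta(1 + conversions, 1 + non-conversions) posterior for each version's rate.
    rate_a = rng.beta(1 + conv_a, 1 + imp_a - conv_a, n_samples)
    rate_b = rng.beta(1 + conv_b, 1 + imp_b - conv_b, n_samples)
    return float(np.mean(rate_b > rate_a))

# Even a small absolute lift counts as a valid improvement once the win rate
# clears the threshold (95% by default).
win_rate = probability_b_beats_a(conv_a=300, imp_a=10_000, conv_b=345, imp_b=10_000)
print(f"win rate: {win_rate:.1%}")  # roughly 96% with these made-up numbers
```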

Q: How much traffic is needed for a test?

  • It depends on:
    • Conversion rate
    • Desired lift (improvement margin)
    • Statistical power
    • Confidence level
  • Before starting a test, use these factors to estimate whether the cost in traffic and time is acceptable (reference: https://abtestguide.com/abtestsize/); a rough calculation is sketched below.
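
For rough planning, the sketch below applies the standard two-proportion sample size formula used by calculators like the one linked above. The `sample_size_per_version` helper is hypothetical and the inputs are example values; treat the result as an order-of-magnitude planning figure, not an exact requirement.

```python
# Rough per-version sample size for comparing two conversion rates
# (two-sided z-test at the given confidence level and power).
from scipy.stats import norm

def sample_size_per_version(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)   # desired lift (improvement margin)
    p_bar = (p1 + p2) / 2
    z_alpha = norm.ppf(1 - alpha / 2)          # confidence level
    z_beta = norm.ppf(power)                   # statistical power
    numerator = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return int(numerator / (p2 - p1) ** 2) + 1

# Example: 3% baseline conversion rate, aiming to detect a 10% relative lift.
print(sample_size_per_version(baseline_rate=0.03, relative_lift=0.10))
# -> on the order of 50,000 visitors per version
```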

Q: In Ptengine's A/B testing, what counts as a "goal completion," and how is it determined?

A conversion is counted toward a version's goal completions when, within a single session, the user visits the page where the test is active, is shown that version, and then converts in the same session. If no conversion occurs in that session and the user later converts in a subsequent visit without returning to the test page, that conversion is not counted.
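
To make the session-scoped rule concrete, here is a small sketch of how such counting could work. The event structure and field names are hypothetical and are not Ptengine's internal implementation.

```python
# Sketch of the session-scoped counting rule described above.
# Event names and fields are hypothetical.

def count_goal_completions(sessions):
    """Count conversions per version, attributing a conversion only when the
    version was displayed earlier in the same session."""
    completions = {}
    for session in sessions:
        shown_version = None
        for event in session:  # events in chronological order within one session
            if event["type"] == "version_displayed":
                shown_version = event["version"]
            elif event["type"] == "conversion" and shown_version is not None:
                completions[shown_version] = completions.get(shown_version, 0) + 1
    return completions

# A conversion in a later session, without revisiting the test page, is not counted:
sessions = [
    [{"type": "version_displayed", "version": "B"}],  # session 1: saw version B, no conversion
    [{"type": "conversion"}],                         # session 2: converted, test page not revisited
]
print(count_goal_completions(sessions))  # -> {} (nothing attributed)
```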

Q: When an A/B test has multiple goals and the test results for each goal differ, how should they be interpreted?

In e-commerce, the most common test goals are conversion-related metrics like "Add to Cart," "Checkout," and "Successful Purchase."

  • It's recommended to prioritize metrics for steps closest to the test point in the conversion funnel. For example, if changes are made to a product detail page (PDP) or a feature/campaign page, prioritize the "Add to Cart" test results. "Successful Purchase" is relatively far down the funnel and might be influenced by other factors.
  • View segmented reports to compare the performance differences of each version across dimensions like device type, source, and user type, to identify the main audience segments causing the differences.
  • Besides quantitative outcome metrics, you can also use A/B testing heatmaps to compare behaviors like "above-the-fold drop-off rate," "scroll depth," page engagement, and interactions for each version, observing user interest through their browsing behavior.

Q: When can a test be concluded?

  • Statistical significance is not a condition for ending the test, but an outcome.
  • Ending conditions:
    • After running through the pre-calculated traffic volume and the predetermined time period (at least one week).
    • Rule of thumb: The number of conversions for each version should not be too low.
  • When the test ends, the result may turn out to be statistically significant, but you should expect that it often will not be.

Q: What should be done if the test duration is sufficient, but no winning result is achieved for any goal?

  • When both the sample size and test duration requirements are met, the outcome is determined primarily by the win rate. If no goal produces a winner, you can end the test, or adjust the test creative and start a new one. If the win rate is far from 95% (e.g., below 80%), the result can generally be deemed not significant, meaning the tested element has no direct, stable impact on conversion. If the win rate is below 95% but close to it, the result can be read as indicating a tendency.
  • When overall result metrics don't show a winner, it's recommended to check segmented reports to see if any specific segments show significant results. (Note: ensure the sample size for segments is not too small.)

Q: In e-commerce scenarios, besides "Successful Purchase," "Checkout," and "Add to Cart," are there other metrics to consider for A/B testing?

The target metrics for A/B testing mainly depend on the stage affected by the test point. For example, if the test point is on a product detail page, the main focus will be on "Add to Cart," "Checkout," and "Purchase." If the test point is on the homepage, besides conversion metrics, you can also look at "next step" paths related to the test content, such as "Product Detail Page Visits" and "Collection/Category Page Visits." For some test points, clicks on the element/module itself can also be tracked.

Q: Can multiple tests be run on a single page?

  • It's recommended to run only one test at a time on a single page for easier attribution.
  • If traffic is substantial, you can segment the traffic and run different tests concurrently (e.g., run separate tests for new vs. returning users, or for users from Facebook vs. Google sources); a simple split is sketched below.
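
As an illustration of mutually exclusive traffic splits, the sketch below hashes a visitor ID into one of two buckets so that each visitor only ever enters one test. In practice you would set this up with Ptengine's audience/segment targeting; the `assign_concurrent_test` helper and the visitor ID are made up.

```python
# Deterministic, mutually exclusive traffic split for two concurrent tests.
# Illustrative only; not Ptengine's targeting mechanism.
import hashlib

def assign_concurrent_test(visitor_id: str) -> str:
    """Route a visitor to exactly one of two concurrent tests, stably."""
    digest = hashlib.sha256(visitor_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100            # stable 0-99 bucket per visitor
    return "test_1" if bucket < 50 else "test_2"

print(assign_concurrent_test("visitor-12345"))  # always the same test for this visitor
```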
