Statistical Methods Used

🚀 THIS ARTICLE WILL HELP YOU:

Understand Frequentist
- Understand power calculations
Know about sequential testing
Understand Bayesian

Convert supports two distinct statistical approaches for your reports: Frequentist and Bayesian.

📏 Frequentist

Important Update: As of October 13th, 2023, all new experiences utilize t-tests instead of z-tests. Experiences initiated prior to this date employed z-tests for statistical significance.

For our updated Frequentist stats, we utilize t-tests. You have the flexibility to configure the following parameters:

The confidence level. Typically, a 95% confidence level is used for most experiments. However, if your experiment is of critical importance, selecting a 99% confidence level would be prudent.
The test type is either a two-tailed or one-tailed t-test. The default is now set to "two-tailed" testing to align with standard statistical practices and ensure greater reliability. Two-tailed tests are recommended by default as they provide a more conservative and statistically robust evaluation. However, if your experiment prioritizes speed over rigor, you may choose a one-tailed test manually.
Concerning multiple comparison correction techniques, you can choose from Bonferroni, Sidak, or None. From a statistical robustness perspective, Sidak stands out as the optimal choice, especially for mission-critical experiments. It maintains the family-wise error rate without adversely affecting the test's power. Nevertheless, the choice is yours.
Additionally, a Sensible Defaults menu is available. It allows you to quickly set "preferred" parameter values based on the criticality of your test—be it "standard" or "mission-critical."

🗒️ Note: This menu now reflects "two-tailed" as the default test type for new projects and experiences.

📈 Power Calculations

We support two power calculation modes that directly influence test completion automation:

Dynamic: In this mode, we use the observed lift as the Minimum Detectable Effect (MDE) to compute the estimated test progress. This is the default setting and may cause the estimated progress to vary based on fluctuations in the lift, especially at the start of the test.
Fixed: This mode is equivalent to standard fixed-horizon test planning. Here, you can set your target MDE, and the test progress will be calculated accordingly.

🔄 Sequential Testing

Our sequential testing employs Asymptotic Confidence Sequences for analysis, a method grounded in the advancements by Waudby-Smith et al. (2023). This approach shares similarities with the Generalized Anytime Valid Inference confidence sequences introduced by Howard et al. (2022). These methodologies offer robust solutions for handling data analysis in a dynamic and continuous manner, providing significant advantages over traditional testing approaches.

🧩 Key Features:

Flexibility with Continuous Monitoring: One of the principal benefits of using confidence sequences is their ability to accommodate continuous monitoring of the experiment's data. This feature allows for evaluating the results of A/B tests at any point in time, enabling decisions to be made as soon as sufficient evidence is gathered, all while maintaining control over error rates.
Designed for a Variety of Settings: The method is versatile and suitable for a wide range of experimentation scenarios, catering to different needs and user cases. It provides experimenters with the tools necessary to tailor the testing process to their specific requirements.
Adjustability with Tuning Parameter: The tuning parameter is pivotal in configuring the tightness of the confidence sequences. It can be adjusted according to the anticipated decision-making point regarding sample size, thus balancing the benefits of early decision-making against the risk of premature conclusions.
Control Over False Positive Rates: Confidence sequences effectively address the "peeking problem" commonly associated with interim analysis in A/B testing. This method controls the false positive rate despite frequent data checks, an achievement not possible with standard fixed-sample testing without specific adjustments.

⚖️ Comparison with Other Methods:

Against General Fixed-Sample Testing: Unlike fixed-sample testing, which requires waiting until a predetermined sample size or duration has been reached, confidence sequences facilitate a more dynamic and responsive decision-making process. This approach aligns well with environments that demand rapid iteration and real-time data analysis.

🛠️ Implementing Sequential Testing in Our Tools:

Sequential Tuning Parameter: The minimum number of visitors required for the test is utilized as a crucial tuning parameter in sequential testing. This setting is vital for controlling the statistical thresholds, ensuring that decisions made at any point are as reliable as those at the conclusion of the experiment.
Adjusting the Tuning Parameter: Modifying this parameter affects the speed at which significant results can be detected. Increasing the parameter enhances the confidence in early results by demanding more data for declaring significance. This adjustment is especially valuable for critical decisions. Conversely, reducing the parameter can accelerate decision-making, which is beneficial in fast-paced environments where quick actions are essential.
Optimal Settings: The default setting for the sequential tuning parameter is often placed at 5,000, based on historical data and average traffic to ensure a balance between responsiveness and rigor. However, it can be adjusted to better match the specifics of an individual experiment, depending on expected data variability and the critical nature of the test outcomes.

By integrating sequential testing into our suite of analytical tools, we provide a method that supports flexible and timely decision-making while upholding the stringent standards necessary for sound statistical analysis.

🧪 Bayesian

The Bayesian approach offers a fundamentally different way of interpreting A/B test results compared to frequentist methods. Instead of asking "Is there a statistically significant difference?", Bayesian analysis answers "What's the probability that Variant A beats Control?" This provides more intuitive and actionable insights for business decision-making.

🎯 Decision Threshold: Understanding "Chance to Win"

The decision threshold represents the minimum probability you require before concluding that a variant is truly better than the control. This threshold directly translates to your confidence level in making business decisions.

How it Works:

95% Threshold (Default): You need 95% confidence that the variant beats control before declaring it a winner
99% Threshold (High Certainty): Recommended for mission-critical changes
90% Threshold (Faster Decisions): For environments where speed is more important than certainty

Example:
If your test shows "Variant A has a 96.5% chance to beat Control" and your threshold is 95%, you can confidently implement Variant A. If it's only 93%, keep testing until you reach the threshold.

🚨 Understanding Risk Percentage

95% Threshold = 5% Risk
99% Threshold = 1% Risk
90% Threshold = 10% Risk

🧠 Choosing the Right Threshold Based on Business Impact

99% (1% Risk): Core revenue features, security, expensive rollbacks, low-traffic cases
95% (5% Risk): Most standard A/B testing situations
90% (10% Risk): Agile testing, reversible changes, early product development

📊 Posterior Distributions in Action

Convert employs uninformative priors—each variant starts with a 50% win probability. As data accumulate:

Initial State: All variants are equal
Data Collection: Updates posterior estimates
Convergence: Clearer trends emerge
Decision Point: Triggered once a variant surpasses your threshold

Unlike frequentist approaches that simply state "significant or not," Bayesian results provide actual probabilities—enabling nuanced business choices.

🛠️ Practical Implementation Guidelines

Conservative Industries: Choose 99%, accept longer tests for more certainty
Agile/Tech Companies: Use 95% for a speed-accuracy balance
Rapid Experimentation: Use 90% for fast testing and iteration

📈 Monitoring and Interpretation

Bayesian analysis delivers continuous test updates:

Monitor Daily: See probabilities shift in real time
Make Early Decisions: End tests once thresholds are reached
Understand Lift Ranges: Estimate expected uplift
Assess Risk: Evaluate potential regret of wrong decisions

This approach aligns with agile development cycles while maintaining scientific rigor.