Understanding Observed Power in Convert Experience Reports

🔬 A Guide to Understanding the Role and Limitations of Observed Power in A/B Testing with Convert Reports

Observed Power, sometimes called "post hoc power" or "retrospective power," is a metric that attempts to estimate how likely your test was to detect a real difference between variations, based on the data observed after your test has concluded.

Convert has introduced Observed Power in its reports for transparency and completeness, but it is important to interpret this metric with caution.

🔍 What Is Observed Power?

Observed Power is a calculation based on your test's:

  • Observed effect size
  • Sample size (number of visitors)
  • Variance in the data
  • Confidence level used in the test

It estimates the probability that a repeat of the test under similar conditions would reach statistical significance, assuming the true effect equals the effect you observed. In simple terms, it tries to answer the question: "If the real effect were exactly what we measured, how likely was this test to detect it?"
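To make these inputs concrete, here is a minimal Python sketch of one common way to compute observed power, using a two-sided z-test on two conversion rates. It is an illustration under stated assumptions and not necessarily the formula Convert uses; the function name `observed_power` and its parameters are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def observed_power(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Post hoc power of a two-sided two-proportion z-test.

    Treats the observed difference as if it were the true effect.
    Assumption: normal approximation with an unpooled standard error.
    """
    std_normal = NormalDist()
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b

    # Variance in the data: standard error of the difference in rates.
    se = sqrt(rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b)
    if se == 0:
        raise ValueError("standard error is zero; power is undefined")

    # Observed effect size, expressed in standard-error units.
    z_obs = abs(rate_b - rate_a) / se

    # Critical value implied by the confidence level (1.96 at 95%).
    z_crit = std_normal.inv_cdf(1 - (1 - confidence) / 2)

    # Probability that a repeat test lands beyond either critical value.
    return std_normal.cdf(z_obs - z_crit) + std_normal.cdf(-z_obs - z_crit)
```

For example, `observed_power(100, 1000, 130, 1000)` evaluates a test with 10% versus 13% conversion and 1,000 visitors per variation. When both variations have identical conversion rates, the function returns the significance level itself (0.05 at 95% confidence), because the observed effect is zero.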

📊 Where Is Observed Power Displayed?

Observed Power is not displayed by default in the Convert Experience Report.

  • You can choose to display it manually by selecting it in the report table.
  • When shown, it appears with an information icon.
  • This ensures users understand its advisory role in decision making.

🔢 How to Interpret Observed Power

  • High Observed Power (≥ 0.80): Suggests the sample was large enough, relative to the variance, to reliably detect an effect of the observed size.
  • Low Observed Power (< 0.80): Suggests the test may not have had enough data to reliably detect an effect of the observed size, even if one existed. This often results from:
    • High variance in the data
    • Small sample sizes
    • Small actual effect sizes

⚠️ Why Caution Is Important

Statistical experts warn against over-relying on Observed Power because:

  1. It's Redundant: For a given test and confidence level, Observed Power is a direct mathematical function of the p-value. Once you know one, you know the other, so interpreting both together adds no new insight.
  2. It Can Be Misleading: It may wrongly indicate that a statistically significant result is untrustworthy just because observed power is low.
  3. It Encourages Peeking: Using observed power to decide whether to extend a test introduces a high risk of inflating the false positive rate (Type I error).
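The redundancy in point 1 can be sketched in a few lines of Python. Assuming a two-sided z-test, the p-value alone is enough to recover observed power, with no other information about the test needed. The function name `observed_power_from_p` is hypothetical and used only for illustration.

```python
from statistics import NormalDist


def observed_power_from_p(p_value, alpha=0.05):
    """Observed power of a two-sided z-test, computed from the p-value alone.

    Assumption: 0 < p_value <= 1 and a normal test statistic.
    """
    std_normal = NormalDist()

    # Recover the absolute z-statistic that produced this p-value.
    z_obs = std_normal.inv_cdf(1 - p_value / 2)

    # Critical value for the chosen significance level.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)

    # Same power formula as usual, driven only by the p-value.
    return std_normal.cdf(z_obs - z_crit) + std_normal.cdf(-z_obs - z_crit)
```

A p-value exactly at the 0.05 threshold maps to an observed power of about 0.50. Under this model, every significant result therefore shows observed power of roughly 0.50 or higher, and every non-significant result falls below that, which is why low observed power on a non-significant test tells you nothing the p-value did not already say.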

Using observed power as a decision-making tool can lead to practices that damage the statistical integrity of your experiments.
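The peeking risk can be illustrated with a small simulation. The sketch below runs A/A tests (no real difference) and compares a fixed-sample test with a hypothetical rule that doubles the sample whenever the first look is not significant, which is exactly the situation where observed power looks "low". It simulates z-statistics directly under a normal approximation; the function name and the extension rule are assumptions chosen for illustration, not a description of how Convert works.

```python
import random
from math import sqrt


def false_positive_rates(trials=20000, z_crit=1.96, seed=0):
    """Return (fixed_rate, extended_rate) of false positives in A/A tests."""
    rng = random.Random(seed)
    fixed_hits = 0
    extended_hits = 0

    for _ in range(trials):
        # z-statistic at the planned sample size; the true effect is zero.
        z_first = rng.gauss(0, 1)
        if abs(z_first) > z_crit:
            fixed_hits += 1
            extended_hits += 1
            continue

        # Not significant, so observed power is low: extend the test by
        # collecting a second batch of equal size and test again.
        z_extra = rng.gauss(0, 1)
        z_combined = (z_first + z_extra) / sqrt(2)
        if abs(z_combined) > z_crit:
            extended_hits += 1

    return fixed_hits / trials, extended_hits / trials
```

Calling `false_positive_rates()` returns two rates: the fixed-sample rate stays near the nominal 5%, while the extend-on-low-power rule pushes the false positive rate noticeably higher, because it gives every non-significant test a second chance to cross the threshold.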

For a detailed critique, see this in-depth analysis.

📘 Best Practices

  • Do not require high Observed Power to validate a statistically significant result. A significant result is already enough to reject the null hypothesis under standard conditions.
  • Use Observed Power as an optional, informational metric—not a determinant for test validity.
  • Consider Variance: If you must interpret Observed Power, also examine the variance of the metric being tested, as high variance can suppress observed power despite a large sample size.
  • Use Smart Recommendations: Convert’s Smart Recommendations will guide you on whether to extend or conclude a test based on reliable indicators like significance, lift, and improvement.

📋 Summary

Observed Power is now available in Convert Experience Reports to enhance transparency. However, it should be used with a clear understanding of its limitations and risks.

Use it as a supplementary tool, not a guiding metric.