Skip to content
  • There are no suggestions because the search field is empty.

Validating A/B Test Data Against Raw Shopify Orders using SQL

Validate Shopify A/B test results with SQL across ShopifyQL, BigQuery, Snowflake, and Metorik.

Author: George F. Crew

IN THIS ARTICLE YOU WILL:

Overview

When running A/B tests on your Shopify store, validating experiment results against your raw order data is a necessary step to ensure accuracy and gain deeper business insights. This guide explains how external tools (like A/B testing platforms, surveys, etc.) append data to Shopify orders and provides generalized SQL templates to segment your sales reports based on this custom data, regardless of whether you are using ShopifyQL, Metorik, a data warehouse (e.g., BigQuery, Snowflake), or another reporting tool.

Understanding Custom Data Storage on Shopify Orders

External tools attach information about a customer's experience (e.g., the A/B test variation they saw) to the final order as Note Attributes (also known as "Order Attributes").

In the underlying Shopify database structure, these attributes are stored in one of two main formats on the Orders table:

Format A: The Combined String/JSON Attribute

The data is stored as a single key-value pair, typically in a column named note_attributes or a similar text field. The key is often generic, and the value is a combined string or JSON object.

  • Key Example: Experiment_experimentid

  • Value Example: 12345:2 (Combining Experiment ID 12345 and Variation ID 2)

Format B: Multiple Discrete Key-Value Attributes

The tool creates a separate key-value pair for each piece of custom data, which is often stored as an ARRAY of STRUCTs (records) in the database.

  • Key Example: experience_1004169502 (The Experiment ID)

  • Value Example: 1004400238 (The Variation ID)

SQL Templates for Data Segmentation

Your SQL strategy depends on which of the above storage formats your data warehouse or reporting tool uses.

Template 1: Querying the Combined String/JSON Attribute (Format A)

If your data is stored as a single combined string or text field, you must use string matching functions (e.g., LIKE, CONTAINS, or REGEXP_CONTAINS) to isolate the orders.

Template 1: Querying the Combined String/JSON Attribute (Format A)

If your data is stored as a single combined string or text field, you must use string matching functions (e.g., LIKE, CONTAINS, or REGEXP_CONTAINS) to isolate the orders.

Clause Purpose SQL Concept
SELECT Standard metrics (e.g., order ID, revenue). order_id, total_price
FROM Your orders data table/view.
WHERE Filter to orders containing the specific variation string. note_attributes CONTAINS 'Experiment_experimentid:12345:2'
Generic SQL Example (Filtering for Variation 2 in Experiment 12345):
SELECT order_id,
       total_price,
       order_created_at
FROM
WHERE        -- Use the appropriate string matching function for your tool        note_attributes LIKE '%Experiment_experimentid:12345:2%'

Template 2: Querying Multiple Discrete Attributes (Format B)

If your data is stored as an Array of Structs (the most common format in modern data warehouses like BigQuery or Snowflake), you need to use the UNNEST or equivalent function to parse the array and expose the key-value pairs as rows.

Clause Purpose SQL Concept
FROM Use the UNNEST function to flatten the attribute array. , UNNEST() AS attr
WHERE Filter by the distinct Experiment Key and Variation Value. attr.name = 'experience_1004169502' AND attr.value = '1004400238'

Generic SQL Example (Grouping Sales by Variation ID): This is the most powerful query for direct test validation.

SELECT attr.value AS variation_id,
       Count(t1.order_id) AS total_orders,
       Sum(t1.total_price) AS total_revenue
FROM AS t1,
   UNNEST(t1.attributes) AS attr -- Use the correct UNNEST syntax for your tool
WHERE
    -- Filter to attributes for your specific experiment
attr.name = 'experience_1004169502'
    -- Optional: Add a date filter for the test period
      AND t1.created_at
      BETWEEN '2025-10-01' AND '2025-11-01'
GROUP BY 1
ORDER BY total_revenue DESC

Adapting the Syntax

To make these templates work with your specific SQL tool, you may need to adjust:

  • Table/Column Names: Replace and column names like note_attributes or attributes with the exact names used in your database schema.

  • String Matching:

    • ShopifyQL: Use CONTAINS.

    • BigQuery: Use LIKE or REGEXP_CONTAINS.

    • PostgreSQL/Snowflake: Use LIKE.

  • Array Parsing:

    • BigQuery/Snowflake: Use UNNEST(column_name).

    • PostgreSQL: This data is often stored as JSONB, requiring functions like jsonb_array_elements(column_name).

By identifying the storage format (A or B) and using the appropriate template, you can accurately and generically segment any Shopify report using SQL.

Less Technical Approach

For a less technical, app-based approach to validating your A/B test data against Shopify orders, we recommend using the dedicated reporting platform Metorik. The Metorik interface automates much of the data segmentation process and, in some cases, provides clean, parsed columns for order attributes, offering a user-friendly alternative to writing custom SQL queries. You can find our step-by-step guide on how to leverage this tool here: How to Validate Convert A/B Testing Data Against Shopify Order Data using Metorik.