# added CUPED and outliers info #248
Merged
Commits (12)
- 44079e1 chris-absmartly: added CUPED and ouliters info
- 0b23982 chris-absmartly: fix broken links
- afa7c7f chris-absmartly: remove inrrelevnt link
- 3461f13 chris-absmartly: added link to booking + 0.1 threshold explanation
- 572ce23 chris-absmartly: added warning that retention and time filters must be within lookback…
- f1079d4 chris-absmartly: release notes for January 2026
- 58747e8 chris-absmartly: fix a few typos
- 088f4a7 chris-absmartly: Added more information about draft/active metrics
- e3055d5 chris-absmartly: improved CUPED description
- bf3530b chris-absmartly: better draft metric description
- fba6a81 chris-absmartly: typo
- bce470c chris-absmartly: fix all the rabbit feedback
# January 2026

## Overview

This release deepens our investment in **metrics governance** while also bringing key performance improvements to the experiment experience. With this update, managing, discovering, and using metrics becomes faster, more transparent, and better integrated into your workflow.

---

## Metrics

### Metric Catalog in Main Menu

The **metric catalog** is now accessible directly from the main menu, reflecting the central role metrics play across the platform.

### Improved Metric Catalog Search

We've upgraded the metric list to a full **catalog** with enhanced search and filtering, matching the experience already available in experiment creation. This makes it easier for:

- **Metric owners** to manage and maintain their metrics
- **Experimenters** to find the right metrics for their experiments

### Variance Reduction with CUPED

**[CUPED (Controlled-experiment Using Pre-Experiment Data)](/docs/web-console-docs/goals-and-metrics/metrics/variance-reduction-cuped)** is a well-known variance reduction technique that makes metrics more sensitive by leveraging pre-experiment data about users. It allows you to detect smaller effects with the same sample size, or reach statistical significance faster with fewer users.

In ABsmartly, you can now:

- Enable CUPED for **new metrics**, or add it to an existing metric by creating a **new version**
- Choose a lookback period of between 1 and 4 weeks
- Enjoy a shorter time to decision

### Metric Duplication

You can now **duplicate metrics** with a single action, so there's no need to re-enter definitions by hand. This is ideal when creating a variation of an existing metric.

### Draft & Active Status

Manage your metrics' lifecycle more clearly:

- All new metrics are created in **draft** by default
- **Draft** metrics can be edited without restrictions
- **Draft** metrics cannot be added to experiments
- Once a metric is **made active**, it becomes targetable and can be used in experiments
- All existing metrics will be made **active** by default, so you can keep using them

This is the first step toward the upcoming **metric approval workflow**, which will allow greater control over metric governance in the next release.

---

## Experiments

### Faster Experiment Overview

We've added **caching** to improve the performance of the experiment overview page, especially for experiments with large datasets.

### Data Freshness Indicators

Each experiment now includes:

- A **data freshness indicator** to help you understand how recent the data is
- A **force refresh button** so you can manually update results when needed

### Graph Improvements

We've improved overall **graph responsiveness and rendering speed**, giving you a smoother experience when navigating and interpreting results. We've also improved the histogram graph so that bucket boundaries now match across variants, making comparisons much easier.

---

## Questions or Feedback?

We're always happy to help, so reach out if you have any questions or want to explore how to make the most of these new capabilities.
docs/web-console-docs/goals-and-metrics/metrics/variance-reduction-cuped.mdx (122 additions, 0 deletions)
---
sidebar_position: 5
---

# Variance Reduction with CUPED

## What is CUPED?

CUPED (Controlled-experiment Using Pre-Experiment Data) is a variance reduction technique that makes metrics more sensitive by leveraging pre-experiment data about users. It allows you to detect smaller effects with the same sample size, or reach statistical significance faster with fewer users.

In A/B testing, users exhibit natural variability in their behavior before any treatment is applied. Some users inherently spend more, engage more, or convert more than others. This pre-existing variability creates statistical "noise" that makes it harder to detect the true effect of your changes. CUPED reduces this noise by adjusting for users' baseline behavior, effectively isolating the treatment effect.

## How CUPED Works

CUPED uses a covariate, typically the same metric measured during a pre-experiment period, to adjust each user's post-experiment metric value. The adjustment accounts for how each user performed relative to the average before the experiment started.

The core adjustment formula is:

```
Adjusted Metric = Raw Metric - θ × (Pre-experiment Metric - Average Pre-experiment Metric)
```

Where:

- **Raw Metric**: The user's observed value during the experiment
- **Pre-experiment Metric**: The same metric measured before the experiment
- **θ (theta)**: An optimal coefficient estimated from pre-/post-experiment data (often Cov(pre, post) / Var(pre))

The adjusted values keep the same mean as the raw values but have reduced variance, making treatment effects easier to detect.
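The formula above can be sketched in code as follows. This is a minimal illustration on synthetic data; the helper name `cuped_adjust` is ours, not an ABsmartly API.

```python
import numpy as np

def cuped_adjust(post, pre):
    """Apply the CUPED adjustment described above.

    post: metric values observed during the experiment
    pre:  the same metric for the same users, measured pre-experiment
    """
    post = np.asarray(post, dtype=float)
    pre = np.asarray(pre, dtype=float)
    # Optimal coefficient: theta = Cov(pre, post) / Var(pre)
    theta = np.cov(pre, post)[0, 1] / np.var(pre, ddof=1)
    # Centering on the pre-period mean leaves the overall mean unchanged
    return post - theta * (pre - pre.mean())

# Synthetic data: post-period values correlated with pre-period values
rng = np.random.default_rng(0)
pre = rng.normal(100, 20, 10_000)
post = pre + rng.normal(5, 10, 10_000)

adjusted = cuped_adjust(post, pre)
# The mean is preserved, while the variance drops sharply
print(np.mean(post), np.mean(adjusted))
print(np.var(post), np.var(adjusted))
```

Note how the adjusted series has the same mean as the raw one; only the spread around that mean is reduced.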
## When CUPED is Most Effective

CUPED provides the greatest benefit when:

1. **There is high correlation between the pre- and post-experiment metrics** (correlation ≥ 0.3)
   - Revenue metrics typically show correlations of 0.5-0.7
   - Engagement metrics often show correlations of 0.4-0.6
   - Conversion metrics may show lower but still useful correlations

2. **Sufficient pre-experiment data is available**
   - Minimum: 7-14 days of historical data
   - Recommended: 2-4 weeks for stable baseline estimates
   - The pre-period should reflect normal user behavior
   - In ABsmartly, you can choose between 1, 2, 3, or 4 weeks, with 2 weeks being the default

3. **The metric has high natural variance**
   - Revenue per user (some users spend much more than others)
   - Session counts (power users vs. casual users)
   - Time-based engagement metrics
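The correlation thresholds matter because, with the optimal θ, the adjusted metric's variance shrinks by a factor of 1 − ρ², where ρ is the pre/post correlation. A quick illustration for the correlation levels mentioned above:

```python
# Fraction of the original variance that remains after the CUPED
# adjustment, for a given pre/post correlation rho: 1 - rho**2
for rho in (0.3, 0.5, 0.7):
    remaining = 1 - rho ** 2
    print(f"correlation {rho}: {remaining:.0%} of the variance remains")
```

At a correlation of 0.3 only about 9% of the variance is removed, which is why lower correlations bring limited benefit.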
## Practical Examples

### Example 1: Revenue Optimization

You are testing a new checkout flow where the primary metric is `revenue per user`.

**Without CUPED:**

- User A: Spent $100/month historically → spends $110 during the test
- User B: Spent $20/month historically → spends $25 during the test
- Both show increases, but is it the treatment or natural variance?

**With CUPED:**

The algorithm adjusts for their baseline spending patterns. If both users increased proportionally beyond their historical baseline, CUPED isolates this treatment effect from their pre-existing spending behavior, giving you higher confidence that the change drove the increase.

**Result:** You might detect the effect 30-40% faster, or with 30-40% fewer users.

### Example 2: Engagement Metrics

You are testing a new feed algorithm where your metric is `sessions per week`.

**Without CUPED:**

- There is high natural variance between power users (10+ sessions/week) and casual users (2 sessions/week)
- Treatment effects are masked by this user heterogeneity
- Reaching significance requires 100,000 users

**With CUPED:**

- The algorithm adjusts for each user's historical session frequency
- It can detect the same effect with ~65,000 users
- Or it can detect a smaller 2% improvement that would have been undetectable before
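The required sample size scales with the metric's variance, so this kind of saving follows directly from the 1 − ρ² variance-reduction factor. A hypothetical back-of-the-envelope check, where the 0.59 correlation is our own assumption chosen to match the numbers above:

```python
# Required sample size scales with variance, so roughly:
#   n_cuped ≈ n_raw * (1 - rho**2)
n_raw = 100_000
rho = 0.59          # assumed pre/post correlation for sessions per week
n_cuped = round(n_raw * (1 - rho ** 2))
print(n_cuped)      # about 65,000 users for the same statistical power
```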
### Metric Compatibility

CUPED works best with:

- **Continuous metrics**: Revenue, time spent, count metrics

CUPED is less effective for:

- Metrics without meaningful pre-experiment analogs
- Completely novel user behaviors introduced by the treatment
- Metrics where the pre/post-experiment correlation is very low

### Statistical Validity

- **Bias-free**: CUPED does not bias your estimates; it only reduces variance
- **Conservative**: If the pre-experiment data doesn't correlate, CUPED simply doesn't apply an adjustment

## Benefits of Using CUPED

1. **Faster decisions**: Reduce the time to statistical significance by 30-50% on average
2. **Cost efficiency**: Achieve the same statistical power with fewer users
3. **Detect smaller effects**: Find wins that would otherwise remain hidden in the noise
4. **Typically no downside**: CUPED is conservative; when correlation is weak, it usually offers little benefit but remains unbiased

## CUPED and ABsmartly

When creating a new metric or a new version of an existing metric, you can enable CUPED. When CUPED is enabled for your metrics in ABsmartly:

- Pre-experiment data that has already been collected is used automatically
- The platform calculates the optimal θ coefficient for each metric
- Adjusted metrics are computed alongside raw metrics
- Statistical significance calculations use the variance-reduced estimates
- CUPED runs automatically in the background without requiring changes to your experiment setup or tracking implementation
- When the correlation is below 0.1, or when the variance is above the threshold, ABsmartly uses the raw data
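The fallback rule above can be sketched like this. This is a hypothetical illustration: the function name and structure are ours, and ABsmartly's actual implementation may differ.

```python
import numpy as np

def maybe_cuped_adjust(post, pre, min_corr=0.1):
    """Apply CUPED only when the pre/post correlation clears the
    documented 0.1 threshold; otherwise return the raw data."""
    post = np.asarray(post, dtype=float)
    pre = np.asarray(pre, dtype=float)
    rho = np.corrcoef(pre, post)[0, 1]
    if abs(rho) < min_corr:
        return post  # weak correlation: the adjustment would not help
    theta = np.cov(pre, post)[0, 1] / np.var(pre, ddof=1)
    return post - theta * (pre - pre.mean())
```

With uncorrelated inputs this returns the raw metric unchanged; with correlated inputs it returns the variance-reduced series.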
## Further Reading

- The original CUPED paper: [Deng et al., 2013, "Improving the Sensitivity of Online Controlled Experiments by Utilizing Pre-Experiment Data"](https://exp-platform.com/Documents/2013-02-CUPED-ImprovingSensitivityOfControlledExperiments.pdf)
- CUPED at Booking.com: [Simon Jackson, 2018, "How Booking.com increases the power of online experiments with CUPED"](https://booking.ai/how-booking-com-increases-the-power-of-online-experiments-with-cuped-995d186fff1d)