diff --git a/docs/web-console-docs/experiments/Interpreting-metrics-in-experiment-results.mdx b/docs/web-console-docs/experiments/Interpreting-metrics-in-experiment-results.mdx
index 51ee3108..72fb38a5 100644
--- a/docs/web-console-docs/experiments/Interpreting-metrics-in-experiment-results.mdx
+++ b/docs/web-console-docs/experiments/Interpreting-metrics-in-experiment-results.mdx
@@ -26,7 +26,7 @@ The metrics table provides the following key data points for each variant:
- **Confidence**: The statistical confidence that the observed difference between variants is real, not due to random chance.
-- **Impact**: The difference in metric performance between variants, shown visually and as a percentage with a confidence interval (CI). In this example the observed impact is +1.35% with a confidence interval going from +0.38% to +2.32%.
+- **Impact**: The difference in metric performance between variants, shown visually and as a percentage with a confidence interval (CI). For example, the observed impact could be +1.35% with a confidence interval going from +0.38% to +2.32%.
### Interpreting confidence intervals and results
The confidence interval (CI) and the color-coded indicator help assess the significance of the results:
@@ -66,33 +66,40 @@ Example using the example above
## Group Sequential Testing Metrics
-
+
-### Understanding the Metrics Table
+### Understanding the GST data
-The metrics table provides the following key data points for each variant at the last interim analysis:
+
-- **Mean**: The GST adjusted average performance of the metric for the variant.
+
+For a GST experiment, the primary metric table on the experiment overview has a GST toggle that
+switches between the GST data (used for decision-making) and the non-GST data, which can be used for debugging.
+
+For GST experiments, the primary metric table provides the following key data points for each variant **at the last interim analysis**:
+
+- **Mean**: The GST-adjusted average performance of the metric for the variant.
- **Observed Mean**: The actual observed average performance of the metric during the experiment.
-- **Impact**: The percentage change in the metric compared to the baseline. This is a GST adjusted value. In this example +1.74% with a confidence interval going from -1.88% to +5.49%.
+- **Impact**: The percentage change in the metric compared to the baseline. This is a GST-adjusted value. In this example, the impact is +4.84% with a confidence interval from +2.40% to +8.58%.
- **Z-Score**: A statistical measure that represents how many standard deviations the result is from the null hypothesis (no effect). Positive Z-scores indicate an improvement, while negative Z-scores suggest a decline.
- **P-Value**: The probability that the observed result occurred by chance if the null hypothesis is true. Lower P-values (e.g., below 0.05) indicate statistical significance.
-These data points provide a summary of the ongoing analysis for the selected variant at the last analysis, helping to evaluate its performance relative to the baseline.
+These data points provide a summary of the ongoing analysis for the selected variant at the last interim analysis,
+helping to evaluate its performance relative to the baseline.
---
### Understanding the Group Sequential Graph
-
+
Group Sequential Testing, makes it easy to visually interpret results. The graph displays the evolution of statistical evidence over time, allowing decisions to be made at predefined checkpoints during the experiment. It includes the following elements:
- **X-Axis (Time)**: Represents the progress of the experiment in time, with dates marking each past and future interim analyses.
- **Y-Axis (Standard Deviations, Z-Scale)**: Represents the Z-Score, showing how far the observed result is from the null hypothesis.
- **Z-Score Trajectory (Orange line)**: The path of the observed Z-Score over time. It starts at 0 and moves based on accumulating data.
-- **Efficiency Boundary (Green Region)**: The upper boundary. If the Z-Score trajectory crosses this boundary, the variant shows a statistically significant improvement, and the experiment can be stopped early for success.
-- **Futility Boundary (Pink Region)**: The lower boundary. If the Z-Score trajectory crosses this boundary, the variant is deemed unlikely to show meaningful improvement, and the experiment can be stopped early for futility.
+- **Efficacy Boundary (Green Region)**: The upper boundary. If the Z-Score trajectory crosses this boundary, the variant shows a statistically significant improvement, and the experiment can be stopped early for success.
+- **Futility Boundary (Gray Region)**: The lower boundary. If the Z-Score trajectory crosses this boundary, the variant is deemed unlikely to show meaningful improvement, and the experiment can be stopped early for futility.
- **Fixed Horizon (Vertical Dotted Line)**: Represents the moment in time where the equivalent Fixed Horizon test would have been completed. All interim analyses before that dotted line are opportunities to make an early decision.
---
@@ -101,7 +108,7 @@ Group Sequential Testing, makes it easy to visually interpret results. The graph
- Crossing this boundary means there is enough evidence to conclude the variant performs significantly better than the baseline.
- The experiment can be stopped early, and the variant can be considered successful.
-2. **Futility Boundary (Pink)**:
+2. **Futility Boundary (Gray)**:
- Crossing this boundary indicates the variant is unlikely to show significant improvement.
- In case of a binding futility type (see experiment setup), the experiment is completed (no more interim analysis will happen) and can be stopped as further data collection is unlikely to change the conclusion.
- In case of a non-binding futility type, you can decide to keep the experiment running to the following interim analyses.
diff --git a/static/img/experiment-results/gst-data.png b/static/img/experiment-results/gst-data.png
new file mode 100644
index 00000000..5d56ec13
Binary files /dev/null and b/static/img/experiment-results/gst-data.png differ
diff --git a/static/img/experiment-results/gst-efficacy-boundary-crossed.png b/static/img/experiment-results/gst-efficacy-boundary-crossed.png
new file mode 100644
index 00000000..ebd04a84
Binary files /dev/null and b/static/img/experiment-results/gst-efficacy-boundary-crossed.png differ
diff --git a/static/img/experiment-results/gst-futility-boundary-crossed.png b/static/img/experiment-results/gst-futility-boundary-crossed.png
new file mode 100644
index 00000000..c8526f40
Binary files /dev/null and b/static/img/experiment-results/gst-futility-boundary-crossed.png differ