Separate metric data availability from trial orchestration status #4958
Open
saitcakmak wants to merge 1 commit into facebook:main
Conversation
@saitcakmak has exported this pull request. If you are a Meta employee, you can view the originating Diff in D93924193.
**Codecov Report**

```
@@            Coverage Diff             @@
##             main     #4958     +/-   ##
==========================================
  Coverage   96.83%    96.84%
==========================================
  Files         596       597       +1
  Lines       63239     63576     +337
==========================================
+ Hits        61236     61568     +332
- Misses       2003      2008       +5
```
saitcakmak added a commit to saitcakmak/Ax that referenced this pull request on Feb 27, 2026.
saitcakmak force-pushed from 6257c25 to 9dbb2e0.
saitcakmak force-pushed from 9dbb2e0 to 240d4fd.
Summary:

This diff decouples metric data availability from trial / orchestration status.

Previously, `Client.complete_trial` marked trials as `FAILED` when optimization config metrics were missing. This conflated two concerns: whether a trial finished running (orchestration) and whether its data is complete (availability). Marking `FAILED` discarded useful partial data from model fitting, allowed re-suggestion of already-evaluated arms, and excluded trials from analysis.

The choice to exclude `FAILED` trial data was made since we could not rely on the quality of the data collected from `FAILED` trials. However, by mixing orchestration and metric statuses together, we ended up also throwing away good-quality data that just happened to be incomplete (maybe metric fetching failed upstream for unrelated reasons, or one of the metrics failed to compute but the other was fine).

With this change:

- We go back to consuming all data that we believe to be reliable.
- For transitions, we still require data availability to ensure that we can still fit a model for candidate generation if we transition.
This diff introduces a dedicated metric availability layer:
**Core types** (`ax/core/metric_availability.py`)

- `MetricAvailability` enum (`NOT_OBSERVED`, `INCOMPLETE`, `COMPLETE`)
- `compute_metric_availability()` function that checks which opt config metrics have data for each trial, using `experiment.lookup_data()` and DataFrame groupby (see the sketch after this list)
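The names `MetricAvailability` and `compute_metric_availability()` come from the diff; the signature and body below are only a minimal sketch of the groupby-based check, assuming the DataFrame from `experiment.lookup_data().df` carries `trial_index` and `metric_name` columns (the usual Ax `Data` layout). The actual implementation in `ax/core/metric_availability.py` may differ.

```python
from collections.abc import Iterable
from enum import Enum

import pandas as pd


class MetricAvailability(Enum):
    NOT_OBSERVED = "not_observed"  # no opt-config metric has data for the trial
    INCOMPLETE = "incomplete"      # some, but not all, opt-config metrics have data
    COMPLETE = "complete"          # every opt-config metric has data


def compute_metric_availability(
    data_df: pd.DataFrame,
    opt_config_metrics: set[str],
    trial_indices: Iterable[int],
) -> dict[int, MetricAvailability]:
    """Classify each trial by which optimization-config metrics it has data for."""
    # Metrics observed per trial: group the fetched data by trial index.
    observed_by_trial = (
        data_df.groupby("trial_index")["metric_name"].agg(set).to_dict()
    )
    availability: dict[int, MetricAvailability] = {}
    for trial_index in trial_indices:
        observed = observed_by_trial.get(trial_index, set()) & opt_config_metrics
        if not observed:
            availability[trial_index] = MetricAvailability.NOT_OBSERVED
        elif observed < opt_config_metrics:
            availability[trial_index] = MetricAvailability.INCOMPLETE
        else:
            availability[trial_index] = MetricAvailability.COMPLETE
    return availability
```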
**Trial completion** (`ax/api/client.py`, `ax/core/utils.py`)

- Trials are always marked COMPLETED regardless of metric completeness
- A warning is logged when optimization config metrics are missing

**Pending points** (`ax/core/utils.py`)

- COMPLETED trials with incomplete data are treated as pending to prevent re-suggestion of already-evaluated arms (see the sketch below)
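As a rough standalone illustration of the pending-points change, reusing the `MetricAvailability` enum sketched above: the trial representation (status strings, `(status, arms)` tuples) and the helper name are hypothetical simplifications, not Ax's actual API in `ax/core/utils.py`.

```python
def pending_arms(
    trials: dict[int, tuple[str, list[str]]],
    availability: dict[int, "MetricAvailability"],
) -> list[str]:
    """Arms that candidate generation should treat as pending."""
    pending: list[str] = []
    for trial_index, (status, arms) in trials.items():
        if status == "RUNNING":
            # Still executing: pending, as before this change.
            pending.extend(arms)
        elif status == "COMPLETED" and (
            availability.get(trial_index) is not MetricAvailability.COMPLETE
        ):
            # New behavior: COMPLETED trials missing some optimization-config
            # metrics stay pending, so their already-evaluated arms are not
            # re-suggested by the generator.
            pending.extend(arms)
    return pending
```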
**Transition criteria** (`ax/generation_strategy/generation_node.py`)

- Default `min_trials_observed` criterion sets `count_only_trials_with_data=True` so COMPLETED trials without data don't count toward the transition threshold (see the sketch below)
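A similarly simplified sketch of how `count_only_trials_with_data=True` changes what counts toward the threshold; only that flag name and `min_trials_observed` come from the diff, everything else here is illustrative.

```python
def trials_counted_for_transition(
    trials: dict[int, tuple[str, list[str]]],
    availability: dict[int, "MetricAvailability"],
    count_only_trials_with_data: bool = True,
) -> int:
    """Trials counting toward the min_trials_observed transition threshold."""
    count = 0
    for trial_index, (status, _arms) in trials.items():
        if status != "COMPLETED":
            continue
        if (
            count_only_trials_with_data
            and availability.get(trial_index) is MetricAvailability.NOT_OBSERVED
        ):
            # COMPLETED trials with no observed data no longer count, which
            # preserves the guarantee that a model can be fit for candidate
            # generation after transitioning to the next node.
            continue
        count += 1
    return count
```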
Reviewed By: sdaulton
Differential Revision: D93924193