Separate metric data availability from trial orchestration status#4958

Open
saitcakmak wants to merge 1 commit into facebook:main from saitcakmak:export-D93924193

Conversation

@saitcakmak
Contributor

Summary:
This diff decouples metric data availability from trial / orchestration status.

Previously, `Client.complete_trial` marked trials as `FAILED` when optimization config metrics were missing. This conflated two concerns: whether a trial finished running (orchestration) and whether its data is complete (availability). Marking trials `FAILED` excluded useful partial data from model fitting, allowed re-suggestion of already-evaluated arms, and excluded trials from analysis.

The choice to exclude `FAILED` trial data was made because we could not rely on the quality of data collected from `FAILED` trials. However, by mixing orchestration and metric statuses together, we also ended up throwing away good-quality data that merely happened to be incomplete (for example, metric fetching failed upstream for unrelated reasons, or one metric failed to compute while the others were fine).

With this change:

  • We go back to consuming all data that we believe to be reliable.
  • For stage transitions, we still require data availability, so that a model can be fit for candidate generation after transitioning.

This diff introduces a dedicated metric availability layer:

**Core types** (`ax/core/metric_availability.py`)

  • `MetricAvailability` enum (`NOT_OBSERVED`, `INCOMPLETE`, `COMPLETE`)
  • `compute_metric_availability()` function that checks which optimization config metrics have data for each trial, using `experiment.lookup_data()` and DataFrame `groupby`
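The availability check can be sketched roughly as follows. This is a hedged, self-contained approximation, not Ax's actual implementation: the real `compute_metric_availability()` operates on `experiment.lookup_data()`, while here a plain pandas DataFrame with `trial_index` and `metric_name` columns stands in for that data.

```python
from enum import Enum

import pandas as pd


class MetricAvailability(Enum):
    """Availability of optimization-config metrics for a trial (names from this diff)."""

    NOT_OBSERVED = "not_observed"
    INCOMPLETE = "incomplete"
    COMPLETE = "complete"


def compute_metric_availability(
    data: pd.DataFrame, opt_config_metrics: set[str]
) -> dict[int, MetricAvailability]:
    """Classify each trial by which opt-config metrics have data.

    `data` mimics the long-format frame behind experiment.lookup_data(),
    with one row per (trial_index, metric_name) observation.
    """
    availability: dict[int, MetricAvailability] = {}
    for trial_index, group in data.groupby("trial_index"):
        observed = set(group["metric_name"]) & opt_config_metrics
        if not observed:
            availability[trial_index] = MetricAvailability.NOT_OBSERVED
        elif observed == opt_config_metrics:
            availability[trial_index] = MetricAvailability.COMPLETE
        else:
            availability[trial_index] = MetricAvailability.INCOMPLETE
    return availability


# Trial 0 has both metrics, trial 1 has one, trial 2 has none of the opt-config metrics.
df = pd.DataFrame(
    {
        "trial_index": [0, 0, 1, 2],
        "metric_name": ["loss", "latency", "loss", "unrelated"],
    }
)
result = compute_metric_availability(df, {"loss", "latency"})
```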

**Trial completion** (`ax/api/client.py`, `ax/core/utils.py`)

  • Trials are always marked `COMPLETED`, regardless of metric completeness
  • A warning is logged when optimization config metrics are missing
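A minimal sketch of the new completion behavior. This is toy code, not the Ax `Client` API: the dict-based `trial_status` and the `missing_metrics` argument are illustrative stand-ins for the experiment state and the availability check.

```python
import logging

logger = logging.getLogger(__name__)


def complete_trial(
    trial_status: dict[int, str], trial_index: int, missing_metrics: set[str]
) -> str:
    """Orchestration status no longer depends on data completeness.

    Previously a non-empty `missing_metrics` would have produced FAILED;
    now the trial is always COMPLETED and missing metrics only warn.
    """
    if missing_metrics:
        logger.warning(
            "Trial %d is missing optimization config metrics: %s",
            trial_index,
            sorted(missing_metrics),
        )
    trial_status[trial_index] = "COMPLETED"
    return trial_status[trial_index]
```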

**Pending points** (`ax/core/utils.py`)

  • `COMPLETED` trials with incomplete data are treated as pending, to prevent re-suggestion of already-evaluated arms
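The pending-points rule can be sketched like this. The trial dicts and the `get_pending_arms` helper are hypothetical simplifications of Ax's pending-observation bookkeeping, shown only to illustrate the rule.

```python
def get_pending_arms(trials: list[dict]) -> set[str]:
    """An arm is pending if its trial is still running, OR if the trial is
    COMPLETED but its opt-config data is incomplete, so the arm is not
    re-suggested before its data arrives."""
    pending: set[str] = set()
    for trial in trials:
        still_running = trial["status"] in ("STAGED", "RUNNING")
        completed_without_full_data = (
            trial["status"] == "COMPLETED" and not trial["data_complete"]
        )
        if still_running or completed_without_full_data:
            pending.add(trial["arm"])
    return pending


trials = [
    {"arm": "a0", "status": "RUNNING", "data_complete": False},
    {"arm": "a1", "status": "COMPLETED", "data_complete": False},  # pending now
    {"arm": "a2", "status": "COMPLETED", "data_complete": True},
]
pending = get_pending_arms(trials)
```

Under the old behavior, `a1` would have been `FAILED` and eligible for re-suggestion; here it stays pending until its data is complete.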

**Transition criteria** (`ax/generation_strategy/generation_node.py`)

  • The default `min_trials_observed` criterion sets `count_only_trials_with_data=True`, so `COMPLETED` trials without data don't count toward the transition threshold
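The criterion's counting logic can be approximated as below. This is a sketch of the described behavior, not the actual `TransitionCriterion` class; the `has_data` field stands in for the availability check.

```python
def min_trials_observed_met(
    trials: list[dict], threshold: int, count_only_trials_with_data: bool = True
) -> bool:
    """A COMPLETED trial counts toward the transition threshold only if it
    has data (under the new default), so the next node can fit a model."""
    observed = [
        t
        for t in trials
        if t["status"] == "COMPLETED"
        and (t["has_data"] or not count_only_trials_with_data)
    ]
    return len(observed) >= threshold


trials = [
    {"status": "COMPLETED", "has_data": True},
    {"status": "COMPLETED", "has_data": False},  # no longer counts by default
]
met_default = min_trials_observed_met(trials, threshold=2)
met_legacy = min_trials_observed_met(trials, threshold=2, count_only_trials_with_data=False)
```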

Reviewed By: sdaulton

Differential Revision: D93924193

meta-cla bot added the CLA Signed label Feb 27, 2026

meta-codesync bot commented Feb 27, 2026

@saitcakmak has exported this pull request. If you are a Meta employee, you can view the originating Diff in D93924193.

@codecov-commenter

Codecov Report

❌ Patch coverage is 99.19786% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.84%. Comparing base (8215f32) to head (6257c25).

Files with missing lines Patch % Lines
ax/core/tests/test_utils.py 98.75% 2 Missing ⚠️
ax/core/utils.py 98.30% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##             main    #4958    +/-   ##
========================================
  Coverage   96.83%   96.84%            
========================================
  Files         596      597     +1     
  Lines       63239    63576   +337     
========================================
+ Hits        61236    61568   +332     
- Misses       2003     2008     +5     


saitcakmak added a commit to saitcakmak/Ax that referenced this pull request Feb 27, 2026
Labels

CLA Signed, fb-exported, meta-exported
