Separate metric data availability from trial orchestration status #4958
Open
saitcakmak wants to merge 1 commit into facebook:main
Conversation
@saitcakmak has exported this pull request. If you are a Meta employee, you can view the originating Diff in D93924193.
**Codecov Report**

```
@@            Coverage Diff             @@
##             main     #4958     +/-   ##
==========================================
  Coverage   96.83%    96.84%
==========================================
  Files         596       597       +1
  Lines       63239     63576     +337
==========================================
+ Hits        61236     61568     +332
- Misses       2003      2008       +5
```
saitcakmak added a commit to saitcakmak/Ax that referenced this pull request on Feb 27, 2026.
saitcakmak force-pushed from 6257c25 to 9dbb2e0.
saitcakmak force-pushed from 9dbb2e0 to 240d4fd.
Summary:

This diff decouples metric data availability from trial / orchestration status.

Previously, `Client.complete_trial` marked trials as `FAILED` when optimization config metrics were missing. This conflated two concerns: whether a trial finished running (orchestration) and whether its data is complete (availability). Marking `FAILED` discarded useful partial data from model fitting, allowed re-suggestion of already-evaluated arms, and excluded trials from analysis.

The choice to exclude `FAILED` trial data was made since we could not rely on the quality of the data collected from `FAILED` trials. However, by mixing orchestration and metric statuses together, we ended up also throwing away good-quality data that just happened to be incomplete (maybe metric fetching failed upstream for unrelated reasons, or one of the metrics failed to compute but the other was fine).

With this change:

- We go back to consuming all data that we believe to be reliable.
- For transitions, we still require data availability to ensure that we can still fit a model for candidate generation if we transition.
This diff introduces a dedicated metric availability layer:
**Core types** (`ax/core/metric_availability.py`)

- `MetricAvailability` enum (`NOT_OBSERVED`, `INCOMPLETE`, `COMPLETE`)
- `compute_metric_availability()` function that checks which opt config metrics have data for each trial, using `experiment.lookup_data()` and DataFrame groupby (see the sketch after this list)
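The names `MetricAvailability` and `compute_metric_availability()` come from the diff; the signature and body below are only a minimal sketch of the groupby-based check, assuming the DataFrame from `experiment.lookup_data().df` carries `trial_index` and `metric_name` columns (the usual Ax `Data` layout). The actual implementation in `ax/core/metric_availability.py` may differ.

```python
from collections.abc import Iterable
from enum import Enum

import pandas as pd


class MetricAvailability(Enum):
    NOT_OBSERVED = "not_observed"  # no opt-config metric has data for the trial
    INCOMPLETE = "incomplete"      # some, but not all, opt-config metrics have data
    COMPLETE = "complete"          # every opt-config metric has data


def compute_metric_availability(
    data_df: pd.DataFrame,
    opt_config_metrics: set[str],
    trial_indices: Iterable[int],
) -> dict[int, MetricAvailability]:
    """Classify each trial by which optimization-config metrics it has data for."""
    # Metrics observed per trial: group the fetched data by trial index.
    observed_by_trial = (
        data_df.groupby("trial_index")["metric_name"].agg(set).to_dict()
    )
    availability: dict[int, MetricAvailability] = {}
    for trial_index in trial_indices:
        observed = observed_by_trial.get(trial_index, set()) & opt_config_metrics
        if not observed:
            availability[trial_index] = MetricAvailability.NOT_OBSERVED
        elif observed < opt_config_metrics:
            availability[trial_index] = MetricAvailability.INCOMPLETE
        else:
            availability[trial_index] = MetricAvailability.COMPLETE
    return availability
```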
**Trial completion** (`ax/api/client.py`, `ax/core/utils.py`)

- Trials are always marked COMPLETED regardless of metric completeness
- A warning is logged when optimization config metrics are missing

**Pending points** (`ax/core/utils.py`)

- COMPLETED trials with incomplete data are treated as pending to prevent re-suggestion of already-evaluated arms (see the sketch below)
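As a rough standalone illustration of the pending-points change, reusing the `MetricAvailability` enum sketched above: the trial representation (status strings, `(status, arms)` tuples) and the helper name are hypothetical simplifications, not Ax's actual API in `ax/core/utils.py`.

```python
def pending_arms(
    trials: dict[int, tuple[str, list[str]]],
    availability: dict[int, "MetricAvailability"],
) -> list[str]:
    """Arms that candidate generation should treat as pending."""
    pending: list[str] = []
    for trial_index, (status, arms) in trials.items():
        if status == "RUNNING":
            # Still executing: pending, as before this change.
            pending.extend(arms)
        elif status == "COMPLETED" and (
            availability.get(trial_index) is not MetricAvailability.COMPLETE
        ):
            # New behavior: COMPLETED trials missing some optimization-config
            # metrics stay pending, so their already-evaluated arms are not
            # re-suggested by the generator.
            pending.extend(arms)
    return pending
```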
**Transition criteria** (`ax/generation_strategy/generation_node.py`)

- Default `min_trials_observed` criterion sets `count_only_trials_with_data=True` so COMPLETED trials without data don't count toward the transition threshold (see the sketch below)
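A similarly simplified sketch of how `count_only_trials_with_data=True` changes what counts toward the threshold; only that flag name and `min_trials_observed` come from the diff, everything else here is illustrative.

```python
def trials_counted_for_transition(
    trials: dict[int, tuple[str, list[str]]],
    availability: dict[int, "MetricAvailability"],
    count_only_trials_with_data: bool = True,
) -> int:
    """Trials counting toward the min_trials_observed transition threshold."""
    count = 0
    for trial_index, (status, _arms) in trials.items():
        if status != "COMPLETED":
            continue
        if (
            count_only_trials_with_data
            and availability.get(trial_index) is MetricAvailability.NOT_OBSERVED
        ):
            # COMPLETED trials with no observed data no longer count, which
            # preserves the guarantee that a model can be fit for candidate
            # generation after transitioning to the next node.
            continue
        count += 1
    return count
```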
Reviewed By: sdaulton
Differential Revision: D93924193