Changes to model normalization to allow them to align over time to the data #3408
springfall2008 merged 4 commits into main
Conversation
Pull request overview
This pull request implements exponential moving average (EMA) updates for normalization parameters in the ML load predictor to allow the model to gradually adapt to distribution drift over time (e.g., seasonal changes, new appliances, tariff changes). The feature normalization statistics (mean/std) are blended with new data during each fine-tuning cycle using a configurable alpha parameter (default 0.1 for slow drift tracking).
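For reference, the blend is a standard exponential moving average applied element-wise to the stored mean and std. A minimal sketch of the update with the default alpha of 0.1 (variable names here are illustrative, not taken from the code):

```python
import numpy as np

# Illustrative EMA blend with the default alpha of 0.1.
alpha = 0.1
stored_mean, stored_std = np.array([0.45, 250.0]), np.array([0.20, 40.0])  # existing stats
batch_mean, batch_std = np.array([0.60, 260.0]), np.array([0.25, 45.0])    # stats from new fine-tune data

stored_mean = alpha * batch_mean + (1 - alpha) * stored_mean  # moves 10% of the way toward the new data
stored_std = alpha * batch_std + (1 - alpha) * stored_std
```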
Changes:
- Added EMA blending logic to the `_normalize_features` method for tracking feature distribution drift during fine-tuning
- Introduced the `norm_ema_alpha` parameter to control drift adaptation rate (0=frozen, 0.1=slow drift)
- Added normalization statistics logging to track drift in feature groups over time
- Updated documentation to explain EMA drift tracking, normalization logging, and model persistence behavior
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| docs/load-ml.md | Documents EMA normalization drift tracking feature, explains alpha parameter, describes normalization stats logging, and clarifies model persistence behavior |
| apps/predbat/load_predictor.py | Implements EMA blending for feature normalization, adds drift tracking logs, introduces norm_ema_alpha parameter, and refactors normalization logic to support gradual adaptation |
apps/predbat/load_predictor.py (Outdated)

```python
# Apply same min-std clamping to new stats before blending
n_features = len(new_std)
min_std = np.ones(n_features) * 1e-8
if n_features == TOTAL_FEATURES:
    min_std[0:LOOKBACK_STEPS] = 0.01
    min_std[LOOKBACK_STEPS : 2 * LOOKBACK_STEPS] = 0.01
    min_std[2 * LOOKBACK_STEPS : 3 * LOOKBACK_STEPS] = 0.5
    min_std[3 * LOOKBACK_STEPS : 4 * LOOKBACK_STEPS] = 1.0
    min_std[4 * LOOKBACK_STEPS : 5 * LOOKBACK_STEPS] = 1.0
    min_std[5 * LOOKBACK_STEPS :] = 0.01
new_std = np.maximum(new_std, min_std)
```
The min-std clamping logic is duplicated between the initial fit path (lines 671-686) and the EMA update path (lines 697-707). This duplication makes the code harder to maintain - if these values need to change, they must be updated in multiple places.
Consider extracting this logic into a helper method like _get_min_std_array(n_features) that returns the appropriate min-std array for the given number of features.
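A minimal sketch of that extraction, assuming the helper lives on the predictor class alongside the two call sites and uses the module's existing numpy import and constants (LOOKBACK_STEPS, TOTAL_FEATURES):

```python
def _get_min_std_array(self, n_features):
    """Return the per-feature-group minimum std used to clamp normalization stats."""
    min_std = np.ones(n_features) * 1e-8
    if n_features == TOTAL_FEATURES:
        min_std[0:LOOKBACK_STEPS] = 0.01
        min_std[LOOKBACK_STEPS : 2 * LOOKBACK_STEPS] = 0.01
        min_std[2 * LOOKBACK_STEPS : 3 * LOOKBACK_STEPS] = 0.5
        min_std[3 * LOOKBACK_STEPS : 4 * LOOKBACK_STEPS] = 1.0
        min_std[4 * LOOKBACK_STEPS : 5 * LOOKBACK_STEPS] = 1.0
        min_std[5 * LOOKBACK_STEPS :] = 0.01
    return min_std
```

Both the initial-fit and EMA paths would then reduce to `new_std = np.maximum(new_std, self._get_min_std_array(len(new_std)))`.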
apps/predbat/load_predictor.py

```python
elif ema_alpha > 0 and self.feature_mean is not None and self.feature_std is not None:
    # EMA update: blend new statistics with existing to track distribution drift
    new_mean = np.mean(X, axis=0)
    new_std = np.std(X, axis=0)

    # Apply same min-std clamping to new stats before blending
    n_features = len(new_std)
    min_std = np.ones(n_features) * 1e-8
    if n_features == TOTAL_FEATURES:
        min_std[0:LOOKBACK_STEPS] = 0.01
        min_std[LOOKBACK_STEPS : 2 * LOOKBACK_STEPS] = 0.01
        min_std[2 * LOOKBACK_STEPS : 3 * LOOKBACK_STEPS] = 0.5
        min_std[3 * LOOKBACK_STEPS : 4 * LOOKBACK_STEPS] = 1.0
        min_std[4 * LOOKBACK_STEPS : 5 * LOOKBACK_STEPS] = 1.0
        min_std[5 * LOOKBACK_STEPS :] = 0.01
    new_std = np.maximum(new_std, min_std)

    # Blend: small alpha = slow drift tracking, large alpha = fast adaptation
    self.feature_mean = ema_alpha * new_mean + (1 - ema_alpha) * self.feature_mean
    self.feature_std = ema_alpha * new_std + (1 - ema_alpha) * self.feature_std
    self._log_normalization_stats(label="ema-update alpha={}".format(ema_alpha))
```
The new EMA normalization drift tracking functionality (norm_ema_alpha parameter and EMA update logic) is not covered by tests. The existing test_load_ml.py has a _test_normalization function but it only tests basic z-score normalization, not the EMA blending behavior during fine-tuning.
Consider adding a test (a rough sketch follows this list) that:
- Trains a model on initial data
- Captures the initial feature_mean and feature_std
- Fine-tunes with data from a shifted distribution
- Verifies that feature_mean and feature_std have moved toward the new distribution proportional to the EMA alpha value
- Confirms predictions still work correctly with the updated normalization
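A rough sketch of such a test; the fixture helpers and the exact train()/predict() arguments below are placeholders and would need to match the real LoadPredictor API and the conventions in test_load_ml.py:

```python
import numpy as np

def test_normalization_ema_drift(make_predictor, make_history):
    # make_predictor / make_history are hypothetical helpers standing in for however
    # test_load_ml.py builds a predictor and synthetic load history.
    predictor = make_predictor()

    # Train on initial data and capture the fitted normalization stats.
    predictor.train(make_history(mean_kw=0.5, days=14), norm_ema_alpha=0.1)
    mean_before = predictor.feature_mean.copy()
    std_before = predictor.feature_std.copy()

    # Fine-tune on data drawn from a shifted distribution.
    predictor.train(make_history(mean_kw=1.5, days=2), norm_ema_alpha=0.1)

    # Stats should have moved toward the new distribution (by roughly alpha),
    # not stayed frozen and not jumped all the way.
    assert not np.allclose(predictor.feature_mean, mean_before)
    assert not np.allclose(predictor.feature_std, std_before)

    # Predictions should still be finite with the updated normalization.
    forecast = predictor.predict(make_history(mean_kw=1.5, days=2))
    assert np.all(np.isfinite(forecast))
```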
apps/predbat/load_predictor.py

```
time_decay_days: Time constant for sample weighting
patience: Early stopping patience
validation_holdout_hours: Hours of most recent data to hold out for validation
norm_ema_alpha: EMA alpha for normalization drift tracking during fine-tuning (0=frozen, 0.1=slow drift)
```
The norm_ema_alpha parameter is not validated. If a user passes a negative value or a value greater than 1.0, the EMA blending formula could produce unexpected results (e.g., negative variance, or extrapolation beyond the new statistics).
Consider adding parameter validation at the start of the train method to ensure 0.0 <= norm_ema_alpha <= 1.0, raising a ValueError if the constraint is violated.
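A guard of that shape at the top of train() could look like the following (the error message wording is illustrative):

```python
if not 0.0 <= norm_ema_alpha <= 1.0:
    raise ValueError("norm_ema_alpha must be between 0.0 and 1.0, got {}".format(norm_ema_alpha))
```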
docs/load-ml.md

```
- Preserves learned patterns while adapting to new ones
- Same regularization techniques applied as initial training
- Each fine-tune cycle blends the current data's feature statistics (mean/std) with the stored normalization parameters via an exponential moving average (alpha=0.1).
  This lets the model slowly track long-term shifts in feature distributions (e.g. seasonal load changes, new tariff rates) without sudden jumps that could destabilise existing weights.
```
The word "destabilise" uses British spelling, but the codebase generally uses American English spelling (e.g., "optimize" not "optimise" in the same file). For consistency, change "destabilise" to "destabilize".
Suggested change:

```diff
- This lets the model slowly track long-term shifts in feature distributions (e.g. seasonal load changes, new tariff rates) without sudden jumps that could destabilise existing weights.
+ This lets the model slowly track long-term shifts in feature distributions (e.g. seasonal load changes, new tariff rates) without sudden jumps that could destabilize existing weights.
```
apps/predbat/load_predictor.py

```python
y_train_norm = self._normalize_targets(y_train, fit=False)
self.log("ML Predictor: Applied EMA normalization update (alpha={}) to track feature drift".format(norm_ema_alpha))
```
Target normalization parameters are not updated during fine-tuning, only feature normalization parameters receive EMA updates. This inconsistency means that if the target distribution (load values) drifts over time (e.g., seasonal changes, new appliances), the target normalization will become stale while feature normalization adapts. This could lead to prediction issues since the model's output layer expects targets normalized with the original statistics.
Consider also applying EMA updates to target_mean and target_std during fine-tuning to maintain consistency with the feature normalization drift tracking approach.
Suggested change:

```diff
- y_train_norm = self._normalize_targets(y_train, fit=False)
- self.log("ML Predictor: Applied EMA normalization update (alpha={}) to track feature drift".format(norm_ema_alpha))
+ y_train_norm = self._normalize_targets(y_train, fit=False, ema_alpha=norm_ema_alpha)
+ self.log("ML Predictor: Applied EMA normalization update (alpha={}) to track feature and target drift".format(norm_ema_alpha))
```
apps/predbat/load_predictor.py

```python
grp_std = float(np.mean(self.feature_std[start:end]))
parts.append("{}(mean={:.4f} std={:.4f})".format(name, grp_mean, grp_std))

self.log("ML Predictor: Normalization stats [{}] target(mean={:.4f} std={:.4f}) {}".format(label, self.target_mean if self.target_mean is not None else 0, self.target_std if self.target_std is not None else 0, " ".join(parts)))
```
The logging statement includes target mean/std in the normalization stats output, but these values won't actually change during EMA updates since only feature normalization receives EMA updates. While this doesn't cause incorrect behavior, it could be misleading when monitoring drift - users might expect to see target stats change during fine-tuning cycles when they actually remain frozen.
Consider either: (1) updating the log format to clarify that target stats are from initial fit only, or (2) implementing EMA updates for target normalization as well for consistency.
| self.log("ML Predictor: Normalization stats [{}] target(mean={:.4f} std={:.4f}) {}".format(label, self.target_mean if self.target_mean is not None else 0, self.target_std if self.target_std is not None else 0, " ".join(parts))) | |
| self.log("ML Predictor: Normalization stats [{}] target_init(mean={:.4f} std={:.4f}, initial-fit only) {}".format(label, self.target_mean if self.target_mean is not None else 0, self.target_std if self.target_std is not None else 0, " ".join(parts))) |
docs/load-ml.md

```diff
  - **Network weights and biases**: All 4 hidden layers plus output layer
  - **Optimizer state**: Adam momentum terms for continuing fine-tuning
- - **Normalization parameters**: Feature and target mean/standard deviation
+ - **Normalization parameters**: Feature and target mean/standard deviation (updated via EMA each fine-tune cycle to track distribution drift)
```
The documentation states "normalization parameters" are updated via EMA, but the implementation only applies EMA updates to feature normalization (feature_mean, feature_std), not target normalization (target_mean, target_std). The documentation should either clarify that only feature normalization receives EMA updates, or the code should be updated to also apply EMA to target normalization for consistency.
Suggest updating this line to: "Feature normalization parameters (mean/standard deviation) are updated via EMA each fine-tune cycle to track distribution drift"
docs/load-ml.md

```diff
  - **Training metadata**: Epochs trained, timestamp, model version, architecture details

- The model is automatically loaded on Predbat restart, allowing predictions to continue immediately without retraining.
+ The model is automatically loaded on Predbat restart, allowing predictions to continue immediately without retraining. The EMA-updated normalization parameters are saved and restored with the model, so drift tracking is preserved across restarts.
```
Similar to the issue at line 415, this statement is imprecise - only feature normalization parameters receive EMA updates, not target normalization parameters. For accuracy, consider updating to: "The EMA-updated feature normalization parameters are saved and restored with the model, so feature drift tracking is preserved across restarts. Target normalization parameters remain fixed from initial training."