[BUG ] fix performance warning in AddMissingIndicator #887

mo1998 · 2026-02-01T12:51:15Z

Summary

This PR resolves the pandas warning "PerformanceWarning: DataFrame is highly fragmented", which was triggered when using the AddMissingIndicator transformer.
The warning occurred because missing-indicator columns were previously added to the DataFrame one at a time via direct assignment, which is inefficient when handling many columns.
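
For readers who want to see the pattern in isolation, the following is a minimal sketch, not the transformer's code. It adds new columns one at a time and counts how many PerformanceWarning instances pandas emits. The helper name `count_fragmentation_warnings` and the column names are hypothetical, and whether a warning appears for a given column count depends on the installed pandas version.

```python
import warnings

import numpy as np
import pandas as pd


def count_fragmentation_warnings(n_new_cols: int) -> int:
    """Add columns one at a time and count the PerformanceWarnings captured."""
    df = pd.DataFrame({"x": np.arange(5, dtype=float)})
    with warnings.catch_warnings(record=True) as captured:
        warnings.simplefilter("always")
        for i in range(n_new_cols):
            # Assigning a brand-new column typically adds one more internal
            # block, which is what pandas reports as fragmentation.
            df[f"x_{i}_na"] = 0
    return sum(
        issubclass(w.category, pd.errors.PerformanceWarning) for w in captured
    )


# Hypothetical column count; the result may differ between pandas versions.
print(count_fragmentation_warnings(150))
```

Running the helper with a small and a large column count is a quick way to check how a given pandas installation behaves.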

What Changed

The transform method in feature_engine/imputation/missing_indicator.py has been refactored to improve performance and avoid DataFrame fragmentation:

- All missing indicators are now computed in a single step.
- A dedicated DataFrame is created to hold the indicator columns.
- The original DataFrame and the indicators DataFrame are merged using pd.concat in one operation, instead of repeated column assignments.
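
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the code in missing_indicator.py: the function name `add_missing_indicators` and the sample data are hypothetical, and the real transform method also performs input checks that are omitted here.

```python
import numpy as np
import pandas as pd


def add_missing_indicators(X: pd.DataFrame, variables: list) -> pd.DataFrame:
    """Return X plus one 0/1 "<name>_na" column per entry in variables."""
    # Compute every indicator in one vectorised step.
    indicators = X[variables].isna().astype(int)
    indicators.columns = [f"{name}_na" for name in variables]
    # Attach all indicator columns with a single concat instead of
    # assigning them to X one by one.
    return pd.concat([X, indicators], axis=1)


df = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 2.0], "c": [3.0, 4.0]})
out = add_missing_indicators(df, ["a", "b"])
print(out.columns.tolist())
```

Because the result is a new DataFrame, the input `df` keeps its original three columns.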

Impact

- Eliminates the Pandas fragmentation warning.
- Improves performance and memory efficiency when generating many missing-indicator columns.
- No changes to the public API or transformer behavior.

Related Issue

Fixes Pandas fragmentation warning with AddMissingIndicator #886

Verification

- Reproduction: Confirmed that the warning no longer appears when running a script that previously triggered it.
- Regression Testing: All existing tests in tests/test_imputation/ pass successfully.

- Fix UnboundLocalError in _variable_type_checks.py by initializing is_cat/is_dt
- Add robust dtype checking using both is_object_dtype and is_string_dtype
- Update find_variables.py with same robust logic for consistency
- Fix warning count assertions in encoder tests (Pandas 3 adds extra deprecation warnings)
- Fix floating point precision assertion in recursive feature elimination test
- Apply ruff formatting and fix linting errors
- All 1900 tests passing

mo1998 · 2026-02-02T07:06:32Z

I merged #885 into the codebase to fix the issue with pandas 3.0.

codecov · 2026-02-02T07:12:37Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.20%. Comparing base (8282fd4) to head (6c41b96).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #887      +/-   ##
==========================================
+ Coverage   98.11%   98.20%   +0.09%     
==========================================
  Files         113      113              
  Lines        4829     4857      +28     
  Branches      768      775       +7     
==========================================
+ Hits         4738     4770      +32     
+ Misses         56       55       -1     
+ Partials       35       32       -3


solegalli · 2026-02-03T02:31:00Z

Hi @mo1998

Thank you very much for the fix to the fragmentation warning.

The branch that fixes pandas warning (#885) has many files that shouldn't be changed. That PR is still WIP (work in progress).

Could you please, for now, commit only the changes that fix the fragmentation warning? Ideally, we'd also like to add a test that triggers the warning with the old version of the code and does not trigger it with the new version.

I asked @jose-cano for an example. I believe the warning is triggered when trying to add too many variables.

After we merge #885, we can then rebase main to resolve the pandas issues.

Thanks a lot!

mo1998 · 2026-02-03T14:35:41Z

Hi @solegalli,

I’ve updated the PR to focus strictly on the fragmentation warning fix (Issue #886).

Summary of changes:

- Fragmentation fix: Refactored AddMissingIndicator.transform() to use pd.concat when adding indicator variables. This replaces the iterative column assignment that was triggering the PerformanceWarning.
- New test case: Added test_no_performance_warning_with_many_variables in tests/test_imputation/test_missing_indicator.py.
  - The test uses a DataFrame with 101 columns (the threshold at which the warning is raised) and verifies that transform runs without triggering pd.errors.PerformanceWarning.
- Cleanup: Reverted all unrelated changes from the WIP Pandas 3.0 compatibility work. Only the two files directly related to the fragmentation fix are now modified.

The PR should now be ready for review.
Thanks a lot!

solegalli · 2026-02-03T15:04:48Z

tests/test_imputation/test_missing_indicator.py

+    # Test for issue #886: PerformanceWarning due to fragmentation
+    import numpy as np
+    import pandas as pd
+    import warnings


How about this:

    import warnings

    import numpy as np
    import pandas as pd

    from feature_engine.imputation import AddMissingIndicator


    def test_no_performance_warning_with_many_variables():
        n_cols = 101
        df = pd.DataFrame(
            np.random.randn(10, n_cols),
            columns=[f"col_{i}" for i in range(n_cols)],
        )

        # Introduce missing values
        df.iloc[0, :] = np.nan

        ami = AddMissingIndicator(missing_only=False)
        ami.fit(df)

        # Record every warning raised while transforming
        with warnings.catch_warnings(record=True) as captured:
            warnings.simplefilter("always")
            ami.transform(df)

        assert not any(
            issubclass(w.category, pd.errors.PerformanceWarning)
            for w in captured
        ), "PerformanceWarning was raised during transform"

solegalli · 2026-02-03T15:11:24Z

feature_engine/imputation/missing_indicator.py


-        indicator_names = [f"{feature}_na" for feature in self.variables_]
-        X[indicator_names] = X[self.variables_].isna().astype(int)
+        X_indicators = X[self.variables_].isna().astype(int)


How about replacing the 2 lines with this?

    X_indicators = (
        X[self.variables_]
        .isna()
        .astype("int8")
        .add_suffix("_na")
    )

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.add_suffix.html
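
As a quick illustration of what the suggested chain produces, here is a small standalone sketch. The sample DataFrame and the `variables` list are hypothetical stand-ins for `X` and `self.variables_`; only the chained pandas calls come from the suggestion above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the transformer's input X.
X = pd.DataFrame({"age": [30.0, np.nan, 41.0], "city": ["Rome", None, "Lima"]})
variables = ["age", "city"]  # stand-in for self.variables_

# Flag missing values, store the flags as 8-bit integers, rename the columns.
X_indicators = X[variables].isna().astype("int8").add_suffix("_na")

print(X_indicators.columns.tolist())
print(X_indicators.dtypes.astype(str).tolist())
```

With add_suffix the separate list of indicator names is no longer needed, and int8 stores each flag in one byte instead of the eight used by int64. Note that this changes the dtype of the indicator columns compared with astype(int).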

solegalli and others added 18 commits January 27, 2026 21:35

update dt functions

d090224

expand tests

6ff27aa

expand tests

9d44303

update fpr new pandas behaviour

de4d663

fix: Remove whitespace before colon in slice notation (flake8 E203)

e0c3292

feat: finalize Pandas 3 compatibility fixes and test updates

ccbfa05

style: fix flake8 line length and linting issues

fd43124

style: fix remaining flake8 C416 issue

8367d4a

Fix Pandas 3 regressions in check_y, _check_contains_inf, and StringS…

3225500

…imilarityEncoder

Fix E501 line too long in dataframe_checks.py

bde0b9b

Fix StringSimilarityEncoder NaN issues and fragile test assertions

dedf500

fix: Pandas 3 stability - mock datasets and fix FutureWarnings

765e102

style: fix flake8 linting errors E501, E302, E305, SIM102

28894c5

test: improve patch coverage for Pandas 3 stability fixes

08821a6

style: fix E501 line too long in similarity encoder tests

972a4b7

fix: correct missing indicator creation in AddMissingIndicator class

0fb27cb

Merge branch 'pr-885-pandas3'

6c41b96

mo1998 added 2 commits February 3, 2026 15:37

Revert "Merge branch 'pr-885-pandas3'"

4ef16d0

This reverts commit 6c41b96, reversing changes made to 0fb27cb.

test: add performance warning check for AddMissingIndicator with many…

2171550

… variables

solegalli reviewed Feb 3, 2026

solegalli changed the title ~~fix: correct missing indicator creation in AddMissingIndicator class~~ [BUG ] fix performance warning in AddMissingIndicator Feb 3, 2026
