Revert internals destruction and add test for internals recreation by XuehaiPan · Pull Request #5972 · pybind/pybind11

XuehaiPan · 2026-01-22T17:29:14Z

Description

This PR fixes a segfault that can occur during Python interpreter shutdown when tp_traverse/tp_clear callbacks call py::cast after pybind11 internals have been destroyed.

Fixes #5958 (comment)

Background

PR #5958 introduced internals_shutdown() that resets internals during interpreter finalization. However, because GC order is not guaranteed during shutdown, tp_traverse/tp_clear can run after internals have been reset. When these callbacks call py::cast, it recreates an empty internals struct, fails type lookup, and throws pybind11::cast_error, terminating the process.
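
The failure sequence described above can be modeled in a few lines of plain Python. This is a toy model under stated assumptions, not pybind11 code: `ToyInternals`, `get_internals`, `reset_internals`, and `cast` are hypothetical names standing in for the C++ internals struct, its lazy accessor, the reset performed by internals_shutdown(), and py::cast.

```python
class ToyInternals:
    """Hypothetical stand-in for the internals struct: a registry of known types."""

    def __init__(self):
        self.registered_types = {}


_internals = None  # lazily created, loosely like the real internals pointer


def get_internals():
    """Return the current internals, silently creating an empty one if missing."""
    global _internals
    if _internals is None:
        _internals = ToyInternals()
    return _internals


def reset_internals():
    """Model of the reset performed at interpreter finalization."""
    global _internals
    _internals = None


def cast(type_name):
    """Model of py::cast: fails when the type is not in the registry."""
    registry = get_internals().registered_types
    if type_name not in registry:
        raise LookupError(f"cast_error: unregistered type {type_name!r}")
    return registry[type_name]


get_internals().registered_types["Widget"] = "type info for Widget"
before_reset = cast("Widget")  # lookup succeeds while internals are alive

reset_internals()  # shutdown resets internals first ...
try:
    cast("Widget")  # ... then a late tp_clear-style callback casts
    late_cast_error = None
except LookupError as exc:
    late_cast_error = str(exc)
```

In the model the late cast does not touch freed memory; it fails because the freshly recreated registry is empty, which matches the description of an empty internals struct followed by a failed type lookup.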

Solution

This PR takes the "leak internals" approach for correctness:

Revert internals destruction: The explicit destructors introduced in #5958 ("Destruct internals during interpreter finalization") are removed. internals and local_internals revert to the previous "leak on shutdown" behavior. The leak only occurs at process exit, so it's benign in practice.
Re-entrancy detection: A new create_pp_content_once() method tracks which interpreters have already created internals. If code attempts to recreate internals after destruction (e.g., late shutdown code calling py::cast), it triggers a hard failure with a diagnostic message rather than silently creating empty internals.
New capsule helpers: get_internals_capsule() and get_local_internals_capsule() allow tests (and potentially user code) to verify capsule lifetime.
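
The re-entrancy detection idea can be sketched as a "create once per interpreter" guard. This is a minimal Python model, assuming a set of interpreter ids and a lock; the names `create_content_once`, `InternalsRecreatedError`, and `_created_for` are hypothetical, and the real create_pp_content_once() is C++ code that fails hard rather than raising a catchable exception.

```python
import threading


class InternalsRecreatedError(RuntimeError):
    """Raised instead of terminating the process, to keep the sketch testable."""


_created_lock = threading.Lock()
_created_for = set()  # ids of interpreters whose internals were already created


def create_content_once(interpreter_id, factory):
    """Create internals for an interpreter, refusing a second creation."""
    with _created_lock:
        if interpreter_id in _created_for:
            raise InternalsRecreatedError(
                f"internals for interpreter {interpreter_id} were already created; "
                "re-creation usually means code ran after destruction"
            )
        content = factory()
        # Mark only after the factory succeeded, so a failed creation can be retried.
        _created_for.add(interpreter_id)
        return content


first = create_content_once(0, dict)
other = create_content_once(1, dict)  # a different interpreter is unaffected
try:
    create_content_once(0, dict)
    second_error = None
except InternalsRecreatedError as exc:
    second_error = exc
```

The point of the guard is that a second creation is reported loudly instead of producing a fresh, empty registry that later fails in a less obvious place.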

Changes

include/pybind11/detail/internals.h:

internals::~internals() -> = default (no explicit destruction)
local_internals no longer stores istate or defines a destructor
Internals capsule passes dtor=nullptr to intentionally leak at shutdown
Added PYBIND11_DTOR_USE_DELETE sentinel for atomic_get_or_create_in_state_dict() to distinguish "use delete" from "leak"
Added create_pp_content_once() with re-entrancy detection
Added get_internals_capsule(), get_local_internals_capsule(), and get_local_internals_key() helpers

Tests:

Refactored OwnsPythonObjects -> ContainerOwnsPythonObjects with a vector-based container
Added add_gc_checkers_with_weakrefs() to verify capsule lifetime during GC
Added test_py_cast_useable_on_shutdown that exercises the shutdown GC path in a subprocess
Moved check_script_success_in_subprocess() to tests/env.py for reuse

Documentation:

Updated docs/advanced/classes.rst example to match the new test code

Notes

No internals version bump: PYBIND11_INTERNALS_VERSION remains 11 for ABI compatibility with v3.0.1
The re-entrancy guard calls pybind11_fail() which terminates the process. This is intentional to surface problematic code paths rather than silently failing.

Suggested changelog entry:

Revert the internals destruction introduced in #5958 ("Destruct internals during interpreter finalization") to fix a segfault when py::cast is called during interpreter shutdown via tp_traverse/tp_clear callbacks.

📚 Documentation preview 📚: https://pybind11--5972.org.readthedocs.build/

rwgk · 2026-01-22T19:28:50Z

@XuehaiPan Is your plan to work on adding a new unit test here after you have the fix? — Short-term the fix is most important, long-term the test will be more important.

I could try to use some LLM to generate a new test, based on what you provided under 5958. Do you think that'd be useful?

b-pass · 2026-01-22T23:57:34Z

It might be better to mostly revert #5958 if the behavior it has is not what we want. The problem described is expected (but undesirable) behavior from that PR. It is not a use-after-free. It's also possible for OptTree to fix this by manually holding a reference to the internals capsule; we could make that easier with something like get_internals_capsule() from this PR.

One option is to go back to something more like the first commit on #5958, before I went into releasing internals itself. The original goal of that PR was to get rid of the leaks of the default_metaclass and instance_base.

XuehaiPan · 2026-01-23T04:01:45Z

I think releasing some parts of the internals while keeping the internals themselves leaked would be the safest choice.

rwgk · 2026-01-23T06:16:54Z

I think releasing some parts of the internals while keeping the internals themselves leaked would be the safest choice.

@XuehaiPan did you see my question above (https://github.com/pybind/pybind11/pull/5972#issuecomment-3786298624)?

Subinterpreter support added a lot of complexity; if we don’t capture these issues in unit tests, pybind11 will regress over time. I’m very skeptical of fixes that don’t start from a failing unit test.

rwgk · 2026-01-23T06:28:00Z

The original goal of that PR was to get rid of the leaks of the default_metaclass and instance_base.

Those leaks are currently dwarfed by CPython leaks documented here (pure‑C reproducer):

#5958 (comment).

Until those are fixed in CPython, or we find a workaround that sidesteps them, the savings from #5958 are marginal. If reverting #5958 is what it takes to ship v3.0.2, that seems like a reasonable short‑term compromise.

XuehaiPan · 2026-01-23T07:36:36Z

If reverting #5958 is what it takes to ship v3.0.2, that seems like a reasonable short‑term compromise.

This makes sense to me because most users only use the main interpreter. The leaked memory is reclaimed at program shutdown anyway, and correctness is more important.

I think releasing some parts of the internals while keeping the internals themselves leaked would be the safest choice.

@XuehaiPan did you see my question above (pybind/pybind11/pull/5972#issuecomment-3786298624)?

Subinterpreter support added a lot of complexity; if we don’t capture these issues in unit tests, pybind11 will regress over time. I’m very skeptical of fixes that don’t start from a failing unit test.

I am trying to add a repro. It is hard to find a consistent repro for the order bugs since there is no guarantee.

b-pass · 2026-01-23T22:50:54Z

Until those are fixed in CPython, or we find a workaround that sidesteps them, the savings from #5958 are marginal. If reverting #5958 is what it takes to ship v3.0.2, that seems like a reasonable short‑term compromise.

I think reverting is a better solution than this PR.

XuehaiPan · 2026-01-24T05:02:46Z

I am trying to add a repro. It is hard to find a consistent repro for the order bugs since there is no guarantee.

I added a test that fails on the master branch. But I don't think it can test all cases since there is no guarantee for the GC order.

This reverts commit c5ec1cf. This reverts commit 72c2e0a.

rwgk · 2026-01-24T19:04:34Z

This looks great at first glance. I'll start understanding/reviewing this now with LLM assistance.

rwgk · 2026-01-24T20:19:11Z

Below is the result of me discussing this PR with Cursor (GPT-5.2 Codex High).

I didn't study the code changes in this PR in detail.

In my mind, these are the most important goals:

Avoid an internal version bump before the v3.0.2 release.
Fix the segfault reported by @XuehaiPan before we make the v3.0.2 release.

High level approach: pare this PR back, to more-or-less roll back 5958, but keep as much of the new unit test as possible.

Based on that, Cursor came up with the Options A, B, C below.

@XuehaiPan @b-pass, could you please scrutinize Cursor's analysis, does it make sense?

Based on the analysis, Option A looks best to me. WDYT?

PR 5972 internals version bump considerations (v3.0.2)

Executive summary

For v3.0.2 you want no internals version bump and no new ABI breaks, but you also want to
eliminate the shutdown crash reported under #5958 (comment r2713504794). The only explicit bump in
PR 5972 is the macro change:

PYBIND11_INTERNALS_VERSION 11 -> 12

Nothing else in 5972 inherently requires a bump (no struct layout changes or ABI‑critical type
changes). The reason a bump is attractive is behavioral, not ABI: the shutdown fix in 5972
only protects types built with the new code. A bump prevents mixed old/new extensions from sharing
internals, which otherwise leaves the old types still vulnerable.

So the decision point is:

No bump means mixed v3.0.1 / v3.0.2 extensions will share internals and old types can still
hit the crash, unless you fully avoid destroying internals at shutdown (leak them).
Bump means the fix applies to all modules in a process, but violates the patch‑release goal.

Below are the minimal patch sets and recommendations.

Root cause (why the crash happens)

PR 5958 introduced internals_shutdown() which does pp->reset() when the internals capsule is
destroyed. During interpreter shutdown, GC order is not guaranteed, so tp_traverse/tp_clear
can run after pp->reset() and call py::cast. That recreates an empty internals and fails
type lookup, leading to an uncaught cast_error and termination. This is the crash reported in
comment r2713504794.

What in 5972 is related to the crash fix

Core behavioral change (the shutdown‑safety fix)

These changes allow pybind11 types to hold a reference to the internals capsules, keeping them
alive until those types are destroyed:

get_internals_capsule() / get_local_internals_capsule() helpers in
include/pybind11/detail/internals.h
type‑side reference holding (final branch uses weakref callbacks in
include/pybind11/pybind11.h)

This prevents internals_shutdown() from running before all pybind11 types are collected,
which avoids the crash for newly built extensions.
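
The "hold a reference to the capsule" idea can be illustrated with ordinary reference counting in CPython. This is only a sketch of the lifetime argument, with hypothetical `ToyCapsule` and `ToyType` classes; it does not reproduce the weakref-callback mechanism mentioned above, and a `weakref.finalize` callback stands in for internals_shutdown().

```python
import gc
import weakref


class ToyCapsule:
    """Stand-in for the internals capsule; its finalizer models the shutdown hook."""


class ToyType:
    """Stand-in for a pybind11 type that keeps the capsule alive."""

    def __init__(self, capsule):
        self._capsule_ref = capsule  # strong reference held for the type's lifetime


events = []
capsule = ToyCapsule()
weakref.finalize(capsule, events.append, "capsule destroyed")

holder = ToyType(capsule)
del capsule  # the interpreter drops its own reference first
gc.collect()
alive_while_type_exists = "capsule destroyed" not in events

del holder  # the last holder goes away, so the capsule can be finalized
gc.collect()
destroyed_after_type = events == ["capsule destroyed"]
```

As long as any holder exists, the finalizer cannot run, which is the ordering guarantee this design relies on. A type that never took a reference gets no such guarantee.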

Non‑essential or unrelated changes

These are not required for the crash fix and add risk for a patch release:

PYBIND11_INTERNALS_VERSION bump to 12 (the only actual bump)
atomic_get_or_create_in_state_dict() default‑destructor sentinel (changes default leak vs
delete semantics)
GIL acquisition before Py_CLEAR in internals/local_internals dtors (safety improvement but not
part of the crash mechanism)
test refactors unrelated to the crash (tests/env.py helper move, test_multiple_interpreters)
doc updates (docs/advanced/classes.rst)

Mixed v3.0.1 / v3.0.2 (no bump) analysis

If you remove the bump, v3.0.1 and v3.0.2 modules will share internals (same capsule key).

Implications:

v3.0.2 types are protected (they hold capsule refs).
v3.0.1 types are not protected (they do not hold refs).
Therefore, the crash can still happen if only v3.0.1 types remain alive when the capsule
is destroyed, or if they outlive v3.0.2 types.
This is not a new bug, but it means the fix is partial in mixed environments.
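
Why a version bump isolates old and new extensions can be shown with a dictionary keyed by a versioned string. This is a simplified model: `internals_key` and `get_or_create_internals` are hypothetical helpers, the `abi_tag` placeholder is an assumption, and the key shape only loosely follows the internals id visible in the pytest header quoted later in this thread.

```python
def internals_key(version, abi_tag="toolchain_abi"):
    """Build a capsule key from the internals version and a placeholder ABI tag."""
    return f"__pybind11_internals_v{version}_{abi_tag}__"


def get_or_create_internals(state_dict, version):
    """Return the internals stored under the versioned key, creating them once."""
    return state_dict.setdefault(internals_key(version), {"registered_types": {}})


state_dict = {}  # stand-in for the per-interpreter state dict

old_ext = get_or_create_internals(state_dict, 11)  # extension built with v3.0.1
new_ext = get_or_create_internals(state_dict, 11)  # v3.0.2 without a bump
bumped_ext = get_or_create_internals(state_dict, 12)  # hypothetical bumped build

shares_without_bump = old_ext is new_ext
isolated_with_bump = bumped_ext is not old_ext
```

With the same version, both extensions receive the same object, so any lifetime protection added only on the new side does not cover types registered by the old side.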

Minimal patch sets for v3.0.2 (no bump)

Option A (safest for mixed environments, no bump)

Fully avoid destroying internals at shutdown (i.e., revert the risky part of 5958):

Make internals_shutdown() a no‑op (or skip pp->reset()).
Alternatively, change the capsule destructor for internals to leak the payload (no reset).
Add a regression test that exercises py::cast during shutdown and only asserts that the
subprocess exits cleanly (no capsule‑lifetime assertions).

Pros

Fixes the crash for all modules, including v3.0.1 binaries.
No internals bump.
Minimal behavioral surprise.

Cons

Reintroduces the leak fixed in 5958 (at least for shared internals).

This is the only way to guarantee safety in mixed 3.0.1/3.0.2 processes without a bump.

Option B (partial fix, no bump)

Keep the shutdown behavior from 5958, but add the capsule‑ref protection from 5972:

Keep

get_internals_capsule() / get_local_internals_capsule() helpers
type‑side ref holding (either weakref callbacks in pybind11.h or the earlier inc/dec in
class.h)
add/keep the new regression test that exercises py::cast during shutdown

Drop

internals version bump
atomic_get_or_create_in_state_dict() sentinel change
unrelated test/doc refactors

Pros

Fixes the crash for extensions rebuilt with v3.0.2.
Preserves the 5958 leak reduction for those modules.

Cons

Does not fix the crash for v3.0.1 modules in the same process.
Mixed builds remain risky; the fix is incomplete by design.

Option C (post‑release, with bump)

After v3.0.2, keep the shutdown‑safety fix and bump internals version so old modules do not
share internals with new ones. This makes the fix complete in any process.

Recommended course (for v3.0.2)

Given the requirement of full ABI compatibility and avoiding regressions for mixed builds, the
best tradeoff is:

Do not bump PYBIND11_INTERNALS_VERSION for v3.0.2.
Avoid destroying internals at shutdown (Option A).
- This ensures correctness for all modules, including v3.0.1 binaries.
- It matches the “leak is acceptable, correctness is critical” consensus in 5972 discussion.
Add a regression test that reproduces the shutdown GC path and asserts “no crash” rather
than capsule‑lifetime specifics. This keeps the test robust across future design changes.

Then, after v3.0.2, do the larger cleanup:

Apply the capsule‑ref fix (or another safe design).
Bump internals version for the future set of PRs already planned.
Add/restore tests that validate the stronger invariant under a bumped ABI.

Notes on test strategy

The current new test in 5972 uses a weakref callback to confirm the capsule is still alive at type
destruction time. That test is tightly coupled to the “hold capsule refs” design. If you choose
Option A (leak internals), a more resilient test is:

Run a subprocess that creates a pybind11 type with a custom tp_traverse/tp_clear that calls
py::cast, forms a self‑cycle, then exits.
Assert the subprocess exits cleanly (no crash).

This directly targets the failure mode without enforcing a particular lifetime mechanism.
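
A "no crash" subprocess check of this kind might look roughly like the following. It is a sketch using only the standard library: `check_script_exits_cleanly` is a hypothetical helper (not the signature of the helper in tests/env.py), and the child script uses a pure-Python self-cycle in place of a pybind11 type with custom tp_traverse/tp_clear.

```python
import subprocess
import sys
import textwrap


def check_script_exits_cleanly(script):
    """Run a script in a fresh interpreter and fail if it does not exit with 0."""
    completed = subprocess.run(
        [sys.executable, "-c", textwrap.dedent(script)],
        capture_output=True,
        text=True,
        timeout=60,
    )
    assert completed.returncode == 0, (
        f"subprocess failed with code {completed.returncode}\n"
        f"stdout:\n{completed.stdout}\nstderr:\n{completed.stderr}"
    )
    return completed


# A self-referencing object that is still alive when the interpreter shuts down.
result = check_script_exits_cleanly(
    """
    class Node:
        def __init__(self):
            self.me = self  # self-cycle, only collectable by the GC

    node = Node()
    print("cycle created")
    """
)
```

Running the scenario in a child process matters because a crash or process termination during finalization would otherwise take the test runner down with it; the parent only inspects the exit code and captured output.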

Final answer to “what in 5972 requires an internals bump?”

Strictly speaking, nothing in 5972 requires a bump. There is no ABI‑level change to the
internals layout or the capsule payload. The bump is a risk‑management choice to prevent
old and new extensions from sharing internals when the shutdown fix only protects new types. If
you remove the bump, you must accept partial protection (only new modules) or disable
internals destruction at shutdown to keep mixed processes safe.

rwgk · 2026-01-25T20:25:29Z

This is a very complex PR. In an attempt to understand it, I worked with Cursor (GPT-5.2 Codex High) on a draft comprehensive PR description (below). This all makes sense to me high-level. @XuehaiPan @b-pass could you please check for correctness? @XuehaiPan could you please add the corrected version to the actual PR description?

I'll look in more detail after the comprehensive PR description is posted.

DRAFT comprehensive PR description

Summary

This PR changes pybind11’s interpreter‑shutdown behavior to avoid failures when tp_traverse /
tp_clear calls into py::cast after internals have been torn down. It does this by reverting
internals destruction, intentionally leaking internals across shutdown, and adding a guard
against re‑creating internals after destruction. It also adds a regression test that exercises
the shutdown GC path, plus a small test‑infra refactor and a documentation update.

Importantly, the internals version bump has been reverted; the default
PYBIND11_INTERNALS_VERSION remains 11 for ABI compatibility with v3.0.1.

Motivation / background

PR #5958 introduced internals_shutdown() that resets internals during interpreter finalization.
Because GC order is not guaranteed, tp_traverse / tp_clear can run after internals have
been reset and call py::cast, which recreates an empty internals and then fails type lookup.
This can throw pybind11::cast_error during shutdown and terminate the process. The goal here is
to make shutdown safe again (for v3.0.2) without bumping the internals ABI version.

Runtime behavior changes (net effect vs `upstream/master`)

1) Revert internals/local_internals destruction during shutdown

The explicit destructors added in #5958 are removed. internals and local_internals no longer
DECREF their cached PyTypes during finalization, so they revert to the previous “leak on shutdown”
behavior.

internals::~internals() → default
local_internals no longer stores istate or defines a destructor

This eliminates the immediate failure mode caused by pp->reset() before GC has finished.

2) Intentionally leak internals at interpreter shutdown

Internals capsule creation now passes dtor=nullptr to the state‑dict helper, which means the
capsule’s payload is intentionally leaked instead of destroyed at shutdown. This is explicitly
documented in comments to explain why we prefer leaking to early destruction.

3) Re‑entrancy detection when (re)creating internals

A new create_pp_content_once() path tracks which pp pointers have already created internals
content. If creation is attempted again for the same interpreter (e.g., due to late shutdown code
calling py::cast), it triggers a hard failure (pybind11_fail) with a diagnostic message.

This provides explicit detection of the failure mode that used to silently recreate empty
internals, so the behavior is not “accidentally” masked.

4) `atomic_get_or_create_in_state_dict()` semantics refined

The helper now distinguishes between:

default: use delete when the interpreter shuts down
nullptr: explicitly leak
function: use custom destructor

This enables the explicit “leak internals” behavior without changing default behavior for other
callers.
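
The three-way distinction can be modeled with a sentinel default argument. This is a Python sketch of the idea only: `_USE_DELETE`, `get_or_create_in_state_dict`, and `shutdown` are hypothetical names, and the real helper is C++ code whose PYBIND11_DTOR_USE_DELETE sentinel and capsule destructors are not reproduced here.

```python
_USE_DELETE = object()  # sentinel meaning "caller did not choose, destroy normally"


def get_or_create_in_state_dict(state, key, factory, dtor=_USE_DELETE):
    """Store factory() under key once, together with its cleanup policy."""
    if key not in state:
        state[key] = (factory(), dtor)
    return state[key][0]


def shutdown(state, log):
    """Apply each entry's cleanup policy, as a model of interpreter finalization."""
    for key, (value, dtor) in state.items():
        if dtor is _USE_DELETE:
            log.append((key, "deleted"))  # default: destroy the payload
        elif dtor is None:
            log.append((key, "leaked"))  # explicit None: leave the payload alone
        else:
            dtor(value)  # custom destructor supplied by the caller
            log.append((key, "custom"))


state, log, closed = {}, [], []
get_or_create_in_state_dict(state, "cache", dict)
get_or_create_in_state_dict(state, "internals", dict, dtor=None)
get_or_create_in_state_dict(state, "handle", list, dtor=closed.append)
shutdown(state, log)
```

A dedicated sentinel is needed because "no destructor given" and "explicitly no destructor" must be told apart; using a null value for both would make it impossible to request the leak without changing the default for every other caller.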

5) New internals capsule helpers

New helpers return the internals capsules (global and local) and the local‑internals key. These
are primarily used by tests to assert capsule lifetime or to add future runtime protections.

Test changes

New shutdown‑path regression test (via existing test module)

tests/test_custom_type_setup.cpp is refactored:

OwnsPythonObjects is replaced with ContainerOwnsPythonObjects
tp_traverse / tp_clear now iterate a vector of py::object
adds add_gc_checkers_with_weakrefs() that installs weakref callbacks asserting that the
internals capsules remain alive during object finalization

tests/test_custom_type_setup.py adds a new test:

runs in a subprocess
creates a self‑cycle
installs weakref GC checkers
verifies that shutdown GC does not destroy internals before the object is collected
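
The self-cycle plus weakref checker pattern can be tried out in pure Python. This sketch assumes hypothetical `Container` and `ToyCapsule` classes and runs the collector explicitly with `gc.collect()` rather than at interpreter shutdown, so it only illustrates the mechanics, not the actual test in tests/test_custom_type_setup.py.

```python
import gc
import weakref


class Container:
    """Pure-Python stand-in for the container type used in the test."""

    def __init__(self):
        self.items = []


class ToyCapsule:
    """Stand-in for an internals capsule that must outlive the container."""

    alive = True


capsule = ToyCapsule()
observations = []


def check_capsule(_dead_ref):
    # Runs when the container is collected; records whether the capsule still lives.
    observations.append(capsule.alive)


container = Container()
container.items.append(container)  # self-cycle: refcounting alone cannot free it
checker = weakref.ref(container, check_capsule)

del container
collected_by_refcount = checker() is None  # the cycle keeps the object alive
gc.collect()  # the cycle collector frees the object and fires the callback
```

The self-cycle is what forces the object through the cycle collector, which is the code path where tp_traverse and tp_clear run for extension types.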

Test infrastructure refactor

A shared check_script_success_in_subprocess() helper is added in tests/env.py.
tests/test_multiple_interpreters.py now uses this helper and adds env to sys.path in the
subprocess preamble.

Documentation update

The docs/advanced/classes.rst example is updated to match the new container‑based custom type
used in the tests.

Notes

The new re‑entrancy guard (pybind11_fail) introduces a hard‑failure path if internals are
ever recreated after destruction for the same interpreter. That is a deliberate behavioral change.
The shutdown fix is implemented by leaking internals, which trades memory for safety. This
aligns with the intended “correctness over leaks” policy for the v3.0.2 patch release.

rwgk

A couple of very easy suggestions for changes to comments.

tests/test_custom_type_setup.cpp

tests/test_custom_type_setup.py

include/pybind11/detail/internals.h

XuehaiPan · 2026-01-27T11:11:18Z

@rwgk @b-pass This PR is ready for review. The PR description has been updated. My downstream tests have also passed.

b-pass · 2026-01-27T13:45:03Z

include/pybind11/detail/internals.h

+        // Detect re-creation of internals after destruction during interpreter shutdown.
+        // If pybind11 code (e.g., tp_traverse/tp_clear calling py::cast) runs after internals have
+        // been destroyed, a new empty internals would be created, causing type lookup failures.
+        // See also get_or_create_pp_in_state_dict() comments.


This PR reverted the "re-creation during shutdown" scenario, so why is this code necessary now?

I think there are two reasons:

Add an explicit check to prevent potential bugs in the future. Maybe we can eventually not leak the internals.

The reverted code still creates the internals raw pointers multiple times (pp->reset(new InternalsType())). This PR adds a lock to ensure consistency.

I don't think this code should have this unordered map and the associated bookkeeping around for an unreachable failure condition. At least make it debug only, or have its own ifdef, or completely remove it.

The overhead is very small, and ideally, it should only run once. I think it is acceptable.

an unreachable failure condition

We do run into this here:

https://github.com/pybind/pybind11/actions/runs/21558860079/job/62119622188?pr=5979#step:12:1536

We do run into this here:

I'm glad you understand this already! (I didn't)

@XuehaiPan do you have ideas how we should take care of that failure? This PR, or a follow-on? — We definitely need to bring back the commit I backed out (91189c9).

The concurrent.interpreters.NotShareableError: func not shareable is unrelated. And it is unexpected because the func function is not using any global variables (it uses local imports). It should be shareable between interpreters.

UPDATE: the root cause is:

E Exception: Traceback (most recent call last):
E   File "<frozen importlib._bootstrap>", line 1371, in _find_and_load
E   File "<frozen importlib._bootstrap>", line 1333, in _find_and_load_unlocked
E   File "<frozen importlib._bootstrap>", line 1267, in _find_spec
E   File "<frozen importlib._bootstrap_external>", line 1292, in find_spec
E   File "<frozen importlib._bootstrap_external>", line 1266, in _get_spec
E   File "<frozen importlib._bootstrap_external>", line 1369, in find_spec
E   File "<frozen importlib._bootstrap_external>", line 1412, in _fill_cache
E BlockingIOError: [Errno 11] Resource temporarily unavailable: '/opt/python/cp314-cp314t/lib/python3.14t/concurrent'
E
E The above exception was the direct cause of the following exception:
E
E concurrent.interpreters.NotShareableError: object could not be unpickled

So it looks like it's flaky, at best. I'll try reruns under 5850 to see if I can get it to pass with enough trials.

It worked only on the 10th attempt!

https://github.com/pybind/pybind11/actions/runs/21580597316/job/62255061573?pr=5850

@XuehaiPan @b-pass we need to handle this somehow, it'll be super distracting. What should we do? Create an issue and add an xfail pointing to it?

I cannot reproduce this. Does this only fail with C++11? If so, we can add a skipif.

When I tried reproducing before, with C++20, I saw another kind of failure, and only when running test_multiple_interpreters.py in isolation.

I tried again just now, with C++11, and it's the same behavior. See below, JIC it's useful somehow.

I have to give up for now; I don't have a lot of spare time during the week.

Probably, if I had the free bandwidth, I'd try to set up a Manylinux container, identically to what we have in the CI.

( cd /wrk/forked/pybind11/tests && PYTHONPATH=/wrk/bld/pybind11_gcc_v3.14.2_df793163d58_freethreaded/lib /wrk/bld/pybind11_gcc_v3.14.2_df793163d58_freethreaded/TestVenv/bin/python3 -m pytest -v test_multiple_interpreters.py )
============================= test session starts ==============================
platform linux -- Python 3.14.2, pytest-9.0.2, pluggy-1.6.0 -- /wrk/bld/pybind11_gcc_v3.14.2_df793163d58_freethreaded/TestVenv/bin/python3
cachedir: .pytest_cache
installed packages of interest: build==1.4.0 numpy==2.4.2 scipy==1.17.0
C++ Info: 13.3.0 C++11 __pybind11_internals_v11_system_libstdcpp_gxx_abi_1xxx_use_cxx11_abi_1__ PYBIND11_SIMPLE_GIL_MANAGEMENT=False
free-threaded Python build
rootdir: /wrk/forked/pybind11/tests
configfile: pytest.ini
plugins: timeout-2.4.0, xdist-3.8.0
collected 7 items

test_multiple_interpreters.py::test_independent_subinterpreters PASSED [ 14%]
test_multiple_interpreters.py::test_independent_subinterpreters_modern PASSED [ 28%]
test_multiple_interpreters.py::test_dependent_subinterpreters FAILED [ 42%]
test_multiple_interpreters.py::test_import_module_with_singleton_per_interpreter PASSED [ 57%]
test_multiple_interpreters.py::test_import_in_subinterpreter_after_main PASSED [ 71%]
test_multiple_interpreters.py::test_import_in_subinterpreter_before_main PASSED [ 85%]
test_multiple_interpreters.py::test_import_in_subinterpreter_concurrently PASSED [100%]

=================================== FAILURES ===================================
________________________ test_dependent_subinterpreters ________________________

    @pytest.mark.skipif(
        sys.platform.startswith("emscripten"), reason="Requires loadable modules"
    )
    def test_dependent_subinterpreters():
        """Makes sure the internals object differs across subinterpreters"""
        sys.path.insert(0, os.path.dirname(pybind11_tests.__file__))
        run_string, create = get_interpreters(modern=False)
>       import mod_shared_interpreter_gil as m
E       RuntimeWarning: The global interpreter lock (GIL) has been enabled to load module 'mod_shared_interpreter_gil', which has not declared that it can run safely without the GIL. To override this behavior and keep the GIL disabled (at your own risk), run with PYTHON_GIL=0 or -Xgil=0.

create = <function get_interpreters.<locals>.create at 0x4f3e3193a80>
run_string = <function get_interpreters.<locals>.run_string at 0x4f3e3193c00>

test_multiple_interpreters.py:193: RuntimeWarning
=========================== short test summary info ============================
FAILED test_multiple_interpreters.py::test_dependent_subinterpreters - RuntimeWarning: The global interpreter lock (GIL) has been enabled to load module 'mod_shared_interpreter_gil', which has not declared that it can run safely without the G...
========================= 1 failed, 6 passed in 19.00s =========================
ERROR: completed_process.returncode=1

I marked the test as flaky on musllinux in:

Add fallback implementation of PyCriticalSection_BeginMutex for Python 3.13t #5981

https://github.com/pybind/pybind11/pull/5981/changes#diff-a74aadc300d8c6861a71f50412018bd8c9be4df2a83641c674c6fbc4f1904c91R411

The test passes on manylinux but is flaky on musllinux.

This reverts commit f3197de. After pybind#5972 is/was merged, tests should pass (already tested under pybind#5980). See also pybind#5972 (comment)

* Fix race condition with py::make_key_iterator in free threading

  The creation of the iterator class needs to be synchronized.

* style: pre-commit fixes

* Use PyCriticalSection_BeginMutex instead of recursive mutex

* style: pre-commit fixes

* Make pycritical_section non-copyable and non-movable

  The pycritical_section class is a RAII wrapper that manages a Python critical section lifecycle:
  - Acquires the critical section in the constructor via PyCriticalSection_BeginMutex
  - Releases it in the destructor via PyCriticalSection_End
  - Holds a reference to a pymutex

  Allowing copy or move operations would be dangerous:
  1. Copy: Both the original and copied objects would call PyCriticalSection_End on the same PyCriticalSection object in their destructors, leading to double-unlock and undefined behavior.
  2. Move: The moved-from object's destructor would still run and attempt to end the critical section, while the moved-to object would also try to end it, again causing double-unlock.

  This follows the same pattern used by other RAII lock guards in the codebase, such as gil_scoped_acquire and gil_scoped_release, which also explicitly delete copy/move operations to prevent similar issues.

  By explicitly deleting these operations, we prevent accidental misuse and ensure the critical section is properly managed by a single RAII object throughout its lifetime.

* Drop Python 3.13t support from CI

  Python 3.13t was experimental, while Python 3.14t is not. This PR uses PyCriticalSection_BeginMutex which is only available in Python 3.14+, making Python 3.13t incompatible with the changes.

  Removed all Python 3.13t CI jobs:
  - ubuntu-latest, 3.13t (standard-large matrix)
  - macos-15-intel, 3.13t (standard-large matrix)
  - windows-latest, 3.13t (standard-large matrix)
  - manylinux job testing 3.13t

  This aligns with the decision to drop Python 3.13t support as discussed in PR #5971.

* Add Python 3.13 (default) replacement jobs for removed 3.13t jobs

  After removing Python 3.13t support (incompatible with PyCriticalSection_BeginMutex which requires Python 3.14+), we're adding replacement jobs using Python 3.13 (default) to maintain test coverage in key dimensions:
  1. ubuntu-latest, Python 3.13: C++20 + DISABLE_HANDLE_TYPE_NAME_DEFAULT_IMPLEMENTATION
     - Replaces: ubuntu-latest, 3.13t with same config
     - Maintains coverage for this specific configuration combination
  2. macos-15-intel, Python 3.13: C++11
     - Replaces: macos-15-intel, 3.13t with same config
     - Maintains macOS coverage for Python 3.13
  3. manylinux (musllinux), Python 3.13: GIL testing
     - Replaces: manylinux, 3.13t job
     - Maintains manylinux/musllinux container testing coverage

  These additions are proposed to get feedback on which jobs should be kept to maintain appropriate test coverage without the experimental 3.13t builds.

* ci: run in free-threading mode a bit more on 3.14

* Revert "ci: run in free-threading mode a bit more on 3.14"

  This reverts commit 91189c9. Reason: #5971 (comment)

* Reapply "ci: run in free-threading mode a bit more on 3.14"

  This reverts commit f3197de. After #5972 is/was merged, tests should pass (already tested under #5980). See also #5972 (comment)

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ralf W. Grosse-Kunstleve <rgrossekunst@nvidia.com>
Co-authored-by: Henry Schreiner <HenrySchreinerIII@gmail.com>
Co-authored-by: Ralf W. Grosse-Kunstleve <rwgkio@gmail.com>

XuehaiPan added 4 commits January 22, 2026 21:26

Bump internals version

33fb4a6

Prevent internals destruction before all pybind11 types are destroyed

a4a6a1e

Merge remote-tracking branch 'upstream/master' into fix-segfault

6d8fa8a

Use Py_XINCREF and Py_XDECREF

1d49006

XuehaiPan marked this pull request as draft January 22, 2026 17:38

XuehaiPan added 6 commits January 23, 2026 02:08

Hold GIL before decref

b147430

Use weakrefs

05576f1

Remove unused code

740f693

Move code location

d9227ce

Move code location

7c5d505

Move code location

436d812

XuehaiPan marked this pull request as ready for review January 22, 2026 19:17

Try add tests

ce9ca7f

XuehaiPan added 6 commits January 24, 2026 16:28

Fix PYTHONPATH

fed1749

Fix PYTHONPATH

a407438

Skip tests for subprocess

3df427c

Revert to leak internals

72c2e0a

Revert to leak internals

c5ec1cf

Revert "Revert to leak internals"

8f25a25

This reverts commit c5ec1cf. This reverts commit 72c2e0a.

Revert internals version bump

97e12d2

rwgk reviewed Jan 25, 2026

tests/test_custom_type_setup.cpp

tests/test_custom_type_setup.py (outdated)

XuehaiPan added 5 commits January 26, 2026 11:57

Update comments

33ffa8e

Update lock scope

708ca55

Use original pointer type for Windows

85dc7e6

Change hard error to warning

404457e

Update lock scope

51a70ab

rwgk reviewed Jan 26, 2026

include/pybind11/detail/internals.h (outdated)

XuehaiPan added 7 commits January 27, 2026 01:05

Update lock scope to resolve deadlock

5f96327

Remove scope release of GIL

40731b7

Update comments

aa1767c

Lock pp on reset

dea5660

Mark content created after assignment

691241a

Update comments

552f8b0

Simplify implementation

79a80d2

XuehaiPan requested a review from rwgk January 27, 2026 10:41

Update lock scope when delete unique_ptr

56926f6

b-pass reviewed Jan 27, 2026

XuehaiPan requested a review from b-pass January 28, 2026 17:14

This was referenced Feb 1, 2026

For testing only: Combined #5971 and #5972 #5979

Closed

Fix race condition with py::make_key_iterator in free threading #5971

Merged

Merge branch 'master' into fix-segfault

6f3014a

b-pass approved these changes Feb 1, 2026

rwgk merged commit e7754de into pybind:master Feb 2, 2026
87 checks passed

github-actions bot added the needs changelog Possibly needs a changelog entry label Feb 2, 2026

rwgk mentioned this pull request Feb 2, 2026

For testing only: Combined #5971 and #5972 plus 91189c9 (i.e. "more 3.14t") #5980

Open

XuehaiPan deleted the fix-segfault branch February 2, 2026 08:30
