-
Notifications
You must be signed in to change notification settings - Fork 50
Fix libevwrapper atexit cleanup bug causing Python shutdown crashes #680
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
@copilot suggest more directions that might cause the issue. Also write a test that show case the problem, before implantation of any fix |
- Added LIBEV_SHUTDOWN_ANALYSIS.md with detailed root cause analysis - Documented 6 different crash scenarios that can occur - Added test_libevreactor_shutdown.py to demonstrate the bug - Tests show that atexit callback captures None instead of actual loop - Analysis explains why this causes crashes during Python shutdown Co-authored-by: fruch <340979+fruch@users.noreply.github.com>
I've created a comprehensive analysis in commit f56442c: Analysis Document:
Test File:
Root Cause: Line 231 in atexit.register(partial(_cleanup, _global_loop)) # _global_loop is None here!The Recommended Fix (minimal change): def _atexit_cleanup():
global _global_loop
if _global_loop is not None:
_cleanup(_global_loop)
atexit.register(_atexit_cleanup)This ensures cleanup looks up the current |
|
@copilot continue with implementation of the suggested fixes, one at a time |
…shes This implements the minimal fix for the libev atexit cleanup bug. Changes: - Replace atexit.register(partial(_cleanup, _global_loop)) with a wrapper function _atexit_cleanup() that looks up _global_loop at shutdown time - Remove unused 'partial' import from functools - Update tests to verify the fix works correctly The bug was that partial() captured _global_loop=None at import time, so cleanup always received None at shutdown instead of the actual LibevLoop instance. This prevented proper cleanup, leaving active callbacks that could crash during Python interpreter shutdown. The fix ensures _global_loop is looked up when atexit calls the cleanup, not when the callback is registered, so cleanup receives the actual loop instance and can properly shut down watchers and join the event loop thread. Co-authored-by: fruch <340979+fruch@users.noreply.github.com>
Implemented Solution 1: Fix atexit Registration in commit 8c90f05. Changes Made:
How it fixes the bug:
This is the minimal change approach (14 lines modified). The next step would be Solution 2 (add loop.stop() method in C extension) if you'd like additional robustness. |
This adds an explicit stop() method to the libev Loop that can break the event loop from any thread, providing more robust cleanup during shutdown. C Extension Changes (libevwrapper.c): - Add ev_async async_watcher field to libevwrapper_Loop struct - Add async_stop_cb() callback that calls ev_break(EVBREAK_ALL) - Add Loop_stop() method that sends async signal to break the loop - Initialize async_watcher in Loop_init() and start it - Clean up async_watcher in Loop_dealloc() - Add "stop" to Loop_methods array Python Changes (libevreactor.py): - Update _atexit_cleanup() to call loop.stop() before cleanup - This ensures the event loop breaks cleanly even if processing events - Wrapped in try/except to continue cleanup if stop fails Benefits: - Thread-safe loop stopping via libev's async mechanism - Prevents race conditions where callbacks fire during shutdown - Works in conjunction with Solution 1 for defense in depth Co-authored-by: fruch <340979+fruch@users.noreply.github.com>
Document the complete implementation of both solutions for the libev atexit cleanup bug, including code changes, benefits, and testing approach. Co-authored-by: fruch <340979+fruch@users.noreply.github.com>
Description
The
atexitcallback inlibevreactor.pywas registered with_global_loop=Noneat import time, causing it to receiveNoneduring shutdown instead of the actual loop instance. This prevented cleanup from running, leaving active libev callbacks that could execute during Python shutdown and crash.This PR implements two complementary solutions to fix the issue and prevent shutdown crashes.
Changes
Analysis and Documentation
LIBEV_SHUTDOWN_ANALYSIS.md: Comprehensive analysis document identifying the root cause and 6 different crash scenarios:
IMPLEMENTATION_SUMMARY.md: Complete documentation of implemented solutions, code changes, and testing approach
Solution 1: Fix atexit Registration (Minimal Change)
File:
cassandra/io/libevreactor.pyReplaced the problematic
atexit.register(partial(_cleanup, _global_loop))with a wrapper function that looks up_global_loopat shutdown time:partialimport from functoolsSolution 2: Add loop.stop() Method (Enhanced Robustness)
Files:
cassandra/io/libevwrapper.c,cassandra/io/libevreactor.pyAdded explicit loop stopping mechanism to the C extension:
C Extension Changes:
ev_async async_watcherfield tolibevwrapper_Loopstructasync_stop_cb()callback that callsev_break(EVBREAK_ALL)Loop_stop()method callable from PythonLoop_init()Loop_dealloc()Python Changes:
_atexit_cleanup()to callloop.stop()before cleanupTests
File:
tests/unit/io/test_libevreactor_shutdown.pytest_atexit_callback_uses_current_global_loop()- Verifies atexit handler is the wrapper function, not a partialtest_shutdown_cleanup_works_with_fix()- Subprocess test verifying proper cleanuptest_cleanup_with_fix_properly_shuts_down()- Verifies cleanup actually shuts down the loopRoot Cause
Line 231 in
libevreactor.py:The
partialfunction capturesNoneat import time. Later when_global_loopis set to aLibevLoopinstance, the atexit callback still hasNonecaptured, so cleanup never runs.How the Fixes Work Together
Pre-review checklist
./docs/source/.Original prompt
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.