Fix coroutine frame leaks during io_context shutdown (#159) #173
mvandeberg wants to merge 3 commits into cppalliance:develop
Conversation
Note: Reviews paused. This branch is under active development; to avoid an influx of review comments on new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough: Explicitly destroy coroutine handles during shutdown/post paths and remove the per-scheduler shutdown flag.
An automated preview of the documentation is available at https://173.corosio.prtest3.cppalliance.org/index.html — if more commits are pushed to the pull request, the docs will rebuild at the same URL. 2026-02-25 17:51:01 UTC
GCOVR code coverage report https://173.corosio.prtest3.cppalliance.org/gcovr/index.html Build time: 2026-02-25 17:57:59 UTC |
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@include/boost/corosio/detail/timer_service.hpp`:
- Around line 752-761: do_complete's branch that checks for "!owner" is now
unreachable because completion_op defines void destroy() override which bypasses
scheduler_op::destroy() and thus never calls func_(nullptr, ...); update the
code by marking the "!owner" branch in do_complete as unreachable (or remove it)
and add a clear comment on completion_op::destroy() explaining that it
intentionally does not call work_finished() because destroy() is only invoked
during scheduler shutdown (e.g., scheduler_op_queue destructor or explicit
cleanup) where work_finished() has already been called for each waiter in the
heap loop—refer to completion_op::destroy(), do_complete,
scheduler_op::destroy(), waiter_node and work_finished() in the comment to
prevent future double-decrement mistakes.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
- `test/unit/io_context.cpp` is excluded by `!**/test/**`
- `test/unit/timer.cpp` is excluded by `!**/test/**`
📒 Files selected for processing (5)
- include/boost/corosio/detail/timer_service.hpp
- include/boost/corosio/native/detail/epoll/epoll_scheduler.hpp
- include/boost/corosio/native/detail/iocp/win_scheduler.hpp
- include/boost/corosio/native/detail/kqueue/kqueue_scheduler.hpp
- include/boost/corosio/native/detail/select/select_scheduler.hpp
Force-pushed ed96045 to b7b7248 (Compare)
Codecov Report ❌ Patch coverage is
Additional details and impacted files

```
@@           Coverage Diff            @@
##           develop     #173   +/-   ##
===========================================
- Coverage    82.02%   76.64%   -5.39%
===========================================
  Files           70       84      +14
  Lines         5876     8793    +2917
  Branches         0     1835    +1835
===========================================
+ Hits          4820     6739    +1919
- Misses        1056     1629     +573
- Partials         0      425     +425
```
Flags with carried forward coverage won't be shown.
... and 76 files with indirect coverage changes.
Force-pushed b7b7248 to 21e9ecb (Compare)
♻️ Duplicate comments (1)
include/boost/corosio/detail/timer_service.hpp (1)
752-768: `do_complete`'s `!owner` branch is now dead code — previous review comment still unaddressed.

`completion_op` now overrides `destroy()`, so `scheduler_op::destroy()` (which calls `func_(nullptr, ...)`) is never invoked on a `completion_op`. The `if (!owner) return;` at line 728 is therefore unreachable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@include/boost/corosio/detail/timer_service.hpp` around lines 752 - 768, The if (!owner) return; branch in do_complete is unreachable because waiter_node::completion_op overrides destroy(), preventing scheduler_op::destroy() from invoking func_(nullptr,...); remove the dead branch from do_complete (or, if intended, adjust ownership flow so scheduler_op::destroy() can call completion_op::func_) and ensure any work_started/work_finished balancing remains correct when waiter_node::completion_op::destroy() runs; update references to owner and related cleanup paths to reflect that completion_op::destroy() handles the shutdown path.
🧹 Nitpick comments (1)
include/boost/corosio/detail/scheduler_op.hpp (1)
125-137: `scheduler_op_queue` destructor silently skips `work_finished()` for `owns_work_=true` ops.

`~scheduler_op_queue` calls `h->destroy()` without consulting `owns_work()`. For ops where `owns_work_=true` and `destroy()` doesn't self-call `work_finished()` (e.g., `post_handler`), any such op still in a `scheduler_op_queue` at destruction time will leak its work count. The current architecture avoids this by always draining private queues into `completed_ops_` before the guard destructs, but this is a fragile implicit contract. A clarifying comment on the destructor would prevent future regressions.

💬 Suggested comment on destructor

```diff
 ~scheduler_op_queue()
 {
+    // Calls destroy() on each remaining op. Ops with owns_work_=true
+    // must self-call work_finished() inside destroy(), or the caller
+    // must guarantee this queue is only destructed after the scheduler
+    // shutdown drain loop has already run.
     while (auto* h = q_.pop())
         h->destroy();
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@include/boost/corosio/detail/scheduler_op.hpp` around lines 125 - 137, Destructor of scheduler_op_queue currently calls h->destroy() without checking scheduler_op::owns_work_, which can silently skip work_finished() for ops (e.g., post_handler) that do not self-call work_finished(), risking leaked work counts; update the destructor's comment to explicitly document that all ops remaining in the queue must have owns_work_==false or must have their destroy() call work_finished(), and mention the relevant symbols: scheduler_op_queue::~scheduler_op_queue, scheduler_op::owns_work_, handler_base::destroy (h->destroy()), work_finished(), and post_handler as an example—clarify the invariant and that callers/implementations must ensure work_finished() is called for owns_work_=true ops or that queues are drained into completed_ops_ before destruction.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@include/boost/corosio/detail/timer_service.hpp`:
- Around line 752-768: The if (!owner) return; branch in do_complete is
unreachable because waiter_node::completion_op overrides destroy(), preventing
scheduler_op::destroy() from invoking func_(nullptr,...); remove the dead branch
from do_complete (or, if intended, adjust ownership flow so
scheduler_op::destroy() can call completion_op::func_) and ensure any
work_started/work_finished balancing remains correct when
waiter_node::completion_op::destroy() runs; update references to owner and
related cleanup paths to reflect that completion_op::destroy() handles the
shutdown path.
---
Nitpick comments:
In `@include/boost/corosio/detail/scheduler_op.hpp`:
- Around line 125-137: Destructor of scheduler_op_queue currently calls
h->destroy() without checking scheduler_op::owns_work_, which can silently skip
work_finished() for ops (e.g., post_handler) that do not self-call
work_finished(), risking leaked work counts; update the destructor's comment to
explicitly document that all ops remaining in the queue must have
owns_work_==false or must have their destroy() call work_finished(), and mention
the relevant symbols: scheduler_op_queue::~scheduler_op_queue,
scheduler_op::owns_work_, handler_base::destroy (h->destroy()), work_finished(),
and post_handler as an example—clarify the invariant and that
callers/implementations must ensure work_finished() is called for
owns_work_=true ops or that queues are drained into completed_ops_ before
destruction.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
- `test/unit/io_context.cpp` is excluded by `!**/test/**`
- `test/unit/timer.cpp` is excluded by `!**/test/**`
📒 Files selected for processing (8)
- include/boost/corosio/detail/scheduler_op.hpp
- include/boost/corosio/detail/timer_service.hpp
- include/boost/corosio/native/detail/epoll/epoll_op.hpp
- include/boost/corosio/native/detail/epoll/epoll_scheduler.hpp
- include/boost/corosio/native/detail/iocp/win_scheduler.hpp
- include/boost/corosio/native/detail/kqueue/kqueue_op.hpp
- include/boost/corosio/native/detail/kqueue/kqueue_scheduler.hpp
- include/boost/corosio/native/detail/select/select_scheduler.hpp
🚧 Files skipped from review as they are similar to previous changes (1)
- include/boost/corosio/native/detail/iocp/win_scheduler.hpp
Coroutine frames were leaked during shutdown because destroy handlers either skipped h.destroy() or were no-ops.

- post_handler::destroy() now destroys the coroutine frame in all backends.
- completion_op::destroy() cleans up the waiter and destroys the coroutine frame.
- timer_service::shutdown() destroys frames instead of nulling handles.
- Remove the outstanding_work_ hard-reset to zero during shutdown, matching Asio's approach of abandoning the counter.
- Remove the unused shutdown_ flag (written but never read in corosio).
- Drain loops now simply call destroy() on remaining ops.
- Add regression tests for posted coroutine shutdown, timer waiter shutdown, and abrupt stop with pending ops.
Force-pushed 21e9ecb to de3e3bf (Compare)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
include/boost/corosio/native/detail/iocp/win_scheduler.hpp (1)
192-229: ⚠️ Potential issue | 🔴 Critical

`shutdown()` loop spins indefinitely when posted handlers were in-flight — `outstanding_work_` is never decremented in the destroy path.

`post()` increments `outstanding_work_` via `::InterlockedIncrement()`. In the normal execution path, `do_one()` calls `work_finished()` after `op->complete()` to decrement it. However, in `shutdown()`, each dequeued handler is destroyed via `h->destroy()` → `post_handler::do_complete(nullptr, ...)` → `coro.destroy()`. This destroy path never calls `work_finished()`, leaving `outstanding_work_` undecremented. The loop-exit condition `outstanding_work_ == 0` will never fire, causing indefinite spinning. Call `work_finished()` after each `destroy()` in both drain loops.

Fix: decrement outstanding_work_ after destroying handlers

```diff
 while (auto* h = ops.pop())
+{
     h->destroy();
+    work_finished();
+}
 ...
 if (overlapped)
 {
     if (key == key_posted)
     {
         auto* op = reinterpret_cast<scheduler_op*>(overlapped);
         op->destroy();
+        work_finished();
     }
     else
     {
         auto* op = overlapped_to_op(overlapped);
         op->destroy();
+        work_finished();
     }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@include/boost/corosio/native/detail/iocp/win_scheduler.hpp` around lines 192 - 229, The shutdown() loop never decrements outstanding_work_ for handlers destroyed directly, causing an infinite spin; after every h->destroy() in the first drain loop and after every op->destroy()/reinterpret_cast path in the GetQueuedCompletionStatus handling, call work_finished() to decrement outstanding_work_ (same place you rely on do_one()/post() to pair with work_finished()); update the shutdown() implementation to invoke work_finished() immediately after each destroy() so outstanding_work_ can reach zero and the loop can exit (refer to symbols: shutdown(), timers_, op_queue, completed_ops_, dispatch_mutex_, h->destroy(), overlapped_to_op(), GetQueuedCompletionStatus, op->destroy(), post(), do_one(), work_finished(), post_handler::do_complete).
🧹 Nitpick comments (1)
include/boost/corosio/native/detail/iocp/win_scheduler.hpp (1)
239-248: Document the bounded-destruction invariant that justifies the synchronous `coro.destroy()` call.

The PR restores `coro.destroy()` in the shutdown path precisely because it asserts the destruction chain is bounded to at most two levels deep (task → trampoline → noop_coroutine). That invariant is load-bearing — it's the sole reason re-entrant destruction no longer risks a stack overflow — but there is no comment, assertion, or `static_assert` in the code to capture it. A future refactor that introduces a deeper coroutine chain could silently reintroduce the overflow.

```diff
 if (!owner)
 {
+    // Destruction chain is bounded: task → trampoline → noop_coroutine.
+    // No deeper re-entrancy is possible, so synchronous destroy is safe.
     auto coro = self->h_;
     delete self;
     coro.destroy();
     return;
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@include/boost/corosio/native/detail/iocp/win_scheduler.hpp` around lines 239 - 248, Add a clear comment and a runtime or compile-time check documenting the bounded-destruction invariant that justifies calling coro.destroy() synchronously in post_handler::do_complete; specifically, in the do_complete path where owner == nullptr and we cast to post_handler (post_handler, do_complete, coro.destroy()), state that the destruction chain is guaranteed to be at most two levels deep (task → trampoline → noop_coroutine) and either assert this invariant or otherwise validate it (e.g., a debug-time assert or an invariant flag) so future refactors cannot silently extend the chain depth and reintroduce re-entrant stack overflows.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@include/boost/corosio/detail/timer_service.hpp`:
- Around line 337-340: The shutdown code destroys waiter coroutine frames (using
h.destroy() and delete w) but never balances the increment done in
timer_service::implementation::wait(); update the destroy paths (the blocks
handling auto h = std::exchange(w->h_, {}); if (h) h.destroy(); delete w;) to
call the corresponding work_finished() on the timer_service::implementation
before destroying/deleting the waiter (w) so the outstanding work counters are
decremented; apply the same change to the other symmetric shutdown region
(around the 751-761 block) to ensure every wait() increment is paired with a
work_finished().
---
Outside diff comments:
In `@include/boost/corosio/native/detail/iocp/win_scheduler.hpp`:
- Around line 192-229: The shutdown() loop never decrements outstanding_work_
for handlers destroyed directly, causing an infinite spin; after every
h->destroy() in the first drain loop and after every
op->destroy()/reinterpret_cast path in the GetQueuedCompletionStatus handling,
call work_finished() to decrement outstanding_work_ (same place you rely on
do_one()/post() to pair with work_finished()); update the shutdown()
implementation to invoke work_finished() immediately after each destroy() so
outstanding_work_ can reach zero and the loop can exit (refer to symbols:
shutdown(), timers_, op_queue, completed_ops_, dispatch_mutex_, h->destroy(),
overlapped_to_op(), GetQueuedCompletionStatus, op->destroy(), post(), do_one(),
work_finished(), post_handler::do_complete).
---
Nitpick comments:
In `@include/boost/corosio/native/detail/iocp/win_scheduler.hpp`:
- Around line 239-248: Add a clear comment and a runtime or compile-time check
documenting the bounded-destruction invariant that justifies calling
coro.destroy() synchronously in post_handler::do_complete; specifically, in the
do_complete path where owner == nullptr and we cast to post_handler
(post_handler, do_complete, coro.destroy()), state that the destruction chain is
guaranteed to be at most two levels deep (task → trampoline → noop_coroutine)
and either assert this invariant or otherwise validate it (e.g., a debug-time
assert or an invariant flag) so future refactors cannot silently extend the
chain depth and reintroduce re-entrant stack overflows.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
- `test/unit/io_context.cpp` is excluded by `!**/test/**`
- `test/unit/timer.cpp` is excluded by `!**/test/**`
📒 Files selected for processing (5)
- include/boost/corosio/detail/timer_service.hpp
- include/boost/corosio/native/detail/epoll/epoll_scheduler.hpp
- include/boost/corosio/native/detail/iocp/win_scheduler.hpp
- include/boost/corosio/native/detail/kqueue/kqueue_scheduler.hpp
- include/boost/corosio/native/detail/select/select_scheduler.hpp
🚧 Files skipped from review as they are similar to previous changes (1)
- include/boost/corosio/native/detail/select/select_scheduler.hpp
```cpp
auto h = std::exchange(w->h_, {});
if (h)
    h.destroy();
delete w;
```
Balance scheduler work accounting on shutdown destroy paths.
These paths destroy waiter coroutine frames but never call work_finished(), even though each waiter increments work in timer_service::implementation::wait() (Line 794). That leaves outstanding work counters imbalanced during shutdown teardown.
💡 Proposed fix

```diff
 inline void
 timer_service::shutdown()
 {
     timer_service_invalidate_cache();
+    auto& sched = get_scheduler();
     // Cancel waiting timers still in the heap
     for (auto& entry : heap_)
     {
         auto* impl = entry.timer_;
         while (auto* w = impl->waiters_.pop_front())
         {
             w->stop_cb_.reset();
             auto h = std::exchange(w->h_, {});
             if (h)
                 h.destroy();
             delete w;
+            sched.work_finished();
         }
         impl->heap_index_ = (std::numeric_limits<std::size_t>::max)();
         delete impl;
     }
@@
 inline void
 waiter_node::completion_op::destroy()
 {
     auto* w = waiter_;
     w->stop_cb_.reset();
     auto h = std::exchange(w->h_, {});
     auto& sched = w->svc_->get_scheduler();
     delete w;
     if (h)
         h.destroy();
+    sched.work_finished();
 }
```

Also applies to: 751-761
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@include/boost/corosio/detail/timer_service.hpp` around lines 337 - 340, The
shutdown code destroys waiter coroutine frames (using h.destroy() and delete w)
but never balances the increment done in timer_service::implementation::wait();
update the destroy paths (the blocks handling auto h = std::exchange(w->h_, {});
if (h) h.destroy(); delete w;) to call the corresponding work_finished() on the
timer_service::implementation before destroying/deleting the waiter (w) so the
outstanding work counters are decremented; apply the same change to the other
symmetric shutdown region (around the 751-761 block) to ensure every wait()
increment is paired with a work_finished().
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@include/boost/corosio/detail/timer_service.hpp`:
- Around line 337-339: Add an explicit `#include <utility>` to
include/boost/corosio/detail/timer_service.hpp because std::exchange is used
(e.g. where std::exchange(w->h_, {}) appears and also at the other use around
line 768); update the top-of-file include block so the header directly includes
<utility> instead of relying on transitive includes.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
- `test/unit/io_context.cpp` is excluded by `!**/test/**`
- `test/unit/timer.cpp` is excluded by `!**/test/**`
📒 Files selected for processing (1)
include/boost/corosio/detail/timer_service.hpp
```cpp
auto h = std::exchange(w->h_, {});
if (h)
    h.destroy();
```
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail
# Verify std::exchange usage sites in this header
echo "=== Checking for std::exchange usage ==="
rg -n 'std::exchange' include/boost/corosio/detail/timer_service.hpp
# Verify whether <utility> is explicitly included
echo ""
echo "=== Checking for #include <utility> ==="
rg -n '^#include <utility>$' include/boost/corosio/detail/timer_service.hpp
# Show all includes in the file to understand current state
echo ""
echo "=== All includes in the file ==="
rg -n '^#include' include/boost/corosio/detail/timer_service.hpp | head -20
```

Repository: cppalliance/corosio
Length of output: 238
🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail
# Show all includes in the file to understand current state
echo "=== All includes in the file ==="
head -50 include/boost/corosio/detail/timer_service.hpp | rg -n '^#include'
```

Repository: cppalliance/corosio
Length of output: 744
Add `<utility>` for `std::exchange` usage.
`std::exchange` is used at lines 337 and 768, but `<utility>` is not explicitly included. Avoid relying on transitive includes.
♻️ Proposed fix

```diff
 #include <stop_token>
+#include <utility>
 #include <vector>
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
#include <stop_token>
#include <utility>
#include <vector>
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@include/boost/corosio/detail/timer_service.hpp` around lines 337 - 339, Add
an explicit `#include <utility>` to include/boost/corosio/detail/timer_service.hpp
because std::exchange is used (e.g. where std::exchange(w->h_, {}) appears and
also at the other use around line 768); update the top-of-file include block so
the header directly includes <utility> instead of relying on transitive
includes.
Closes #159
Restore coroutine handle destruction in shutdown paths that was previously removed (3a29bea, 57c6600) to work around a stack overflow from re-entrant destruction. The destruction chain is now bounded (task → trampoline → noop_coroutine, at most 2 levels deep per run_async invocation), so the overflow no longer applies.
Three shutdown paths now properly destroy coroutine frames:
Includes regression tests that fail under ASan without the fix: