Description
Context
Commit 505e20c introduced a workaround for an MSVC codegen bug where the compiler stores the coroutine_handle<> return value from await_suspend on the coroutine frame via a hidden __$ReturnUdt$ variable. After await_suspend publishes the coroutine handle to another thread (e.g. via IOCP), that thread can resume/destroy the frame before __resume reads the handle back for the symmetric transfer tail-call, causing a use-after-free.
The workaround is applied in three transform_awaiter::await_suspend sites:
- task.hpp (line 207)
- when_all.hpp (line 218)
- when_any.hpp (line 353)
On MSVC, when the inner awaitable's await_suspend returns coroutine_handle<>, the workaround calls .resume() on the machine stack instead of returning the handle for symmetric transfer.
Impact Analysis
1. Stack depth on MSVC
The workaround converts symmetric transfer (tail-call, O(1) stack) into a regular .resume() call, which grows the stack. In practice this is mitigated because:
- IOCP-based awaitables return noop_coroutine() after posting to the OS, so .resume() is a no-op
- when_all_launcher / when_any_launcher return noop_coroutine() after spawning children
- immediate<T> always returns noop_coroutine()
However, some awaitables return real handles through transform_awaiter:
- task<T>: returns h_ (the task's own coroutine handle), a symmetric transfer that begins execution
- run_awaitable / run_awaitable_ex: returns a task handle
- async_event::wait_awaiter and async_mutex::lock_awaiter: may return the caller handle h when already canceled
For task<T> specifically, a chain of co_await task_a; co_await task_b; ... would add a stack frame per task on MSVC instead of tail-calling. Deep chains could theoretically stack-overflow, though typical usage likely stays within limits.
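The stack growth described above can be illustrated with a small model that does not use coroutines at all. It is only a sketch: depth_probe, run_nested, run_one_step, and run_transferred are hypothetical names, a nested call stands in for .resume(), and a driver loop stands in for the tail-call that symmetric transfer provides.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: tracks how deeply "resume" calls nest on the machine stack.
struct depth_probe {
    std::size_t current = 0;
    std::size_t max = 0;
    void enter() { ++current; max = std::max(max, current); }
    void leave() { assert(current > 0); --current; }
};

// Models the workaround: each step starts the next one with a regular call,
// so every link in the chain keeps its caller's stack frame alive.
inline void run_nested(std::size_t remaining, depth_probe& probe) {
    probe.enter();
    if (remaining > 1)
        run_nested(remaining - 1, probe); // stands in for next.resume()
    probe.leave();
}

// Models one step under symmetric transfer: do the work, then hand the
// next step back to the driver instead of calling it.
inline std::size_t run_one_step(std::size_t remaining, depth_probe& probe) {
    probe.enter();
    probe.leave();
    return remaining - 1; // stands in for returning the next handle
}

// Driver loop: only one step is ever active on the stack.
inline void run_transferred(std::size_t remaining, depth_probe& probe) {
    while (remaining > 0)
        remaining = run_one_step(remaining, probe);
}
```

In this model a chain of 100 steps peaks at depth 100 in the nested variant and at depth 1 in the loop variant. Real frame sizes and the actual overflow threshold on MSVC are not captured here.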
2. final_suspend awaiters are NOT covered
The workaround only applies to user-provided IO awaitables flowing through transform_awaiter. Several final_suspend awaiters also return coroutine_handle<> and may be subject to the same MSVC bug:
when_all_runner::promise_type::final_suspend (when_all.hpp:164)

    std::coroutine_handle<> await_suspend(std::coroutine_handle<> h) noexcept
    {
        auto* state = p_->state_;
        auto* counter = &state->remaining_count_;
        auto* caller_env = state->caller_env_;
        auto cont = state->continuation_;
        h.destroy(); // destroys own frame
        auto remaining = counter->fetch_sub(1, std::memory_order_acq_rel);
        if(remaining == 1)
            return caller_env->executor.dispatch(cont);
        return std::noop_coroutine();
    }

This calls h.destroy() before returning. If MSVC stores the return value in __$ReturnUdt$ on the (now-destroyed) frame, this is a use-after-free. The same pattern appears in when_any_runner::final_suspend (when_any.hpp:313).
task::promise_type::final_suspend (task.hpp:168)

    std::coroutine_handle<> await_suspend(std::coroutine_handle<>) const noexcept
    {
        return p_->continuation();
    }

No h.destroy(), no cross-thread publishing. The frame is still alive when __$ReturnUdt$ is read back. Likely safe, but should be verified.
dispatch_trampoline::promise_type::final_suspend (run.hpp:99)

    std::coroutine_handle<> await_suspend(std::coroutine_handle<>) noexcept
    {
        return p_->caller_ex_.dispatch(p_->parent_);
    }

No h.destroy(), but dispatch() may post to another thread and return noop_coroutine(). If the other thread resumes the parent, which then destroys this frame before __$ReturnUdt$ is read, the same race condition applies. Needs investigation.
3. Executor dispatch() semantics matter
The dispatch() contract has two modes:
- Same thread: returns the handle h for symmetric transfer (no cross-thread race)
- Different thread: calls post(h) and returns noop_coroutine(); the posted handle could potentially resume and destroy the caller's frame before the return value is consumed
thread_pool::dispatch() always posts and returns noop_coroutine(). strand::dispatch() may return h directly if already on the strand. The concern is specifically the case where dispatch() posts cross-thread: does the window between the post() and the return of noop_coroutine() allow the posted thread to destroy the frame?
4. if constexpr branch for non-coroutine_handle<> returns
The workaround correctly falls through to normal return for void and bool return types, which don't trigger the MSVC bug. This is sound.
Questions to investigate
- Are the when_all_runner / when_any_runner final_suspend awaiters vulnerable? They destroy their own frame via h.destroy() and then return a coroutine_handle<>. If MSVC writes the return value to the destroyed frame, this is UB regardless of threading.
- Is dispatch_trampoline::final_suspend vulnerable? It calls dispatch(), which may cross-thread-post before returning.
- What is the practical stack depth limit? For direct task<T> chaining through transform_awaiter, how deep can a chain go before stack overflow on MSVC?
- Should the workaround be generalized? A utility like msvc_symmetric_transfer(handle) could be applied uniformly to all await_suspend sites returning coroutine_handle<>.
- Is there an MSVC version where this is fixed? If so, the #ifdef could be narrowed to _MSC_VER < NNNN.