Analyze impact of MSVC symmetric transfer workaround in transform_awaiter #180

@sgerbino

Context

Commit 505e20c introduced a workaround for an MSVC codegen bug where the compiler stores the coroutine_handle<> return value from await_suspend on the coroutine frame via a hidden __$ReturnUdt$ variable. After await_suspend publishes the coroutine handle to another thread (e.g. via IOCP), that thread can resume/destroy the frame before __resume reads the handle back for the symmetric transfer tail-call, causing a use-after-free.

The workaround is applied in three transform_awaiter::await_suspend sites:

  • task.hpp (line 207)
  • when_all.hpp (line 218)
  • when_any.hpp (line 353)

On MSVC, when the inner awaitable's await_suspend returns coroutine_handle<>, the workaround calls .resume() on the machine stack instead of returning the handle for symmetric transfer.

Impact Analysis

1. Stack depth on MSVC

The workaround converts symmetric transfer (tail-call, O(1) stack) into a regular .resume() call, which grows the stack. In practice this is mitigated because:

  • IOCP-based awaitables return noop_coroutine() after posting to the OS, so .resume() is a no-op
  • when_all_launcher / when_any_launcher return noop_coroutine() after spawning children
  • immediate<T> always returns noop_coroutine()

However, some awaitables return real handles through transform_awaiter:

  • task<T>: returns h_ (the task's own coroutine handle) — symmetric transfer to begin execution
  • run_awaitable / run_awaitable_ex: returns a task handle
  • async_event::wait_awaiter and async_mutex::lock_awaiter: may return the caller handle h when already canceled

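For task<T> specifically, a sequence such as co_await task_a; co_await task_b; ... adds a stack frame per awaited task on MSVC instead of tail-calling, at least when each task completes without suspending: the caller is resumed from inside the nested .resume() call, so its next co_await nests one level deeper. Long sequences could in principle overflow the stack, though typical usage likely stays within limits.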

2. final_suspend awaiters are NOT covered

The workaround only applies to user-provided IO awaitables flowing through transform_awaiter. Several final_suspend awaiters also return coroutine_handle<> and may be subject to the same MSVC bug:

when_all_runner::promise_type::final_suspend (when_all.hpp:164)

std::coroutine_handle<> await_suspend(std::coroutine_handle<> h) noexcept
{
    auto* state = p_->state_;
    auto* counter = &state->remaining_count_;
    auto* caller_env = state->caller_env_;
    auto cont = state->continuation_;
    h.destroy();  // destroys own frame
    auto remaining = counter->fetch_sub(1, std::memory_order_acq_rel);
    if(remaining == 1)
        return caller_env->executor.dispatch(cont);
    return std::noop_coroutine();
}

This calls h.destroy() before returning. If MSVC stores the return value in __$ReturnUdt$ on the (now-destroyed) frame, this is a use-after-free. The same pattern appears in when_any_runner::final_suspend (when_any.hpp:313).

task::promise_type::final_suspend (task.hpp:168)

std::coroutine_handle<> await_suspend(std::coroutine_handle<>) const noexcept
{
    return p_->continuation();
}

No h.destroy(), no cross-thread publishing. The frame is still alive when __$ReturnUdt$ is read back. Likely safe, but should be verified.

dispatch_trampoline::promise_type::final_suspend (run.hpp:99)

std::coroutine_handle<> await_suspend(std::coroutine_handle<>) noexcept
{
    return p_->caller_ex_.dispatch(p_->parent_);
}

No h.destroy(), but dispatch() may post to another thread and return noop_coroutine(). If the other thread resumes the parent, and the parent then destroys this frame before __$ReturnUdt$ is read, the same race condition applies. Needs investigation.

3. Executor dispatch() semantics matter

The dispatch() contract has two modes:

  • Same thread: returns the handle h for symmetric transfer (no cross-thread race)
  • Different thread: calls post(h) and returns noop_coroutine() — the posted handle could potentially resume and destroy the caller's frame before the return value is consumed

thread_pool::dispatch() always posts and returns noop_coroutine(). strand::dispatch() may return h directly if the caller is already on the strand. The concern is specifically the cross-thread case: does the window between post() and the return of noop_coroutine() allow the posted thread to destroy the frame?

4. if constexpr branch for non-coroutine_handle<> returns

The workaround correctly falls through to a normal return for void and bool return types, which do not trigger the MSVC bug. This is sound.

Questions to investigate

  1. Are the when_all_runner / when_any_runner final_suspend awaiters vulnerable? They destroy their own frame via h.destroy() and then return a coroutine_handle<>. If MSVC writes the return value to the destroyed frame, this is UB regardless of threading.

  2. Is dispatch_trampoline::final_suspend vulnerable? It calls dispatch() which may cross-thread-post before returning.

  3. What is the practical stack depth limit? For direct task<T> chaining through transform_awaiter, how deep can a chain go before stack overflow on MSVC?

  4. Should the workaround be generalized? A utility like msvc_symmetric_transfer(handle) could be applied uniformly to all await_suspend sites returning coroutine_handle<>.

  5. Is there an MSVC version where this is fixed? If so, the #ifdef could be narrowed to _MSC_VER < NNNN.
