# C++ coroutine mechanics

Mircea Baja - 11 May 2025

---

# C++ functions reminder

Stack:

1. return value
2. arguments
3. return address
4. function local variables

- optimisations are layered on top of this model, e.g.:
  - return value and arguments might be passed via registers (calling conventions)
  - function body might be inlined

---

# C++ coroutines

- introduced in C++20
- the spec has a large number of types and rules, most of them different in kind from one another, ranging
  - from new keywords and compiler implementation
  - through user-implemented types with language-defined hooks for customization
  - to "nice to have"s that could be entirely implemented by the user
- NOTE: users can further be divided into library writers and users of the library (both are conflated here)
- reading list
  - [spec N4775](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/n4775.pdf): relatively short, readable, but background info required
  - [Lewis Baker's posts](https://lewissbaker.github.io/): long, not up to date, but really good background info required to understand coroutines

---

# Coroutine syntax

```cpp
task foo() {
  auto x = co_await bar();
}

generator buzz() {
  co_yield 42;
}

task wozz() {
  co_return 42;
}
```

- looks like a function (the signature is the ramp function signature)
- three new keywords:
  - `co_await` defines a (potential) suspension point
  - `co_yield` (potentially) suspends and yields some value
  - `co_return` returns a value at the end; plain `return` is not allowed in the coroutine body
- other than that, normal code inside the coroutine body
- can be a member function or a template, but not a constructor or destructor; `co_await` (and `co_yield`) cannot appear in `catch` blocks

---

# Visual layout

- stackless, decomposed into coroutine frame and function(s), first call is special: ramp

---

# Coroutine implemented as functions

- mathematical equivalence of functions and coroutines (duality)
- functions are NOT implemented using coroutines
  - function mechanics use hardware support, such as the assembly instruction `CALL`, which "knows" to save the return address on the stack, adjust the stack pointer register and continue from the location provided
- coroutines are implemented using functions
  - the reason is to exploit this hardware efficiency of functions
  - the first of these functions is special: the ramp
  - the following ones resume, and eventually destroy, from a previous suspension
- the coroutine frame is allocated on the heap (unless customized or optimized out)
  - it keeps data between suspension points

---

# std::coroutine_handle

- C++ class wrapper around a pointer to the coroutine frame
- allows control of the coroutine frame via members of the class
  - `void resume()`
  - `void destroy()`
  - `bool done()`
  - ...
- handle, not RAII:
  - e.g. it has copy, not just move; copy copies the pointer
  - `destroy()` does not get called automatically on the handle destruction
- also it's a template (in an unusual way for handles)
- it has to be used with care: don't resume if destroyed, don't resume if not suspended, don't destroy more than once, etc.

---

# Ramp start

Stack:

- 1 (= 10): return value
- 2: arguments
- 3: return address
- 4: function local variables

Coroutine frame (5):

- 9: state machine*
- 7: promise
- 6: arguments
- 8: coroutine body local variables


---

# Ramp start

```cpp
// given
task<std::string> foo(int x) {
  int y = co_await bar();
  co_return std::to_string(x + y);
}

// ramp gets called somewhere
task<std::string> t = foo(42);
```

1. ramp function return value is `t` of type `task<std::string>`
2. argument is `42` of type `int`
3. return address is the address in the caller that the ramp returns to
4. stack used by the ramp (e.g. for the call to malloc to allocate the coroutine frame)
5. coroutine frame is allocated
6. `42` of type `int` is copied to the frame (so that it can be accessed later)
7. promise what?
   - this promise has practically nothing to do with `std::promise`

---

# std::coroutine_traits

```cpp
template<class R, class... Args>
struct coroutine_traits {};

template<class R, class... Args>
requires requires { typename R::promise_type; }
struct coroutine_traits<R, Args...> {
  using promise_type = typename R::promise_type;
};
```

- this customization point allows using declared coroutine return types that do not need to be changed to expose a `promise_type` (e.g. use with `std::future` instead of our `task<T>`) OR to consider the coroutine function arguments too in the resolution of the `promise_type`
- rarely customized in practice, usually:

```cpp
// the traits simply forward to the nested promise_type
static_assert(std::is_same_v<
    std::coroutine_traits<task<std::string>, int>::promise_type,
    task<std::string>::promise_type>);
```

---

# Promise type

```cpp
class task {
public:
  class promise_type { };
};

// or
class task {
public:
  using promise_type = some_other_type;
};
```

- two options: the `promise_type` is defined within the declared return type or aliased to some other type defined elsewhere

---

# Promise construction

```cpp
promise_type promise(/* coroutine arguments, if a matching constructor exists */);
```

- the compiler looks for a constructor that matches the arguments provided to the coroutine (`int` in our example)
- else (typical) falls back to a default constructor

The promise allows all sorts of coroutine customization

---

# Re-write

---

# Body re-write

```cpp
promise_type promise(/* coroutine arguments, if a matching constructor exists */);
try {
  co_await promise.initial_suspend();
  // ... body here
} catch (...) {
  // pseudocode: not enough progress made through initial_suspend
  if (!initial_await_resume_called) {
    throw;
  }
  promise.unhandled_exception();
}
co_await promise.final_suspend();
```

- the pseudocode above is an over-simplification
- it allows coroutine behaviour customization:
  - space on the frame to handle the values from `co_return` and/or `co_yield`
  - should it suspend at the start?
  - should it suspend at the end?
  - what should it do if exceptions are thrown?

---

# co_await expression re-write

Of the three `co_...` keywords, `co_await` is the most special. For an expression in the coroutine body (e.g. not `initial_suspend` or `final_suspend`):

```cpp
auto x = co_await expression;
```

Start with the **expression**:
- if the promise has an `await_transform`: apply that to the expression to get an awaitable
- else: continue with the expression as the awaitable

Then with the **awaitable**:
- if the awaitable has a member `operator co_await`: call that to get an awaiter
- else if there is some non-member `operator co_await` taking the awaitable: call that to get an awaiter
- else: continue with the awaitable as the awaiter

Then expect that the **awaiter** has three specific `await_...` methods.

---

# co_await awaiter re-write

```cpp
auto x = co_await awaiter;
```

is roughly equivalent to

```cpp
if (!awaiter.await_ready()) {
  // ... suspend coroutine (save data) here
  awaiter.await_suspend(coroutine_handle); // 3 variations of this call
  // ... return to caller here
  // ...
  // ... when resumed continue from here
}
auto x = awaiter.await_resume();
```

- this is still pseudocode, as it involves jumps, e.g. to resume at a point where objects somehow continue as if they had stayed in scope

---

# Suspend point

- the call to `await_suspend` is after the coroutine data is saved (suspended)
  - e.g.
    it needs to save to the coroutine frame "next time continue from step N"
  - this helps with multithreading, as `await_suspend` might call some API whose completion resumes the coroutine concurrently with `await_suspend` completing
    - e.g. that other thread can resume from step N and does not need to add synchronization to wait for `await_suspend` to complete
    - but it is not sufficient: in multithreaded scenarios additional care and synchronization might be required
- the (potentially temporary) awaiter is stored on the coroutine frame while the coroutine is suspended
  - because all data that spans a suspension point needs to be stored in the coroutine frame (8) to be available the next time the coroutine is resumed
- `await_ready` allows optimisations to skip this "save" work

---

# std::suspend_always

```cpp
struct suspend_always {
  constexpr bool await_ready() const noexcept { return false; }
  constexpr void await_suspend(coroutine_handle<>) const noexcept {}
  constexpr void await_resume() const noexcept {}
};
```

- this is an example of a "nice to have", often returned from `initial_suspend` to have a lazy coroutine that suspends before the body is executed, e.g.

```cpp
struct promise_type {
  // ...
  std::suspend_always initial_suspend() noexcept { return {}; }
};
```

---

# std::suspend_never

```cpp
struct suspend_never {
  constexpr bool await_ready() const noexcept { return true; }
  constexpr void await_suspend(coroutine_handle<>) const noexcept {}
  constexpr void await_resume() const noexcept {}
};
```

---

# Promise-based customizations

---

# Ramp return object

```cpp
auto return_object = promise.get_return_object();
```

e.g. implemented as:

```cpp
struct promise_type {
  // ...
  task get_return_object() noexcept {
    return { std::coroutine_handle<promise_type>::from_promise(*this) };
  }
};
```

- this is the return value from the ramp function (space reserved on the stack at point 1, stored on the stack at point 10)
- basically returned the first time the coroutine suspends
  - it is quite common to suspend in `initial_suspend`
- conversion between `promise` and `coroutine_handle` implies the promise is stored at a constant offset from the coroutine frame address

---

# Additional suspend points

```cpp
co_await promise.initial_suspend();
// ...
co_await promise.final_suspend();
```

e.g.

```cpp
struct promise_type {
  // ...
  std::suspend_always initial_suspend() noexcept { return {}; }
};
```

---

# Exceptions

```cpp
catch (...) {
  promise.unhandled_exception();
}
// ...
co_await promise.final_suspend();
```

e.g.

```cpp
struct promise_type {
  // ...
  void unhandled_exception() noexcept {
    exception_ptr_ = std::current_exception();
  }
};
```

---

# co_return expression

```cpp
co_return expression;
```

is re-written as:

```cpp
promise.return_value(expression);
// ... then jump to final_suspend
```

- the type of the `co_return` expression is not the same as the function's declared return type
- the declared return type (e.g. `task<T>`) allows access to the values passed to `co_return`, via the promise, therefore it's commonly a class template that takes the type of the `co_return` expression as a template parameter
- can choose to not implement `promise.return_value` to limit how that kind of coroutine can be used (via compilation errors)

---

# co_return expression

```cpp
task<int> wozz() {
  co_return 42;
}
```

compiler error for `wozz` if `task<int>::promise_type::return_value(int)` does not compile/exist; to compile you need at least something like:

```cpp
struct promise_type {
  // ...
  void return_value(int x) noexcept { value_ = x; }
};
```

- options on how to capture the argument (by value, by const reference, by rvalue reference etc.)

---

# co_return void

```cpp
co_return;
```

is re-written as:

```cpp
promise.return_void();
```

- can choose to not implement `promise.return_void` to limit how that kind of coroutine can be used (via compilation errors), e.g. implement just `promise.return_value` to only allow `co_return expression;`

---

# co_return void

```cpp
task<void> foo() {
  co_return;
}

task<void> buzz() {
  co_await bar();
}
```

- `co_return;` can be implicit if the function reaches its end (as in `buzz`)

```cpp
struct promise_type {
  // ...
  void return_void() noexcept { }
};
```

you need to implement one of `return_value` or `return_void`, but not both: a promise type that declares both is ill-formed

---

# co_yield

```cpp
co_yield expression;
```

is re-written as:

```cpp
co_await promise.yield_value(expression);
```

- can choose to not implement `promise.yield_value` to limit how that kind of coroutine can be used (via compilation errors)

---

# co_yield

```cpp
generator<int> foo() {
  co_yield 42;
}
```

for that to compile you need something like

```cpp
struct promise_type {
  // ...
  std::suspend_always yield_value(int x) noexcept {
    value_ = x;
    return {};  // the returned awaiter is what gets co_await-ed
  }
};
```

- and then the `generator` has to get access to `value_` somehow

---

# Allocation control

- custom `operator new` and `operator delete`
- `get_return_object_on_allocation_failure()`

# await_transform

- e.g. can be used to disallow `co_await`

---

# The three versions of await_suspend

---

# void

if `await_suspend` returns `void`:

```cpp
if (!awaiter.await_ready()) {
  // ... suspend coroutine (save data) here
  awaiter.await_suspend(coroutine_handle);
  // ... return to caller here
  // ...
  // ... when resumed continue from here
}
auto x = awaiter.await_resume();
```

---

# bool

if `await_suspend` returns `bool`:

```cpp
if (!awaiter.await_ready()) {
  // ... suspend coroutine (save data) here
  if (awaiter.await_suspend(coroutine_handle)) {
    // ... return to caller here
    // ...
    // ... when resumed continue from here
  }
}
auto x = awaiter.await_resume();
```

Why?
- sometimes the result is already available and the coroutine can resume immediately without returning to the caller

---

# std::coroutine_handle

if `await_suspend` returns `coroutine_handle`:

```cpp
if (!awaiter.await_ready()) {
  // ... suspend coroutine (save data) here
  auto h = awaiter.await_suspend(coroutine_handle);
  h.resume();
  // ... return to caller here
  // ...
  // ... when resumed continue from here
}
auto x = awaiter.await_resume();
```

Why?
- allows a coroutine to resume another specified one before returning to the caller
- it's complicated and has to do with avoiding stack overflows

Also `std::noop_coroutine()` gives us a handle to return when we do not really want to resume another coroutine.

---

# The stack overflow scenario

```cpp
task completes_synchronously() {
  co_return;
}

task loop_synchronously(int count) {
  for (int i = 0; i < count; ++i) {
    co_await completes_synchronously();
  }
}
```

see [Lewis Baker: C++ Coroutines: Understanding Symmetric Transfer](https://lewissbaker.github.io/2020/05/11/understanding_symmetric_transfer)

---

# How coroutine state works

---

# How it could work

In many ways, but:
- it needs to handle two cases:
  - resume
  - destroy
- it needs to keep the promise at a constant offset

Concretely:
1. Could store two function pointers in the state part of the coroutine frame; when the state is saved, update those pointers to the next step's functions
2. Could store two fixed function pointers and an index ("we got to step N"); when the functions are called, switch/jump to the work for step N
3. Other creative ways

Practically it's a variation of option 2.

---

# Tail recursion

The pseudocode for symmetric transfer:

```cpp
if (!awaiter.await_ready()) {
  // ... suspend coroutine (save data) here
  auto h = awaiter.await_suspend(coroutine_handle);
  h.resume();
  // ... return to caller here
  // ...
  // ... when resumed continue from here
}
auto x = awaiter.await_resume();
```

The pseudocode above does not capture that the compiler really implements the state machine of the coroutine using glorious `goto`/jump statements, and that `h.resume()` is positioned at the end (just before returning to the caller), which allows a tail call (tail recursion) optimisation that avoids the stack overflow.

---

# Questions?