Mircea Baja - 15 May 2025 # Intrusive heap
min_node
size
parent
left
right
data
parent
left
right
data
parent
left
right
data
parent
left
right
data
--- # Intrusive queue
head
tail
next
data
next
data
next
data
next
data
- intrusive - non-ownning - single linked - linear: end pointer is `nullptr` - header points to head and tail - no dummy node - no cached size --- # Intrusive queue - why? - queue of work that is ready to be run: "ready queue" - data contains the function of work to be done - e.g. resume coroutine that was previously suspendended - nodes are stored on the coroutine frames - they are pushed to the end of queue when work is ready - thread loops pop-ing from the start of queue - then runs work --- # Implementation ```cpp template
class intrusive_queue { Node* head_{ nullptr }; Node* tail_{ nullptr }; public: intrusive_queue() noexcept = default; ``` --- # Implementation ```cpp intrusive_queue(const intrusive_queue&) = delete; intrusive_queue& operator=(const intrusive_queue&) = delete; intrusive_queue(intrusive_queue&& other) noexcept : head_{ other.head_ }, tail_{ other.tail_ } { other.head_ = nullptr; other.tail_ = nullptr; } intrusive_queue& operator=(intrusive_queue && other) noexcept { ``` --- # Implementation ```cpp bool empty() const noexcept; void push(Node* what) noexcept Node* pop() noexcept; ``` --- # Implementation ```cpp void push(Node* what) noexcept { what->*next = nullptr; // linear list, end is nullptr if (nullptr == tail_) { head_ = what; } else { tail_->*next = what; } tail_ = what; } ``` --- # Intrusive list
head
tail
next
prev
data
next
prev
data
next
prev
data
next
prev
data
- intrusive - non-ownning - double linked - linear - end iterator is nullptr - header points to head and tail - no dummy node - no cached size --- # Intrusive list - why? - queue of work waiting for an event: "wait list" - data contains the function of work to be done when event is set - e.g. resume coroutine that was previously suspended - nodes are stored on the coroutine frames - they are pushed to the end of wait list to wait for the event - when event is set process from the "wait list" - move work to "ready queue" - if coroutine is cancelled remove work from "wait list" - same idea used for `stop_callback`s (i.e. work to do to cancel operations if a stop is required) --- # Implementation ```cpp template
class intrusive_list { Node* head_{ nullptr }; Node* tail_{ nullptr }; public: intrusive_list() noexcept = default; ``` --- # Implementation ```cpp bool empty() const noexcept; void push_back(Node* what) noexcept; void remove(Node* what) noexcept; Node* front() noexcept; Node* back() noexcept; Node* pop_front() noexcept; ``` --- # What's a min heap? - a binary tree data structure - the min node is at the root - partially sorted - a node is smaller than it's left and right ("the heap property") - the tree is complete - all levels are full, except the last (but even that one is filled left to right) - often stored in an array where node pointers are not needed - for our intrusive heap will use pointers to parent, left and right --- # Intrusive heap
min_node
size
parent
left
right
data
parent
left
right
data
parent
left
right
data
parent
left
right
data
--- # Intrusive heap - insert
min_node
size
parent
left
right
data
0000 0001
parent
left
right
data
0000 0010
parent
left
right
data
0000 0011
parent
left
right
data
0000 0100
parent
left
right
data
0000 0101
--- # Intrusive heap - why? - to implement e.g. `async_sleep_for(10ms)` - with the correct time complexity - and the correct noexcept semantics - convert duration to absolute time using a stable monotonic clock - use a min heap (earliest clock is at the top of the heap tree) - create node on coroutine frame - node data is: - absolute time used for comparisons - and work to be done when the time is reached - insert a new node in O(lg(N)) (e.g. a new coroutine waiting for a timer) - get the smallest node in O(1) (e.g. the next duration we need to wait for) - we want O(1) here because waiting for a timer might be interrupted by "some other work is ready", so searching for the min node might be done multiple times - pop the smallest node in O(lg(N)) (e.g. after its duration has elapsed) - remove a node in O(lg(N)) (e.g. when the timer is cancelled) - all the operations are `noexcept` --- # Implementation ```cpp template
class intrusive_heap { Node* min_node_{ nullptr }; std::size_t size_{ 0 }; public: intrusive_heap() noexcept = default; ``` --- # Implementation ```cpp bool empty() const noexcept; std::size_t size() const noexcept; Node* min_node() noexcept; const Node* min_node() const noexcept; void insert(Node* new_node) noexcept; void remove(Node* node) noexcept; void pop_min() noexcept; ``` --- ```cpp ++size_; // Assume new node is added Node* new_node_parent = min_node_; // size_ now is at least 2 // e.g. given if the size_ is now 5 (0...0101 in binary) // - the position of most significant 1 bit is // returned by bit_width (3 in the example case) // and indicates the depth/level of the new node // - then the following bits indicate the route to the node // with a 0 indicating left edge and a 1 indicating right edge // - we use a 1 bit mask against the size to test those bits, // starting with a bit mask 0...0010 in binary for this example. // i.e. a 1 shifted 1 position (i.e. 3 - 2, which we can do as // bit_width is at least 2) // - we first stop short of the last bit (mask 0...0001) to reach // the parent (in this case we just do one left), then we use // this last bit to insert to left or right of parent (right in // this case) std::size_t mask = ((std::size_t)1 << (std::bit_width(size_) - 2)); while (mask != 1) { if (size_ & mask) { new_node_parent = new_node_parent->*right; } else { new_node_parent = new_node_parent->*left; } mask >>= 1; } ``` --- # Intrusive heap: node vs. array - normally heaps use arrays structures e.g. a vector - then the three pointers are not required - index arithmetic is used to calculated parent and child position - problem on removing entries when timers are cancelled - a node based heap has the right time complexity - but at the cost of large constants involved - node swaps touches lots of pointers --- # Intrusive heap vs. red-black tree - neither has currently ready implementations in the C++ standard library - the algorithm for the heap is simpler once the `bid_width` trick is understood - for a red-black tree then - implementation options for the red-black tree would e.g. be to follow Sedgewick's left leaning approach - should a parent pointer be used in the node? Sedgewick's algorithm does not: remove node requires search first - min-node pointer (to the left-most node) needs to be maintained to give O(1) access time to it - red-black tree is sorted, while the heap is only partially sorted - there's got to be an advantage for the heap from not having to maintain a more strict property - hard to make performance guesses short of profiling and deeper study --- # Thread safety - none of the above is thread safe - at least a `std::mutex` is required to wrap access - there is a zoo of options available wait-free/lock-free etc. - in particular for queues of "ready to do work" that are processed by multiple threads you might want to look at least at not using a single `std::mutex`, but break the queue into multiple smaller queues (one per thread), each with it's own lock and the ability to "steal" work when a thread completes it's own work faster than other processing threads - see [Better Code: Concurrency - Sean Parent](https://www.youtube.com/watch?v=zULU6Hhp42w) --- # Questions?