Mircea Baja @ ACCU Oxford - 26 Feb 2020 # Irregularity in Generic Programming - What is irregularity? - The irregularity continuum - The irregularity conjecture - Hyperbolic usage conjecture - The naming quirks: why naming is hard - What is regularity? - Elements of Programming - Concepts as thinking tool - Concepts in C++ 20 - A list/vector is just a kind of list/vector ??? The original idea of this talk is that diversity in data structures and algorithms results in generic solutions being quirky and not universal (the irregularity conjecture). To get there, we're going to have a bit of fun with simple data structures (linked lists) and simple algorithms (linear find and partition), see what (C++20) concepts are, to build towards the more philosophical part on the design and usage of generic code. --- class: large-points # The vector kind theorem - Theorem: the vector is just a kind of vector (paraphrasing Alex Stepanov) - Lemma: a linked list is just a kind of linked list --- # Single linked ![Image](../assets/2018-06-23-linked-lists-options/01-single-linked.png) # Double linked ![Image](../assets/2018-06-23-linked-lists-options/02-double-linked.png) ??? Use vectors, i.e. contiguous, extent based data structures --- # Linear ![Image](../assets/2018-06-23-linked-lists-options/03-linear.png) # Circular ![Image](../assets/2018-06-23-linked-lists-options/04-circular.png) --- # Header - minimalistic - Pointer to head ![Image](../assets/2018-06-23-linked-lists-options/05-header-head.png) - Pointer to tail ![Image](../assets/2018-06-23-linked-lists-options/06-header-tail.png) - or Not minimalistic --- # Links to local parts - No ![Image](../assets/2018-06-23-linked-lists-options/07-no-links-to-local.png) - Yes ![Image](../assets/2018-06-23-linked-lists-options/08-links-to-local.png) --- # Dummy node ![Image](../assets/2018-06-23-linked-lists-options/09-dummy-node.png) - Meaning (end iterator, none/simplify empty list case) - Location (heap, header) - If on the heap: always present? - Can dereference/does it have a value? --- class: large-points # More - Iterators - minimalistic - List size - Operations available - e.g. constant time `push_back()` - Splicing - partial/total - Iterator from reference to value - Permanent end iterator - Forward/Bidirectional iterator - Intrusive/non-intrusive - Node ownership - Allocators - Meaning of node pointers/fast reverse - Thread safety --- # Single linked basic ![Image](../assets/2018-06-28-linked-lists-examples/01-single-basic.png) - `std::forward_list` - No constant time `push_back()` - Can be used to implement a typical stack/FILO, but not a typical queue/FIFO - Note: comparing for end slightly different from comparing two iterators --- # Single linked circular ![Image](../assets/2018-06-28-linked-lists-examples/02-single-circular.png) - Two choices on accessing tail from the iterator: - Directly: as above - Indirectly: through the header --- # Single linked first-last ![Image](../assets/2018-06-28-linked-lists-examples/03-single-first-last.png) --- # Double linked linear ![Image](../assets/2018-06-28-linked-lists-examples/04-double-linear.png) --- # Double linked circular ![Image](../assets/2018-06-28-linked-lists-examples/05-double-circular.png) --- # Double linked with allocated dummy node ![Image](../assets/2018-06-28-linked-lists-examples/06-double-dummy-node.png) - `std::list` in Visual C++ 2017/2019 --- # Allocated dummy node ![Image](../assets/2018-06-28-linked-lists-examples/07-double-dummy-node.png) -- ```cpp static_assert(std::is_nothrow_default_constructible_v
>, "Default constructor may throw"); static_assert(std::is_nothrow_move_constructible_v
>, "Move constructor may throw"); // both assert in Visual C++ 2017/2019 ``` --- # Double linked with dummy node in header ![Image](../assets/2018-06-28-linked-lists-examples/08-double-dummy-in-header.png) - `move` invalidates end - another `std::list` implementation option gcc/clang --- class: large-points # Linked list summary - Despite ignoring many additional choices: there are many kinds of linked list - There are more differences than things they all have together - They all have a use case - E.g. single linked lists: the price to pay for constant time `push_back()` either: iterator not minimalistic OR header not minimalistic - It is not a closed system: I doubt we can say confidently that there is no other linked list variation --- # Implementing one ```cpp template
class dl_list { struct node; struct links { node * next_; node * prev_; }; struct node : public links { T value_; //... ``` - A double linked list - Circular - With dummy node in the header - Non-intrusive (nodes are provided by the list class) - Nodes are allocated on the heap (no custom allocator) - List owns the nodes - No cached size --- class: large-points # Type naming quirks - A lot of info on implementation details is required to provide a complete type description - That information has to be encoded somewhere - Encoding in the name is not scalable - Encoding as type parameters is not scalable - Dealing with all the details leads to cognitive load - We're equipped with a large common vocabulary for the physical world (only) - A pragmatic solution: reduced vocabulary - `std::list` and `std::forward_list` - `std::shared_ptr` and `std::unique_ptr` - Open vs. closed type systems --- class: large-points # Hyperbolic usage conjecture - A small number of built-in types used a lot - A large number of user defined types used once or twice ![Image](../assets/2020-02-26-irregularity/08-type-usage.png) --- # Iterators: generalized pointers - "points to"/hardware view ![Image](../assets/2020-02-26-irregularity/05-iterator-hardware.png) - "points between"/Sean Parent view ![Image](../assets/2020-02-26-irregularity/06-iterator-sp.png) --- # Linear find with pointers - toy ```cpp int data[] = {2, 5, 7, 1, 22}; int * it = find(std::begin(data), std::end(data), 5); if (it == std::end(data)) { // not found ... } else { // found ... *it = 42; } ``` ```cpp int * find(int * f, int * l, int x) { while ((f != l) && (*f != x)) ++f; return f; } ``` - Note we compare f with l twice: inside the `find` and outside - In this case once with `!=`, the other time with `==` - In well written code the safety issues do not propagate forever --- # Linear find canonical - toy ```cpp template
It find(It f, It l, const T & x) { while ((f != l) && (*f != x)) ++f; return f; } ``` - When does it work? - It will work for `T` being `int`, `short`, `long`, `unsigned` etc. - `It` is pointer-like `first` and `last` to sequences of integer-like values (e.g. `T *`) - `It` can also be an iterator in a linked list - Integer-like not in the addition/multiplication sense, but in the it holds a value that can be copied, compared for equality etc. --- class: large-points # Requirements syntactic - We can compare for equality `f` and `l` - We can dereference `f` - The iterator type has an associated value type - Dereferencing `f` is reference to the value type - We can advance `f` (with `++`) - These are easy to express --- class: large-points # Requirements semantic - Once `f` equals `l`, it stays that way, we can compare again and get the same result - And that's true even for a copy of `f` - Equality: reflexive, symmetric and transitive - We can reach `l` from `f`, `l` should not be dereferenced - These are harder to express --- # Linear find using concepts - toy ```cpp template
using ValueType = typename It::value_type; template
concept InputIterator = requires(It a, It b) { {a == b} -> std::boolean; {a != b} -> std::boolean; typename ValueType
; {*a} -> std::common_reference_with
>; ++a; }; template
requires InputIterator
It find(It f, It l, const ValueType
& x) { while ((f != l) && (*f != x)) ++f; return f; } ``` - Still toy example --- # Linear find - industrial version ```cpp template
S, class T, class Proj = ranges::identity > requires IndirectRelation
, projected
, const T*> I find(I f, S l, const T& x, Proj proj = Proj{}) { while ((f != l) && (ranges::invoke(proj, *f) != x)) ++f; return f; } ``` - Sentinel supports iterators stopping on file end, zero terminated strings. In general the case where checking for the end is different from equality of two iterators. - Projections support the case where only part of the value type is used - Type `T` is not necessarily the value type of the iterator - Overloaded to get a `range` e.g. the whole container ```cpp std::cout << std::find(people, "Alice", &person::first_name)->last_name; ``` --- # Faster linear find ```cpp template
// requires I is an ForwardIterator // T is equality comparable, // T is the same as ValueType(I) I find_by_setting_sentinel(I f, I l, const T & v) { // precondition: l can be dereferenced to store value as sentinel *l = v; while (*f != v) { ++f; } return f; } ``` - Trick known in 1974/Knuth --- # Partition - example ![Image](../assets/2020-02-26-irregularity/07-partition.png) --- class: large-points # Partition - options - Forward iterators - semistable - Bidirectional iterators - faster - Stable - more resources - But can be implemented stable for list - the list iterators have something in common after all - The predicate is only applied once --- class: large-points # Concepts - Specific: a particular way of implementing concepts in C++20 describing syntactic requirements (constraints) - General: tool of thinking about requirements on types (not on values), especially relations between types. - They have a open/extensible character: not all concepts are orthogonal or hierarchical: the case of overlapping requirements identified after the type definition --- class: large-points # Concept naming quirks - Non-intuitive synthetic names e.g. iterator, value type, sentinel, projection etc. - Not derived mechanically: no mechanic nested concept propagation - Based on observations on common behaviour - They are invented, taste and choice matters - Reduced vocabulary is a practical choice sometimes - Another choice is customisation points e.g. `std::begin` - Another practical choice is over-constraining to start with - Hyperbolic usage conjecture applies to concepts - Don't try to be too generic on custom/rarely used concepts --- class: large-points # Regular concept - for data - Default constructible - Copyable (and movable) - Equality - Destructible ```cpp Some a; a = b; assert(a == b); c = b; assert(a == c); ``` - Fundamental behaviour, relied on for correctness reasoning - Integer-like not in the addition/multiplication sense, but in the it holds a value that can be copied, compared for equality etc. - e.g. standard containers --- class: large-points # Regular concept - for functions - Arguments and return types are regular data types - Replacing arguments with equal values results into equal return values - Pure functions with no side effects, and no internal state like random number generators - Can have side effects, but consistent results - e.g. predicates for standard algorithms - Different sounding definition compared with data regularity - Same outcome: fundamental behaviour, relied on for correctness reasoning --- class: large-points # Irregularity continuum - For almost every characteristic of a regular data concept, there are useful types that violate it --- class: large-points # Data irregularity examples - Move: a bit weird - move an integer? - Copy constructor throws - standard containers - Default constructor throws - `std::list` - Move constructor throws - `std::list` - Destructor throws - scope exit idiom - No default constructor - scope object type - No copy - move only resource types - Equality - of what? e.g. equality of remote content - Equality complexity - worst case quadratic for unordered containers - Equality for input iterator - does not make sense - Float Nan equality `==` not `!=` - **But** have not seen a sane case where `==` is different from not `!=` --- class: large-points # Function irregularity examples - A lot of code is about side effects, order of operations matter - fully irregular - Pseudo-regular - algorithms where the function only gets applied once per value - e.g. partition example --- class: large-points # Irregularity conjecture - Generic data structures and algorithms are diverse, quirky and are not universal -- - Reduced vocabulary is a pragmatic option of handling complexity - Gains (efficiency/different behaviour) are available in special cases - Either by using a more refined vocabulary - Or by using human knowledge not available to the machine --- class: large-points # Things we know - examples - Side effects are fine: pseudopredicate - Can open file: running as root/administrator - Sequence is sorted/ordered/partitioned for relation used: binary search can be used - Relation is transitive for values provided - `push_back()` won't throw: `vector` was previously resized - Permutations can avoid invalidating the moved from objects - Machines cover more ground, but there is always a gap --- # Books
--- # References Alexander A. Stepanov and Paul McJones: Elements of Programming Robert Endre Tarjan: Data Structures and Network Algorithms Andrew Sutton and Bjarne Stroustrup: Design of Concept Libraries for C++ [SLE2011](http://www.stroustrup.com/sle2011-concepts.pdf) # Links https://bajamircea.github.io/coding/cpp/2018/08/01/linear-find.html https://bajamircea.github.io/coding/cpp/2018/08/05/partition.html https://bajamircea.github.io/coding/cpp/2018/06/28/linked-lists-examples.html https://bajamircea.github.io/coding/cpp/2018/06/30/lists-implementing.html https://bajamircea.github.io/presentations/2020-02-26-irregularity.html --- # Thanks Thanks go to Nigel Lester for organizing the ACCU meeting --- # Questions?