+ - 0:00:00
Notes for current slide

The original idea of this talk is that diversity in data structures and algorithms results in generic solutions being quirky and not universal (the irregularity conjecture).

To get there, we're going to have a bit of fun with simple data structures (linked lists) and simple algorithms (linear find and partition), see what (C++20) concepts are, to build towards the more philosophical part on the design and usage of generic code.

Notes for next slide

Mircea Baja @ ACCU Oxford - 26 Feb 2020

Irregularity in Generic Programming

  • What is irregularity?
  • The irregularity continuum
  • The irregularity conjecture
  • Hyperbolic usage conjecture
  • The naming quirks: why naming is hard
  • What is regularity?
  • Elements of Programming
  • Concepts as thinking tool
  • Concepts in C++ 20
  • A list/vector is just a kind of list/vector
1

The original idea of this talk is that diversity in data structures and algorithms results in generic solutions being quirky and not universal (the irregularity conjecture).

To get there, we're going to have a bit of fun with simple data structures (linked lists) and simple algorithms (linear find and partition), see what (C++20) concepts are, to build towards the more philosophical part on the design and usage of generic code.

The vector kind theorem

  • Theorem: the vector is just a kind of vector (paraphrasing Alex Stepanov)

  • Lemma: a linked list is just a kind of linked list

2

Single linked

Image

Double linked

Image

3

Use vectors, i.e. contiguous, extent based data structures

Linear

Image

Circular

Image

4

Header - minimalistic

  • Pointer to head

Image

  • Pointer to tail

Image

  • or Not minimalistic
5

Links to local parts

  • No

Image

  • Yes

Image

6

Dummy node

Image

  • Meaning (end iterator, none/simplify empty list case)
  • Location (heap, header)
  • If on the heap: always present?
  • Can dereference/does it have a value?
7

More

  • Iterators - minimalistic
  • List size
  • Operations available - e.g. constant time push_back()
  • Splicing - partial/total
  • Iterator from reference to value
  • Permanent end iterator
  • Forward/Bidirectional iterator
  • Intrusive/non-intrusive
  • Node ownership
  • Allocators
  • Meaning of node pointers/fast reverse
  • Thread safety
8

Single linked basic

Image

  • std::forward_list
  • No constant time push_back()
  • Can be used to implement a typical stack/FILO, but not a typical queue/FIFO
  • Note: comparing for end slightly different from comparing two iterators
9

Single linked circular

Image

  • Two choices on accessing tail from the iterator:
    • Directly: as above
    • Indirectly: through the header
10

Single linked first-last

Image

11

Double linked linear

Image

12

Double linked circular

Image

13

Double linked with allocated dummy node

Image

  • std::list in Visual C++ 2017/2019
14

Allocated dummy node

Image

15

Allocated dummy node

Image

static_assert(std::is_nothrow_default_constructible_v<std::list<int>>,
"Default constructor may throw");
static_assert(std::is_nothrow_move_constructible_v<std::list<int>>,
"Move constructor may throw");
// both assert in Visual C++ 2017/2019
16

Double linked with dummy node in header

Image

  • move invalidates end
  • another std::list implementation option gcc/clang
17

Linked list summary

  • Despite ignoring many additional choices: there are many kinds of linked list
  • There are more differences than things they all have together
  • They all have a use case
  • E.g. single linked lists: the price to pay for constant time push_back() either: iterator not minimalistic OR header not minimalistic
  • It is not a closed system: I doubt we can say confidently that there is no other linked list variation
18

Implementing one

template<typename T>
class dl_list {
struct node;
struct links {
node * next_;
node * prev_;
};
struct node : public links {
T value_;
//...
  • A double linked list
  • Circular
  • With dummy node in the header
  • Non-intrusive (nodes are provided by the list class)
  • Nodes are allocated on the heap (no custom allocator)
  • List owns the nodes
  • No cached size
19

Type naming quirks

  • A lot of info on implementation details is required to provide a complete type description
  • That information has to be encoded somewhere
  • Encoding in the name is not scalable
  • Encoding as type parameters is not scalable
  • Dealing with all the details leads to cognitive load
  • We're equipped with a large common vocabulary for the physical world (only)
  • A pragmatic solution: reduced vocabulary
  • std::list and std::forward_list
  • std::shared_ptr and std::unique_ptr
  • Open vs. closed type systems
20

Hyperbolic usage conjecture

  • A small number of built-in types used a lot
  • A large number of user defined types used once or twice

Image

21

Iterators: generalized pointers

  • "points to"/hardware view

Image

  • "points between"/Sean Parent view

Image

22

Linear find with pointers - toy

int data[] = {2, 5, 7, 1, 22};
int * it = find(std::begin(data), std::end(data), 5);
if (it == std::end(data)) {
// not found ...
}
else {
// found ...
*it = 42;
}
int * find(int * f, int * l, int x) {
while ((f != l) && (*f != x)) ++f;
return f;
}
  • Note we compare f with l twice: inside the find and outside
  • In this case once with !=, the other time with ==
  • In well written code the safety issues do not propagate forever
23

Linear find canonical - toy

template<typename It, typename T>
It find(It f, It l, const T & x) {
while ((f != l) && (*f != x)) ++f;
return f;
}
  • When does it work?
  • It will work for T being int, short, long, unsigned etc.
  • It is pointer-like first and last to sequences of integer-like values (e.g. T *)
  • It can also be an iterator in a linked list
  • Integer-like not in the addition/multiplication sense, but in the it holds a value that can be copied, compared for equality etc.
24

Requirements syntactic

  • We can compare for equality f and l
  • We can dereference f
  • The iterator type has an associated value type
  • Dereferencing f is reference to the value type
  • We can advance f (with ++)
  • These are easy to express
25

Requirements semantic

  • Once f equals l, it stays that way, we can compare again and get the same result
  • And that's true even for a copy of f
  • Equality: reflexive, symmetric and transitive
  • We can reach l from f, l should not be dereferenced
  • These are harder to express
26

Linear find using concepts - toy

template<typename It>
using ValueType = typename It::value_type;
template<typename It>
concept InputIterator = requires(It a, It b)
{
{a == b} -> std::boolean;
{a != b} -> std::boolean;
typename ValueType<It>;
{*a} -> std::common_reference_with<ValueType<It>>;
++a;
};
template<typename It>
requires InputIterator<It>
It find(It f, It l, const ValueType<It> & x)
{
while ((f != l) && (*f != x)) ++f;
return f;
}
  • Still toy example
27

Linear find - industrial version

template<InputIterator I, Sentinel<I> S, class T, class Proj = ranges::identity >
requires IndirectRelation<ranges::equal_to<>, projected<I, Proj>, const T*>
I find(I f, S l, const T& x, Proj proj = Proj{})
{
while ((f != l) && (ranges::invoke(proj, *f) != x)) ++f;
return f;
}
  • Sentinel supports iterators stopping on file end, zero terminated strings. In general the case where checking for the end is different from equality of two iterators.
  • Projections support the case where only part of the value type is used
  • Type T is not necessarily the value type of the iterator
  • Overloaded to get a range e.g. the whole container
std::cout << std::find(people, "Alice", &person::first_name)->last_name;
28

Faster linear find

template<typename I, typename T>
// requires I is an ForwardIterator
// T is equality comparable,
// T is the same as ValueType(I)
I find_by_setting_sentinel(I f, I l, const T & v) {
// precondition: l can be dereferenced to store value as sentinel
*l = v;
while (*f != v) {
++f;
}
return f;
}
  • Trick known in 1974/Knuth
29

Partition - example

Image

30

Partition - options

  • Forward iterators - semistable
  • Bidirectional iterators - faster
  • Stable - more resources
  • But can be implemented stable for list - the list iterators have something in common after all
  • The predicate is only applied once
31

Concepts

  • Specific: a particular way of implementing concepts in C++20 describing syntactic requirements (constraints)
  • General: tool of thinking about requirements on types (not on values), especially relations between types.
  • They have a open/extensible character: not all concepts are orthogonal or hierarchical: the case of overlapping requirements identified after the type definition
32

Concept naming quirks

  • Non-intuitive synthetic names e.g. iterator, value type, sentinel, projection etc.
  • Not derived mechanically: no mechanic nested concept propagation
  • Based on observations on common behaviour
  • They are invented, taste and choice matters
  • Reduced vocabulary is a practical choice sometimes
  • Another choice is customisation points e.g. std::begin
  • Another practical choice is over-constraining to start with
  • Hyperbolic usage conjecture applies to concepts
  • Don't try to be too generic on custom/rarely used concepts
33

Regular concept - for data

  • Default constructible
  • Copyable (and movable)
  • Equality
  • Destructible
Some a;
a = b;
assert(a == b);
c = b;
assert(a == c);
  • Fundamental behaviour, relied on for correctness reasoning
  • Integer-like not in the addition/multiplication sense, but in the it holds a value that can be copied, compared for equality etc.
  • e.g. standard containers
34

Regular concept - for functions

  • Arguments and return types are regular data types
  • Replacing arguments with equal values results into equal return values
  • Pure functions with no side effects, and no internal state like random number generators
  • Can have side effects, but consistent results
  • e.g. predicates for standard algorithms
  • Different sounding definition compared with data regularity
  • Same outcome: fundamental behaviour, relied on for correctness reasoning
35

Irregularity continuum

  • For almost every characteristic of a regular data concept, there are useful types that violate it
36

Data irregularity examples

  • Move: a bit weird - move an integer?
  • Copy constructor throws - standard containers
  • Default constructor throws - std::list
  • Move constructor throws - std::list
  • Destructor throws - scope exit idiom
  • No default constructor - scope object type
  • No copy - move only resource types
  • Equality - of what? e.g. equality of remote content
  • Equality complexity - worst case quadratic for unordered containers
  • Equality for input iterator - does not make sense
  • Float Nan equality == not !=
  • But have not seen a sane case where == is different from not !=
37

Function irregularity examples

  • A lot of code is about side effects, order of operations matter - fully irregular
  • Pseudo-regular - algorithms where the function only gets applied once per value - e.g. partition example
38

Irregularity conjecture

  • Generic data structures and algorithms are diverse, quirky and are not universal
39

Irregularity conjecture

  • Generic data structures and algorithms are diverse, quirky and are not universal
  • Reduced vocabulary is a pragmatic option of handling complexity
  • Gains (efficiency/different behaviour) are available in special cases
  • Either by using a more refined vocabulary
  • Or by using human knowledge not available to the machine
40

Things we know - examples

  • Side effects are fine: pseudopredicate
  • Can open file: running as root/administrator
  • Sequence is sorted/ordered/partitioned for relation used: binary search can be used
  • Relation is transitive for values provided
  • push_back() won't throw: vector was previously resized
  • Permutations can avoid invalidating the moved from objects
  • Machines cover more ground, but there is always a gap
41

Books

42

Thanks

Thanks go to Nigel Lester for organizing the ACCU meeting

44

Questions?

45

The vector kind theorem

  • Theorem: the vector is just a kind of vector (paraphrasing Alex Stepanov)

  • Lemma: a linked list is just a kind of linked list

2
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow