The many relations
Let’s get better at understanding relations (and I don’t mean the inter-personal ones) and they translate from mathematics to programming (C++ in particular).
This article is part of a series on the history of regular data type in C++.
Why?
An example of a predicate is the situation when we do a < b
. We give two
values and we get a boolean that tells us if a
is less than b
or not. We
say that we deal with the “less than” predicate. Predicates are useful as
customization points in generic algorithms, such as when we want to sort
descending instead of ascending we can customize the sort algorithm with the
“greater than” predicate instead of the default “less than” predicate.
Mathematics roots
This terminology is borrowed from mathematics. There we start with sets e.g.
A
, B
, C
etc. Then we have sets of tuples e.g. the set {(a, b, c) | a in
A, b in B and c in C}
written as A x B x C
(the Cartesian product of the
sets, in particular the case of pairs, e.g. A x B
.
There is a duality between a subset of a Cartesian product and a function that
maps a tuple to two values that can be used to establish if the tuple is a
member of the subset. E.g. true
means that the tuple is member if the subset,
false
otherwise, or sometimes 1
and 0
are used instead. Such a subset or
the associated function is called a relation, often the associated function
returning a boolean (or similar) is called a predicate. This approach to
relations is a generalized and somewhat non-intuitive approach compared with
how we started, where a < b
seemed something between a
and b
, rather than
a subset or a kind of function.
When the associated function is defined for the whole Cartesian product the
relation is total, otherwise it is partial. The meaning of the word
domain of the relation is overloaded: sometimes it’s the whole Cartesian
product, sometimes just the subset for which the relation function is defined
(the domain of definition). For a partial relation sometimes you have a total
relation that indicates if the partial one is defined. E.g. “is divisible by”
does not make sense if the divisor is 0
, but you can have a total relation
“is in domain of divisible by” which checks if the divisor is 0
and returns
false
in that case (note that this check requires additional computation).
When the sets in the Cartesian product are the same e.g. A x A x A
the
relation is homogeneous. In this case the domain word is further
overloaded: it sometimes refers just to the set A
rather than the Cartesian
product. A relation that is not homogeneous is called heterogeneous.
The zoo of relations
This general framework leads to a whole zoo of relations. But some are more special than others. Unary and binary predicates occur more often than other predicates. Homogeneous predicates are more common in usage than heterogeneous.
In particular equality is special. It is related to copy: when you take a copy you expect it to be equal to the original. It is a special case of an equivalence relation. Equivalence relations are reflexive, symmetric and transitive. Equality is the strictest equivalence relation short of identity. Identity means “is the same object”, usually checked by testing the addresses of the objects involved for equality.
“Less than” is also special. It is related to equality via the trichotomy rule: given two values, one is less than the other or the values are equal. “Less than” is a special case of order relations: it is strict.
Order usually refers to any of the “less than”, “greater than”, “less than or equal”, “greater than or equal”. Comparison refers to order or equality relations.
Sometimes terminology strong and weak is used e.g. strong equality is normal equality, while weak equality is equivalence, similar strong order is provided by the normal “less than” relation, while weak order has the trichotomy relation with an equivalence relation.
Language representation
In a language like C++ predicates are represented as functions taking two
parameters with the return value either bool
or at least something that can
be evaluated the same way as a bool
can.
The predicate function would be a regular function: called repeatedly with equal inputs it returns the same result.
This function can be either:
- a plain function
- a member function taking one explicit parameter (the other parameter comes
via the
this
pointer) - a special function like
operator<
(either as a member function or as a pain standalone function) - a function object (that has a
operator()
) - an
std::function
- a lambda
- etc.
For identity we usually check equality of addresses. E.g. in the move
assignment a check if (this != &other)
handles the case where an attempt is
made to move an object into itself: is it the same object? (Note that it’s
debatable if/why such a check is/should be required).
For equality the sane approach is to use ==
and !=
and make sure they work
properly, including one being the opposite of the other, and together with
copy. For “less than” the sane approach is to use <
and make sure it works
properly with the other operators like <=
, >
, >=
and with equality.
Examples of heterogeneous binary predicates are those comparing std::string
with literal strings or std::string_view
, or testing equality of an iterator
with the associated sentinel type in the std::ranges
library.
Even if heterogeneous, binary predicates usually are happy to take the two
different types in any order: we’re as likely to compare a std::string
with a
literal string as to compare a literal string with a std::string
. In some
cases that’s not the case, such as a predicate that indicates if path is a
child of another (e.g. to implement some scanning exclusion).
The weird case of float
Why would you not make the sane choices? Here the example of float
. It is
largely governed by an external spec IEEE-754, where sane choices were not
made (though with good intentions probably). Here are two examples.
The case of NaN
There are multiple values that usually result from computation errors, such as
divide by 0
, they are generally referred as NaN
(i.e. Not A Number). The
rules of comparing against a NaN
are not sane mathematically: e.g. they don’t
respect properties such as trichotomy. One approach is to say that float
comparisons are partial, not total, that NaN
values are not in the domain of
comparisons, though realistically the chance of enforcing that in a large
program are slim.
The case of -0.0 and +0.0
These two values compare equal using the ==
operator, but they are different,
so in fact operator ==
captures a equivalence relation, not equality.