Min and max basics
Finding the smaller and the larger of two values, how hard can it be, right?
Introduction
To evaluate the capabilities of a programming language it’s practical to start
by examining in detail just three functions: min
, swap
and linear search
(find
). Despite their presumed simplicity, it’s difficult, but important
that they are done right. For example, the three functions contain the
fundamentals for constructing sort
, itself an important building block for
many programs.
Here I’m going to focus mainly on min
, a function that takes two arguments
and returns the smallest of the two, and look at a C++ implementation.
Essentially we’re going to implement std::min
from the standard C++ library,
Here is a sample C++ implementation, with subtly different choices, mainly because the focus here is on what’s ideal, without any pressure to be backwards compatible.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
namespace algs {
struct less
{
template<typename T, typename U>
bool operator()(T && a, U && b) const {
return std::forward<T>(a) < std::forward<U>(b);
}
};
struct identity
{
template<typename T>
T && operator()(T && x) const {
return std::forward<T>(x);
}
};
// generalized min taking comparison and projection
template<typename T, typename Cmp, typename Proj>
// requires Cmp is a strict weak ordering for projection Proj of T
const T & min(const T & a, const T & b, Cmp cmp, Proj proj) {
if (cmp(std::invoke(proj, b), std::invoke(proj, a))) return a;
return b;
}
// generalized min taking comparison
template<typename T, typename Cmp>
// requires Cmp is a strict weak ordering for T
const T & min(const T & a, const T & b, Cmp cmp) {
return min(a, b, cmp, algs::identity());
}
// straight forward min
template<typename T>
// requires T is StrictTotallyOrdered
const T & min(const T & a, const T & b) {
return min(a, b, algs::less());
}
}
The rest of the article is about how did we get there.
Generic
The implementation of algorithms like min
has to be generic.
Using a non-generic approach is repetitive as demonstrated below for int
and
double
:
1
2
3
4
5
6
7
8
9
int min_int(int a, int b) {
if (a < b) return a;
return b;
}
double min_double(double a, double b) {
if (a < b) return a;
return b;
}
This approach might just about do for min
, but it does not work for more
complex algorithms. Formally it’s the N x M
problem: given N
data types and
M
algorithms we want to avoid having to write N x M
implementations.
Using C++ templates, we defined min
as a function template with a type T
as
a parameter:
1
2
3
4
5
6
namespace algs {
template<typename T>
const T & min(const T & a, const T & b) {
// implementation here
}
}
This allows a generic solution that can be implemented once and then work with
a variety of types like: int
, double
, other built-in types, std::string
,
std::vector
, other containers, user defined types.
Namespace
We want to be able to use min
like this:
1
2
3
4
std::string x{ "Alice" };
std::string y{ "Charlie" };
std::cout << min(x, y) << '\n';
// Prints: Alice
Now, if this example compiles, it’s likely that it uses std::min
because of
rules that will look for functions in the same namespace of the arguments, aka.
argument dependent lookup (ADL).
To avoid ambiguity, we defined our min
implementation in a namespace
algs
, and we’ll qualify the call explicitly:
1
std::cout << algs::min(x, y) << '\n';
Arguments and return value
Arguments are passed by reference which has the advantage that larger types do not get copied unnecessarily.
Arguments are passed by const
reference, which has the advantage that
literals can also be used as arguments:
1
2
int x = 7;
std::cout << algs::min(x, 3) << '\n';
Another advantage of this declaration is that the type T
is automatically
deduced: most of the time does not need to be specified by the user.
However one disadvantage is that we can’t mutate the returned value, because
there is no overload for non-const
reference.
1
2
3
4
void increment_smallest(int & x, int & y) {
int & z = algs::min(x, 3); // Fails to compile
++z;
}
Implementation - options
As for the implementation, we’ve got several options:
1
2
3
4
if (a < b) return a; else return b; // Option 1
if (a <= b) return a; else return b; // Option 2
if (a > b) return b; else return a; // Option 3
if (b < a) return b; else return a; // Option 4
When the arguments are equal, option 1 returns b
, the other options return
a
. Options 2, 3 and 4 differ in the exact operator used.
The canonical implementation is the equivalent of option 4.
If the arguments are equal, it should not matter which one is returned. However equal does not mean identical, and one can probe the address of the returned value to see which one is returned. We’ll come back to this later.
Also min
does not exist in isolation, but in relationship with other
algorithms, and that’s what clarifies what’s the most coherent implementation
option.
Relationship with other algorithms
First of all we have similar algorithms that differ by the value returned:
max
which returns the larger of twominmax
which returns a pair (smallest first, larger second) out of two
When values are equal, it makes sense for min
to return one, and for max
to
return the other, this leads to requiring that the canonical implementation for
max
does the equivalent of:
1
if (b < a) return a; else return b;
Note that the above is not what std::max
does, because it’s standardised to
do the wrong thing.
Then we have algorithms that operate on a sequence:
min_element
which returns an iterator to the smallest out of a sequencemax_element
which returns an iterator to the larger out of a sequenceminmax_element
returns a pair of the two out of a sequence
Then we algorithms that use the order to do sequence permutations:
nth_element
which sorts up to the n-th position (min
is a very basic version of this, getting just the smallest value)sort
which sorts an entire sequencestable_sort
which also sorts, but preserves the order of equal elements
It is the relationship with sort
that makes <
the implicit comparison
choice. E.g. it has the advantage that the sequence will be sorted ascending,
which is the natural order.
The stability property refers to preserving the order of equal elements. For
sorting there are two different function because achieving stability comes
there at an additional computational effort. If min
(and max
) can have the
stability property at no additional cost, we should not give that up lightly.
And finally we have versions of min
:
- with two arguments: implicitly using
<
for comparisons - with three arguments: adds an explicit comparison function
- with four arguments: adds a projection function that specifies what to compare
If you try to invoke the straight forward min
with an unsuitable type you’ll
get an error that basically follows the nesting of whatever the implementation
happens to be:
StrictTotallyOrdered
The version of min
with two arguments uses <
for comparison and has the
requirement that the type T
is StrictTotallyOrdered
. That
means that not only <
is implemented, but also all the other comparisons,
including equality are defined, and they work sanely e.g.: given arbitrary a
and b
, one an only one is true: a < b
or a == b
or a > b
.
Comparison
The version with three arguments adds an explicit comparison function. This
version has relaxed requirements. The comparison can be weak. Unlike <
, the
counterpart is a equivalence relation, not equality. E.g. we can compare
instances of person
by age
(instances that have different names, but same
age are equivalent). It’s here that it does matter which one is returned out of
two equivalent values.
1
2
3
4
5
6
person x{ "Alice", "Wonderland", 7 };
person y{ "Bob", "Builder", 5 };
std::string result = algs::min(x, y, [](const person & a, const person & b) {
return a.age < b.age;
}).first_name;
// result is "Bob"
Projection
Projection is applied before using the values (passing them to the comparison function.
1
if (cmp(proj(b), proj(a))) return a;
This allows a different way to compare by age
:
1
2
3
4
5
6
person x{ "Alice", "Wonderland", 7 };
person y{ "Bob", "Builder", 5 };
std::string result = algs::min(x, y, algs::less(), [](const person & a) {
return a.age;
}).first_name;
// result is "Bob"
In the implementation above std::invoke
was used instead.
1
if (cmp(std::invoke(proj, b), std::invoke(proj, a))) return a;
This allows a simpler call then the projection only takes a member variable:
1
2
3
4
person x{ "Alice", "Wonderland", 7 };
person y{ "Bob", "Builder", 5 };
std::string result = algs::min(x, y, algs::less(), &person::age).first_name;
// result is "Bob"
Less - function object
In the code above less
is a struct and only it’s operator()
is a function
template. This choice is more similar to std::less<>
which for backward
compatibility reasons is a specialization for std::less
for void
.
Notice that the arguments T
and U
can be different types. min
does not
use this, but it’s useful for things like std::set<std::string>
when we want
to find using a literal string, so comparison is between a const std::string
&
and a const char *
.
References
- Elements of Programming (book by Alexander A. Stepanov and Paul McJones)
- Some ADL comments
- Concepts basics for more details on enforcing requirements