The things we know
Algorithmic efficiency often has at its roots things we know, and therefore skip unnecessary calculations. Not all the things we know can be checked by the computer.
This article is part of a series on the history of regular data type in C++.
For the end of this series I’d like to make the case that the regular concept is part of a larger idea that traverses the “Elements of Programming”. And that is that it’s things that we know that we can use for algorithmic efficiency to reorganize calculations and eventually skip unnecessary calculations.
I’m going to try to illustrate this using some memorable canonical examples.
Add
1
2
3
4
// domain of the function are x and y such that `x + y` does not overflow
unsigned int add(unsigned int x, unsigned int y) {
return x + y;
}
In this case the precondition could be checked using this predicated, but the
calculation is about as costly as the work inside add
itself
1
2
3
bool in_domain_of_add(unsigned int x, unsigned int y) {
return ((UINT_MAX - x) >= y);
}
This is an example where a check could be done, with some loss of efficiency, but also some efficiency gained if skipped when it is known that overflow will not happen for those specific input values.
Binary search, lower bound, partition point
There is a family of algorithms that use divide and conquer on a already
ordered range: partition_point
(illustrated above), lower_bound
(partition_point
for less than a given value), binary_search
(lower_bound
plus equality check).
Below is for example binary_search
that can efficiently find in the value
is in the range by halving the search range at each step leading to O(lg(N))
time complexity.
1
2
3
template<class It, class T>
// It is random access iterator
bool binary_search(It first, It last, const T & value);
But binary_search
itself it does not check if the sequence is sorted. Similar
issues apply to lower bound and partition point for the requirement that the
sequence is partitioned for the comparison used (defaults to ‘<’).
A runtime check that the input range is sorted would degrade the time
complexity to O(N)
.
This is an example where a check could be done, but at a meaningful performance cost.
first and last
A lot of the algorithms operate on ranges of data that are often expressed as
two iterators as generalized pointers, as in the case of binary_search
. They
take first
and last
then for linear traversal there is a loop like:
1
2
3
while(; first != last ; ++first) {
// use value pointed by first
}
But it does not check that last
is reachable from first
. The caller needs
to know that. Such a check is often not even possible for certain data
structures.
Rotate
The algorithm rotate takes three iterators:
1
2
3
template<class It, class T>
// It is random access iterator
It rotate(It first, It middle, It last);
The result is that between first
and last
we end up with the values that
were between middle
and last
, followed by the values that were between
first
and middle
. It does that in a surprisingly O(N)
time complexity
with constant memory requirements.
Roughly the code idea to make that work is that it figures out where the value
at first
needs to go. It saves the target value in a temporary and stores the
value from first
. Then it figures out where that temporary needs to go and
so on until we complete a cycle going back all the way to first
. And then
there are GCD(M, N)
such cycles where N
is the total number of values between
first
and last
, M
is the number of values between first
and middle
and GCD
is the greatest common divisor.
This is interesting because, in the C++ world of destructing moves, every time
we move from a value, the move (constructor or assignment) also changes the
moved from object. This is not required in the case of rotate
because “we
know” based on the clever reasoning that each of the GCD(M, N)
number of
cycles will be complete. Currently this might be an optimisation that does not
take place in C++ because the compiler does not know that.
Replace
The replace traverses a linear sequence and replaces a value with another:
1
2
3
template<class It, class T>
// It is forward iterator
It replace(It first, It last, const T & old_value, const T & new_value);
But what if we use like this with references to values from the sequence?
1
std::replace(v.begin(), v.end(), v[3], v[2]);
If the input is: 0, 2, 1, 2, 3, 2, 1
and expect to replace 2
with 1
and
expect to get 0, 1, 1, 1, 3, 1, 1
you might be surprised when you get 0, 1,
1, 1, 3, 2, 1
.
Although const
, the problem is that the old_value
is alised, it’s also part
of the sequence, and part through the replacement we change what we’re trying
to do. Another way to look at aliasing is that v[3]
is accessed both via the
pointer behind the const T &
and the pointer behind the It first
after it
gets incremented a few times.
In correct usage we know we’re not doing what the example above does, but it’s
an interesting fact that in many cases (correct, unlike the incorrect usage of
replace
above), the compiler does not know that we’re not aliasing and
inserts more code to cope with that.
Note that one of the reason Fortran is faster than C/C++ is that Fortran assumes that function arguments do not alias.
Fibonacci and associativity
Naive implementations of Fibonacci sequences take exponential time with the
argument, so they get slow very fast taking seconds for an argument of 40
something. A slightly better option is linear in time. But efficient
implementations can calculate it in logarithmic time taking less
than a second for an argument of 1 million. The idea is that to
calculate the values for N
and N - 1
, one multiplies a matrix with the
values at N - 1
and N - 2
. Repeat the process and you end with multiplying
the matrix with itself. We know that the matrix multiplication is associative
and that’s what enables it to be done in O(lg(N))
time.
Note that’s covered in chapter 3 of EOP.
Orbits and regularity
When you apply a function T f(T x)
repeatedly the outcome can be:
- Infinite: it goes on without getting back to a previous value, e.g. append a char to a string (until we run out of memory)
- Terminating: eventually cannot go further: e.g.
unsigned int
increment terminates at the max value if we don’t want it to overflow - Circular: it comes back to where we started: e.g.
usigned int
increment that goes back to0
after the max value - ρ-shaped: it creates a loop, but not to where we started, e.g. increment then
take modulo(3) then leads to a sequence like
7, 2, 0, 1 , 2, 0, 1, ...
For this domain if the shape is not infinite one can use algorithms that determine the shape by having a fast traversal followed by a slow traversal and testing for end (terminating) or the two traversal giving the same value (circular or ρ-shaped).
Here is the core idea:
1
2
3
4
5
6
7
auto slow = x;
auto fast = f(x);
while (fast != slow) {
slow = f(slow);
fast = f(fast);
fast = f(fast);
}
These obit algorithms are quite obscure, but they show in the simplest setting the reliance of algorithms on regularity (in this case it’s regularity that ensures that the slow traversal follows the fast one).
Note that’s covered in chapter 2 of EOP, but regularity is fundamental for most algorithms.
Irregular partition “predicate”
You have a vector of unique file names corresponding to files that you want to update. You want to separate this into the file names that correspond to files that already exist and the ones that are missing. Then further separate the existing file names into the ones that are the correct version vs. older/different version. That results in: files names that correspond to files that are OK, need updating or need installing for the first time.
We could use the partition
algorithm twice where the predicate checks if the
file exist and if the file version is correct.
1
2
3
template<class It, class UnaryPredicate>
// It is bidirectional iterator
It partition(It first, It last, UnaryPredicate p);
But the problem is that the predicate is not a regular function. EOP and the
C++ standard define a predicate as a regular function. But in the case of the
partition
the predicate is meant to be called only once for each value in a
sequence of unique values. Regularity for functions involves calling repeatedly
a function with equal values. Well, that’s not the case here.
This illustrates that for library/language design there is a trade-off between elegance/simplicity and functionality/flexibility.
If you insist that the partition
predicate has to be a regular function,
you end up with a more elegant/simpler solution at the const of the kind of
problems that the algorithm can solve. The example I gave is probably
dominated by IO operations, but should you aim to solve if efficiently, one
solution involves exactly the same code as the partition
algorithm.
A note on value semantics: actions vs. transformations
So if values are good and aliasing is so problematic, why don’t we just use
values? Why do we pass vectors by reference, e.g. auto foo(const
std::vector<T> & vec)
? The aliasing problems come from multiple pointers to
the same data, where the pointers are either embedded into iterators or through
the usage of references.
I guess that partly the answer is in EOP chapter 2.5. After describing orbits in terms of transformations in most of chapter 2, at the end of chapter it describes the duality between actions and transformations: they can be each described in terms of the other:
1
2
3
4
5
6
7
8
9
10
// action defined in terms of transformation
void a(T & x) {
x = f(x);
}
// transformation defined in terms of action
T f(T x) {
a(x);
return x;
}
Mathematically they are equivalent, the difference is in efficiency: if T
is
a large object and the change is small, actions could be much faster. When
passing a large object and there is no change we use const
reference.
Fundamentally it comes to: when we know that the data is not aliased and the data is large, transferring data by just transferring a pointer to the data is faster than transferring a copy of the data.
References
Alex Stepanov and Paul McJones: Elements of Programming aka. EOP
Value Semantics: Safety, Independence, Projection, & Future of Programming -
Dave Abrahams CppCon 22 for the
std::replace
example