begin()
C++ standard library variations on obtaining the start iterator for a sequence
(as of C++20-ish). Not all the begin()
s are the same.
The good
std::vector, the canonical case
The vector has a member method begin()
that returns an iterator, The iterator
wraps a pointer, the pointer points to the first value in the underlying array
(assuming the vector is not empty). That iterator can be used to point to other
values in the underlying array and get a reference to read or write those
values.
What if the vector is const
? The design decision for std::vector
was to
make it a regular type, hence it should behave like int
, hence a const
std::vector
should not allow you to change it’s values. That’s why we have
std::vector<int>
and const std::vector<int>
, but rarely std::vector<const
int>
.
Therefore the vector has two begin()
member methods:
1
2
3
4
5
6
7
8
9
10
11
template<typename T>
class vector
{
// lots of other details skipped ...
constexpr iterator begin() noexcept {
return iterator(begin_);
}
constexpr const_iterator begin() const noexcept {
return const_iterator(begin_);
}
};
For a const
vector, the additional begin() const
member method will be
selected by the compiler. That begin()
returns a const_iterator
, which
gives a reference to the const
value, therefore through the const_iterator
we can read the value, but not modify it.
Later the vector got a cbegin()
member method:
1
2
3
4
5
6
7
8
template<typename T>
class vector
{
// lots of other details skipped ...
constexpr const_iterator cbegin() const noexcept {
return begin();
}
};
The intent for that member method is to obtain a const_iterator
when you have
a non-const vector.
A similar situation is for the end of the sequence with two member methods for
end()
and also a cend()
member method.
From the point of the user one can mix and match comparing iterator
s with
const_iterator
s, which is handy, but results in a large number of functions
to write by the implementer of the container (albeit all these functions have
simple implementations).
std::string catchup
std::string
is a much older type than std::vector
. From that older history
it has member methods like find
that return positions, with a special value
npos
for “not found”.
When large parts of STL such as vector
where added to the standard library,
std::string
was retrofitted to have member methods like begin
and cbegin
just like the vector
.
array and standalone functions
The array however does not have member functions. So then standalone template
functions were added for begin
, end
, size
. The syntax through which they
take the array is a bit surprising at first, but other than that the
implementation is straight forward:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
template<class T, std::size_t N>
T * begin(T (&array)[N]) {
return array;
}
template<class T, std::size_t N>
T * end(T (&array)[N]) {
return array + N;
}
template<class T, std::size_t N>
size_t size(T (&array)[N]) {
return N;
}
And overloads were created to call the member functions e.g. begin()
otherwise, so that you can use the free function std::begin
with both arrays
or containers.
“for” loops
With that we can loop:
1
2
3
4
5
for (std::vector<int>::iterator it = vec.begin();
it != vec.end();
++it) {
// dereference it to either read or change values
}
The whole std::vector<int>::iterator is a mouthfull, especially when using
cbegin`:
1
2
3
4
5
for (std::vector<int>::const_iterator it = vec.cbegin();
it != vec.cend();
++it) {
// dereference it to read values, but can't change values
}
auto
can simplify such loops:
1
2
3
for (auto it = vec.begin(); it != vec.end(); ++it) {
// dereference it to either read or change values
}
And range-based for loops were introduced to simplify theese common scenarios further:
1
2
3
4
5
6
7
for (const auto & value : vec) {
// read value
}
// or if needed
for (auto & value : vec) {
// read or change value
}
The range-based for loops rely on the free functions std::begin
and
std::end
.
The bad and the ugly
std::filesystem::directory_iterator
std::filesystem::directory_iterator
is called an “iterator”, and yet it can
be used in a range-based for loop to iterate through files in a folder as
below:
1
2
3
for (const auto & dir_entry : directory_iterator("C:\\Some folder")) {
std::cout << dir_entry.path() << '\n';
}
The reason it gives access to a directory entry rather than just the path is
that the OS APIs it uses, e.g. FindFirstFileExW
and FindNextFileW
in
Windows, also return file attribute, not just the file names.
To do so it has member functions begin()
and end()
which each return …
drum rolls … a directory_iterator
. begin()
just returns a copy, while the
end()
returns a default constructed one. This ends up roughly equivalent to:
1
2
3
4
5
for (auto it = directory_iterator("C:\\Some folder");
it != directory_iterator();
++it) {
std::cout << it->path() << '\n';
}
The directory_iterator
is a weird type: it started life as something close to
an iterator, then become more like a range, while it preserved the old name.
The design/documentation also forces the storage of the handle returned from
e.g. FindFirstFileExW
in a heap allocated, reference counted location. Also
it’s end iterator predates ideas of a sentinel for determining if we reached
the end.
Dereferencing the directory_iterator
gives access to a const
std::filesystem::directory_entry
, so in a way it’s a const iterator, but while
it has a begin()
, it does not currently have a cbegin()
.
std::string_view
A string_view
gives read only access to a memory contiguous range of
characters. That allows some unification for functions that take either string
literals or std::string
(and do not care about zero
termination), so the access has to be read only to accomodate
literals.
As for the range, it typically stores two pointers and returns the first as the
begin()
and the last as the end()
. But the iterators it returns are const
iterators, and it’s cbegin()
returns the same const iterator.
std::span
But what if you want access to a memory contiguous range of values, not
necessarily characters (so not interoperability with std::string
is required)
and also get write access, not just read?
Enter std::span
. It derives from a class gsl::span
, where gsl
was a
largely Microsoft supported library intended to make things safer.
Like string_view
, to expose the range, span
typically stores two pointers
and returns the first as the begin()
and the last as the end()
. But the
iterators it returns are NOT const iterators.
And here is where things get an interesting turn. gsl::span
used to have a
member function cbegin()
that returned a const iterator. In the process of
standardising the type as std::span
people noticed that std::span::cbegin()
returns a const iterator, but the free standing function std::cbegin
returns
a mutable iterator (i.e. different types, see
PL247). That’s because the
free standing std::cbegin
just makes the object constant and calls begin()
on it, hoping to select the const member function begin()
.
But std::span
has issues with constness. As a type it makes sense to model a
pointer where just because the pointer is const
(shallow), does not mean that
the pointed data is const
(deep). It’s member function begin()
returns a
non const iterator (allowing to change the pointed data) even if the span
object is const
(can’t change the pointers it embeds).
In the rush of standardising std::span
, it’s cbegin()
member function was
dropped, see LGW3320, though
hopefully these issues will be sorted out (see proposals
P2278r4
and
P2276r1).
std::generator
std::generator
is a template type introduced in C++23 to support one of the
usage scenarios of C++ coroutines: generate sequences.
1
2
3
4
5
6
7
8
9
10
11
12
std::generator<int> natural_numbers() {
int crt = 0;
for(;; ++crt) {
co_yield crt;
}
}
void foo() {
for (const int x : natural_numbers() | std::views::take(10)) {
std::cout << x << ' ';
}
}
For the above to work, the std::generator
implements begin()
and end()
,
but: “It is an undefined behavior to call begin()
more than once on the same
generator
object.”
I suspect this is because like directory_iterator
manages a resource (the
HANDLE
returned by FindFirstFileExW
), the generator
also manages a
resource (the coroutine state). The directory_iterator
made the choice of
using reference counting to manage this resource, making it copyable at
additional cost, while generator
gives up iterator copy and convenience in
order to avoid that cost.