C++ standard library variations on obtaining the start iterator for a sequence (as of C++20-ish). Not all the begin()s are the same.

The good

std::vector, the canonical case

The vector has a member method begin() that returns an iterator, The iterator wraps a pointer, the pointer points to the first value in the underlying array (assuming the vector is not empty). That iterator can be used to point to other values in the underlying array and get a reference to read or write those values.

What if the vector is const? The design decision for std::vector was to make it a regular type, hence it should behave like int, hence a const std::vector should not allow you to change it’s values. That’s why we have std::vector<int> and const std::vector<int>, but rarely std::vector<const int>.

Therefore the vector has two begin() member methods:

1
2
3
4
5
6
7
8
9
10
11
template<typename T>
class vector
{
// lots of other details skipped ...
  constexpr iterator begin() noexcept {
    return iterator(begin_);
  }
  constexpr const_iterator begin() const noexcept {
    return const_iterator(begin_);
  }
};

For a const vector, the additional begin() const member method will be selected by the compiler. That begin() returns a const_iterator, which gives a reference to the const value, therefore through the const_iterator we can read the value, but not modify it.

Later the vector got a cbegin() member method:

1
2
3
4
5
6
7
8
template<typename T>
class vector
{
// lots of other details skipped ...
  constexpr const_iterator cbegin() const noexcept {
    return begin();
  }
};

The intent for that member method is to obtain a const_iterator when you have a non-const vector.

A similar situation is for the end of the sequence with two member methods for end() and also a cend() member method.

From the point of the user one can mix and match comparing iterators with const_iterators, which is handy, but results in a large number of functions to write by the implementer of the container (albeit all these functions have simple implementations).

std::string catchup

std::string is a much older type than std::vector. From that older history it has member methods like find that return positions, with a special value npos for “not found”.

When large parts of STL such as vector where added to the standard library, std::string was retrofitted to have member methods like begin and cbegin just like the vector.

array and standalone functions

The array however does not have member functions. So then standalone template functions were added for begin, end, size. The syntax through which they take the array is a bit surprising at first, but other than that the implementation is straight forward:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
template<class T, std::size_t N>
T * begin(T (&array)[N]) {
    return array;
}

template<class T, std::size_t N>
T * end(T (&array)[N]) {
    return array + N;
}

template<class T, std::size_t N>
size_t size(T (&array)[N]) {
    return N;
}

And overloads were created to call the member functions e.g. begin() otherwise, so that you can use the free function std::begin with both arrays or containers.

“for” loops

With that we can loop:

1
2
3
4
5
for (std::vector<int>::iterator it = vec.begin();
     it != vec.end();
     ++it) {
  // dereference it to either read or change values
}

The whole std::vector<int>::iterator is a mouthfull, especially when using cbegin`:

1
2
3
4
5
for (std::vector<int>::const_iterator it = vec.cbegin();
     it != vec.cend();
     ++it) {
  // dereference it to read values, but can't change values
}

auto can simplify such loops:

1
2
3
for (auto it = vec.begin(); it != vec.end(); ++it) {
  // dereference it to either read or change values
}

And range-based for loops were introduced to simplify theese common scenarios further:

1
2
3
4
5
6
7
for (const auto & value : vec) {
  // read value
}
// or if needed
for (auto & value : vec) {
  // read or change value
}

The range-based for loops rely on the free functions std::begin and std::end.

The bad and the ugly

std::filesystem::directory_iterator

std::filesystem::directory_iterator is called an “iterator”, and yet it can be used in a range-based for loop to iterate through files in a folder as below:

1
2
3
for (const auto & dir_entry : directory_iterator("C:\\Some folder")) {
  std::cout << dir_entry.path() << '\n';
}

The reason it gives access to a directory entry rather than just the path is that the OS APIs it uses, e.g. FindFirstFileExW and FindNextFileW in Windows, also return file attribute, not just the file names.

To do so it has member functions begin() and end() which each return … drum rolls … a directory_iterator. begin() just returns a copy, while the end() returns a default constructed one. This ends up roughly equivalent to:

1
2
3
4
5
for (auto it = directory_iterator("C:\\Some folder");
     it != directory_iterator();
     ++it) {
  std::cout << it->path() << '\n';
}

The directory_iterator is a weird type: it started life as something close to an iterator, then become more like a range, while it preserved the old name. The design/documentation also forces the storage of the handle returned from e.g. FindFirstFileExW in a heap allocated, reference counted location. Also it’s end iterator predates ideas of a sentinel for determining if we reached the end.

Dereferencing the directory_iterator gives access to a const std::filesystem::directory_entry, so in a way it’s a const iterator, but while it has a begin(), it does not currently have a cbegin().

std::string_view

A string_view gives read only access to a memory contiguous range of characters. That allows some unification for functions that take either string literals or std::string (and do not care about zero termination), so the access has to be read only to accomodate literals.

As for the range, it typically stores two pointers and returns the first as the begin() and the last as the end(). But the iterators it returns are const iterators, and it’s cbegin() returns the same const iterator.

std::span

But what if you want access to a memory contiguous range of values, not necessarily characters (so not interoperability with std::string is required) and also get write access, not just read?

Enter std::span. It derives from a class gsl::span, where gsl was a largely Microsoft supported library intended to make things safer.

Like string_view, to expose the range, span typically stores two pointers and returns the first as the begin() and the last as the end(). But the iterators it returns are NOT const iterators.

And here is where things get an interesting turn. gsl::span used to have a member function cbegin() that returned a const iterator. In the process of standardising the type as std::span people noticed that std::span::cbegin() returns a const iterator, but the free standing function std::cbegin returns a mutable iterator (i.e. different types, see PL247). That’s because the free standing std::cbegin just makes the object constant and calls begin() on it, hoping to select the const member function begin().

But std::span has issues with constness. As a type it makes sense to model a pointer where just because the pointer is const (shallow), does not mean that the pointed data is const (deep). It’s member function begin() returns a non const iterator (allowing to change the pointed data) even if the span object is const (can’t change the pointers it embeds).

In the rush of standardising std::span, it’s cbegin() member function was dropped, see LGW3320, though hopefully these issues will be sorted out (see proposals P2278r4 and P2276r1).

std::generator

std::generator is a template type introduced in C++23 to support one of the usage scenarios of C++ coroutines: generate sequences.

1
2
3
4
5
6
7
8
9
10
11
12
std::generator<int> natural_numbers() {
  int crt = 0;
  for(;; ++crt) {
    co_yield crt;
  }
}

void foo() {
  for (const int x : natural_numbers() | std::views::take(10)) {
    std::cout << x << ' ';
  }
}

For the above to work, the std::generator implements begin() and end(), but: “It is an undefined behavior to call begin() more than once on the same generator object.”

I suspect this is because like directory_iterator manages a resource (the HANDLE returned by FindFirstFileExW), the generator also manages a resource (the coroutine state). The directory_iterator made the choice of using reference counting to manage this resource, making it copyable at additional cost, while generator gives up iterator copy and convenience in order to avoid that cost.