std::string_view corners
std::string_view: introduction and the dark corners of its usage
Motivation
Functions taking const char pointers
Given the function below taking a char pointer:
void foo_taking_char_ptr(const char * input);
It can be called directly with a string literal:
foo_taking_char_ptr("some string");
However, if we have a std::string, for example one returned from a function bar:
std::string bar() {
    return "abc"; // constructs, then returns, a std::string containing "abc"
}
Then we need to use c_str():
foo_taking_char_ptr(bar().c_str());
Using c_str() is inconvenient, but efficient.
Functions taking const std::string references
Similarly, consider the function below, which takes a string reference:
void foo_taking_string_ref(const std::string & input);
It can be called directly with a std::string:
foo_taking_string_ref(bar());
It can also be called with a string literal:
foo_taking_string_ref("some string");
However, in the latter case the call constructs a temporary std::string, possibly involving a memory allocation, which is not efficient despite the convenient syntax.
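To make that temporary visible, here is a minimal sketch using a hypothetical helper, data_via_string_ref (not part of the original), that only reports where the characters seen by a const std::string & parameter live. Called with a literal, the reference binds to a freshly constructed std::string, so the pointer differs from the literal's address; called with a std::string, it binds directly. Whether the copy also allocates depends on the string's length and on the library's small-string buffer.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: reports where the characters seen by the callee live.
const char * data_via_string_ref(const std::string & input) {
    return input.data();
}

// True if a literal argument reached the callee as a copy. The comparison is
// part of the same full-expression as the call, so the temporary is still alive.
bool literal_is_copied() {
    static const char literal[] = "some string";
    return data_via_string_ref(literal) != literal;
}

// True if a std::string argument reached the callee without a copy.
bool string_is_not_copied() {
    const std::string s = "some string";
    return data_via_string_ref(s) == s.data();
}
```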
Enter std::string_view
What is it?
A pointer and a size (or, equivalently, two pointers). It provides a constant view into a std::string, a string literal, a sub-range of either, or any other contiguous sequence of characters. The pointer points at the beginning of the sequence and the size is its length (in the two-pointer form, the second pointer points one past the last character, i.e. the end). It is a constant view: it does not allow changing the sequence it points to, which is what allows it to provide views into string literals.
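A short sketch of the "sub-range" case, assuming a hypothetical helper second_word: substr on a std::string_view returns another view into the same characters, so the result's data() points inside the original buffer and no characters are copied.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper: returns a view of the second space-separated word of
// text, without copying any characters. The result is only valid as long as
// the characters that text refers to are alive.
std::string_view second_word(std::string_view text) {
    const auto space = text.find(' ');
    if (space == std::string_view::npos) return {};
    std::string_view rest = text.substr(space + 1);  // same buffer, new start
    return rest.substr(0, rest.find(' '));            // npos count: up to the end
}
```

Used with a std::string lvalue, the returned view points straight into that string's buffer.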
Functions taking std::string_view
Consider this function, which takes a std::string_view by value (yes, not by const reference; copying a std::string_view only copies a pointer and a size):
void foo_taking_string_view(std::string_view input) {
    // can use input.begin(), input.end()
    // and input.data(), input.size(), etc.
}
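As an illustration of what such a body might do, here is a sketch with two hypothetical functions: one uses begin() and end() with a standard algorithm, the other uses data() and size() the way a pointer-plus-length API would.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical example: counts occurrences of c through begin() and end().
// The view is taken by value: copying it copies only a pointer and a size.
std::ptrdiff_t count_char(std::string_view input, char c) {
    return std::count(input.begin(), input.end(), c);
}

// Hypothetical example: a similar question answered through data() and size(),
// as a pointer-plus-length C API would see the sequence.
bool contains_char(std::string_view input, char c) {
    if (input.empty()) return false;  // data() may be null for an empty view
    return std::memchr(input.data(), c, input.size()) != nullptr;
}
```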
It can be called directly with a std::string:
foo_taking_string_view(bar());
It can also be called with a string literal:
foo_taking_string_view("some string");
Both calls are convenient and efficient: neither of them copies the characters into a new std::string.
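One way to check the "no copy" claim, sketched with a hypothetical helper data_via_string_view: inside the function, data() is the caller's own buffer, whether the argument is a literal or a std::string.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper: reports where the characters seen by the callee live.
const char * data_via_string_view(std::string_view input) {
    return input.data();
}
```

For a literal the returned pointer is the literal itself; for a std::string it is that string's data().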
Usage corner cases
It’s not zero terminated
Maybe it is, but you can’t rely on it, so the following is a bug:
void bad_foo(std::string_view input) {
    FILE * f = fopen(input.data(), "r"); // bug: input.data() may not be zero terminated
    // ...
}
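A sketch of how this goes wrong, assuming a hypothetical view first_word_view created with substr: its size() is 5, but a C API reading from data() keeps going until it finds a terminator, so fopen in bad_foo would be handed "hello world" rather than "hello". A view into a buffer with no terminator at all would make the C API read past the end.

```cpp
#include <cassert>
#include <cstring>
#include <string_view>

// Hypothetical example: a view of the first five characters of a longer,
// zero-terminated string. The view ends after "hello", the buffer does not.
std::string_view first_word_view() {
    static const char text[] = "hello world";
    return std::string_view(text).substr(0, 5);
}
```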
Instead, you would need something that guarantees the sequence is zero terminated, maybe named zstring_view or similar.
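A minimal sketch of what such a type could look like. zstring_view here is hypothetical (not a standard type): it only offers constructors from sources known to be zero terminated, so c_str() can be handed to C APIs. It deliberately has no constructor from std::string_view, since an arbitrary view may not be terminated. std::strlen stands in for a C API such as fopen.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical zstring_view: a non-owning view that can only be built from
// zero-terminated sources, so c_str() is safe to pass to C APIs.
class zstring_view {
public:
    zstring_view(const char * s) : view_(s) {}         // C string: terminated
    zstring_view(const std::string & s) : view_(s) {}  // std::string: terminated
    const char * c_str() const { return view_.data(); }
    std::size_t size() const { return view_.size(); }
    operator std::string_view() const { return view_; }
private:
    std::string_view view_;  // invariant: view_.data()[view_.size()] == '\0'
};

// Stand-in for a C API call such as fopen: relies on the terminator.
std::size_t c_length(zstring_view name) {
    return std::strlen(name.c_str());
}
```

Like std::string_view, it does not own its characters, so the dangling risks discussed below apply to it as well.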
You can, of course, take a copy, which is guaranteed to be zero terminated:
void copying_foo(std::string_view input) {
    std::string file_name(input);
    FILE * f = fopen(file_name.c_str(), "r");
    // ...
}
But then this is a false economy of allocations: calling copying_foo with a std::string results in an unnecessary copy into file_name.
Also, if this is repeated, so is the copy into a std::string that exists only to zero terminate the sequence for a C API:
void copying_foo_usage() {
    std::string input = get_file_name();
    copying_foo(input); // takes a copy to zero terminate
    // some more code, then use it again:
    copying_foo(input); // takes another copy to zero terminate
}
In practice (short of using some zstring_view), the solution of passing a const std::string & is not too bad: it only allocates unnecessarily if the original argument is a string literal.
void foo_taking_string(const std::string & input) {
    FILE * f = fopen(input.c_str(), "r");
    // ...
}
void foo_taking_string_usage() {
    std::string input = get_file_name();
    foo_taking_string(input); // no allocation
    // some more code, then use it again:
    foo_taking_string(input); // no allocation
}
Using std::string_view variables is risky
The example below is incorrect; it creates a dangling pointer:
std::string_view a = bar();
foo_taking_string_view(a);
That’s because the std::string returned by bar() is a temporary that is destroyed on the same line. By the time a is used as a function argument, it points to destroyed data.
The example below is correct, but not idiomatic:
std::string b = bar();
std::string_view c = b;
foo_taking_string_view(c);
That’s because b outlives c: c is declared after b (or in an inner scope), so c is destroyed first.
The example below is also incorrect; it again creates a dangling pointer:
std::string d = bar();
std::string_view e = d + "something";
foo_taking_string_view(e);
That’s because operator+ returns a temporary std::string that is destroyed on the same line. By the time e is used as a function argument, it points to destroyed data.
The example below is correct and idiomatic:
std::string f = bar();
foo_taking_string_view(f);
The example below is also correct and idiomatic:
foo_taking_string_view(bar());
And yet, why is it correct? Specifically, what guarantees that the temporary string returned from bar() is still alive when foo_taking_string_view executes?
Or, put another way, what guarantees that the temporary string is not destroyed as soon as the temporary string_view is constructed, before foo_taking_string_view executes?
The C++ standard wording that guarantees correctness for the last example is:
Temporary objects are destroyed as the last step in evaluating the full-expression that (lexically) contains the point where they were created.
Plainly put: the temporary returned by bar() is destroyed only at the end of the whole statement, i.e. at the semicolon ;, after foo_taking_string_view has returned.
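The rule can be observed with illustrative stand-in types (hypothetical, not the real std::string and std::string_view): Tracked counts how many of its instances are alive, View converts implicitly from it the way std::string_view converts from std::string, and make_tracked plays the role of bar(). The sketch assumes C++17 (inline static data member, guaranteed copy elision).

```cpp
#include <cassert>

// Stand-in for the std::string returned by bar(): counts live instances.
struct Tracked {
    static inline int live = 0;
    Tracked() { ++live; }
    Tracked(const Tracked &) { ++live; }
    ~Tracked() { --live; }
};

// Stand-in for std::string_view: implicitly constructible from a Tracked.
struct View {
    View(const Tracked &) {}
};

// Stand-in for bar(): returns a temporary.
Tracked make_tracked() { return Tracked{}; }

// Stand-in for foo_taking_string_view: reports how many Tracked objects
// are alive while its body runs.
int live_during_call(View) { return Tracked::live; }
```

In live_during_call(make_tracked()), the function sees one live Tracked; once the statement has finished, none is left.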
The moral is probably: avoid std::string_view variables; using them is risky (as opposed to using std::string_view for function arguments).
The risky usage is not prevented because the easy mechanisms for preventing it (such as std::string_view not allowing construction from a temporary std::string) would also prevent idiomatic usage (passing a temporary to a function that takes a std::string_view argument).
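The trade-off can be sketched with a hypothetical strict_view that deletes construction from an rvalue std::string, i.e. the "easy mechanism". The static_asserts show that it rejects the dangling variable and the idiomatic call with a temporary alike, while std::string_view accepts both.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>

// Hypothetical view type: like std::string_view, but refuses temporaries.
class strict_view {
public:
    strict_view(const char * s) : view_(s) {}
    strict_view(const std::string & s) : view_(s) {}
    strict_view(std::string &&) = delete;  // the "easy mechanism"
    std::string_view get() const { return view_; }
private:
    std::string_view view_;
};

// `strict_view a = bar();` (which would dangle) and `foo(bar())` (idiomatic)
// both need this implicit conversion from a temporary std::string,
// so both are rejected at compile time.
static_assert(!std::is_convertible_v<std::string, strict_view>);
// std::string_view accepts both, leaving the dangling case to the programmer.
static_assert(std::is_convertible_v<std::string, std::string_view>);

// Calls with an lvalue std::string or a string literal still compile.
std::size_t strict_size(strict_view v) { return v.get().size(); }
```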
Not quite Regular
Another way of looking at it is that std::string_view behaves like a reference (underneath, it holds a pointer).
That way of thinking provides an alternative line of reasoning about its usage, e.g.:
- pass it by value when it is a function argument, like a pointer
- a variable that points to a temporary becomes dangling when the temporary is destroyed
But std::string_view also shares with references their issues with regard to the Regular concept:
std::string g = bar();
std::string_view h = g; // take a copy
assert(h == g); // copy is same as original
g[0] = 'z'; // change original
assert(h == g); // not Regular behaviour, but expected for references
For a Regular type, a copy-constructed object is equal to the original, but if the original then changes, the copy is no longer equal to it. That is not the case for std::string_view, which makes it not quite Regular.
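A side-by-side sketch of the difference, using two hypothetical functions that repeat the steps above: the std::string copy behaves as a Regular value, the std::string_view does not.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// std::string is Regular: a copy is an independent value.
bool string_copy_follows_original() {
    std::string g = "abc";
    std::string copy = g;
    assert(copy == g);  // copy is equal to the original
    g[0] = 'z';         // change the original
    return copy == g;   // false: the copy kept "abc"
}

// std::string_view is not: it keeps observing the original's characters.
bool view_copy_follows_original() {
    std::string g = "abc";
    std::string_view h = g;
    assert(h == g);
    g[0] = 'z';         // no reallocation, so h is still valid
    return h == g;      // true: h now reads "zbc" as well
}
```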