Why do data() and c_str() return char const*, while operator[] returns char&?

operator[] gives you direct access to the controlled sequence of a std::string object. c_str() originally did not.

In the original specification of std::string, the stored sequence was not required to be a zero-terminated string. This meant that, in the general case, c_str() could not return a direct pointer to the stored sequence. It had to return a pointer to a completely independent, separately allocated temporary copy of the controlled sequence (with an added zero terminator character). For this reason, trying to modify the C-string returned by c_str() made no sense at all: any modification applied to that separate C-string would not propagate to the actual controlled sequence. (In fact, the specification explicitly prohibited any modification attempts. For example, for an empty std::string an implementation could simply return a pointer to the string literal "", which is of course non-modifiable and can be shared between all std::string objects.) So it made perfect sense for c_str() to return const char *.
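To make the pre-C++11 situation concrete, here is a minimal sketch of a hypothetical string class (LegacyString is an invented name, not any real library's implementation) whose stored sequence has no terminator. Its c_str() therefore has to build a separate null-terminated copy, and changes to the stored sequence do not reach a copy that was handed out earlier. The sketch is deliberately naive: it rebuilds the copy on every call.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical pre-C++11-style string: the stored sequence has no terminator.
class LegacyString {
public:
    explicit LegacyString(const char* s) : chars_(s, s + std::strlen(s)) {}

    std::size_t size() const { return chars_.size(); }

    // Direct, writable access to the stored sequence.
    char& operator[](std::size_t pos) { return chars_[pos]; }

    // No terminator is stored, so c_str() must build a separate,
    // null-terminated copy and return a pointer into that copy.
    const char* c_str() const {
        copy_.assign(chars_.begin(), chars_.end());
        copy_.push_back('\0');
        return copy_.data();
    }

private:
    std::vector<char> chars_;         // controlled sequence, not null-terminated
    mutable std::vector<char> copy_;  // scratch copy handed out by c_str()
};
```

Writing through the pointer returned by this c_str() would only change the scratch copy, never the string itself, which is the situation the const return type guarded against.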

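C++11 changed the semantics of c_str(): it must now return a direct pointer to the actual controlled sequence, the same pointer that data() returns. But the interface of c_str() was left unchanged (it still returns const char*) to stay compatible with the legacy specification. (C++17 did add a non-const overload of data() that returns char*.)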
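A small check of the C++11 behavior described above, using only standard std::string members: c_str(), data() and &s[0] compare equal, and a write through operator[] is visible when reading back through c_str(). The helper names shares_one_buffer and modify_and_read_back are illustrative only.

```cpp
#include <string>

// Since C++11, c_str(), data() and &s[0] all point at the same
// contiguous, null-terminated buffer owned by the string.
bool shares_one_buffer(std::string& s) {
    return s.c_str() == s.data() && s.c_str() == &s[0];
}

// A write through operator[] (which returns char&) is therefore
// visible through c_str(). Precondition: s is not empty.
std::string modify_and_read_back(std::string s) {
    s[0] = 'x';
    return std::string(s.c_str());
}
```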


For historical reasons, C++ and its standard library support C-strings (character arrays), and lots of C++ code uses C-strings for input and output.

You can also imagine a possible implementation of std::string that keeps its data in a character array. This would normally be a completely private implementation detail that is not exposed through the class's public interface.

EDIT: to be explicit, a class would normally not expose non-const views of its private data. To see why this would be an issue, imagine the following code:

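std::string s("abc");
char* ps = s.c_str();  // hypothetical: does not compile, since c_str() returns const char*; ps[0] == 'a' and ps[3] == '\0'
ps[3] = 'd';  // overwrites the terminator: the string is no longer null-terminated
printf("%s", s.c_str());  // printf reads past the end of the buffer (undefined behavior)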

Such a change would allow a user of the class to modify its private data in a way that breaks an invariant, namely: "The character buffer used for storage will be null-terminated."
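A minimal sketch of how a class can protect that invariant. TinyString is a hypothetical class invented for illustration, not how std::string is actually implemented: it keeps a private null-terminated buffer, hands out only a const char* view of it, and limits writable operator[] access to real characters so the terminator cannot be overwritten.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical minimal string with a private, null-terminated buffer.
// Invariant: the last element of buf_ is always '\0'.
class TinyString {
public:
    explicit TinyString(const char* s) : buf_(s, s + std::strlen(s) + 1) {}

    std::size_t size() const { return buf_.size() - 1; }

    // Read-only view: callers cannot overwrite the terminator through it.
    const char* c_str() const { return buf_.data(); }

    // Writable access is limited to real characters, never the terminator.
    char& operator[](std::size_t pos) {
        assert(pos < size() && "precondition: pos < size()");
        return buf_[pos];
    }

private:
    std::vector<char> buf_;  // characters followed by '\0'
};
```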

Part of the contract of operator[] is that the caller must not pass an index greater than the length of the string (since C++11, s[s.size()] is allowed and refers to the terminating null character, which must not be set to any other value). The at(size_t pos) member function enforces bounds checking by throwing std::out_of_range when pos >= size(). std::string::operator[] can still be used unsafely, but at least it has a documented contract, unlike raw pointer indexing such as ps[3].
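The difference between the two access paths can be sketched as follows. checked_char and unchecked_char are hypothetical helper names, and returning '?' on failure is an arbitrary choice for illustration; the std::out_of_range exception from at() and the rule that s[s.size()] yields '\0' are standard behavior since C++11.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Checked access: at() throws std::out_of_range when pos >= size().
// Returns '?' instead of propagating the exception, purely for illustration.
char checked_char(const std::string& s, std::size_t pos) {
    try {
        return s.at(pos);
    } catch (const std::out_of_range&) {
        return '?';
    }
}

// Unchecked access: operator[] trusts the caller. pos == size() is allowed
// and yields the terminating '\0'; pos > size() is undefined behavior.
char unchecked_char(const std::string& s, std::size_t pos) {
    return s[pos];  // precondition: pos <= s.size()
}
```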

END OF EDIT

But to interoperate with functions that expect a const char* C-string, std::string does expose this character buffer, read-only, through c_str() (and data()).
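For example, C library functions such as std::strtol and std::strlen take a const char* and only read from it, so c_str() is exactly what they need. The wrapper names parse_decimal and c_length are hypothetical. Note that std::strlen stops at the first '\0', so it reports a shorter length than size() for a string with embedded null characters.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// std::strtol reads the C-string but never writes through the pointer.
long parse_decimal(const std::string& text) {
    return std::strtol(text.c_str(), nullptr, 10);
}

// std::strlen relies on the terminator that c_str() guarantees.
std::size_t c_length(const std::string& text) {
    return std::strlen(text.c_str());
}
```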

Of course, as with std::vector, users may want to modify individual elements (characters) of a string, which is why std::string provides operator[] returning char&.
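Because operator[] returns a reference in both classes, the same in-place modification code works for std::string and std::vector<char>. A minimal sketch; capitalize_first is a hypothetical name, and it assumes ASCII-compatible text in the default C locale.

```cpp
#include <cctype>
#include <string>
#include <vector>

// operator[] returns a reference (char& for std::string, T& for
// std::vector<T>), so the element can be modified in place through it.
template <typename CharContainer>
void capitalize_first(CharContainer& c) {
    if (!c.empty()) {
        char& first = c[0];
        first = static_cast<char>(std::toupper(static_cast<unsigned char>(first)));
    }
}
```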

(In practice, many string implementations keep a small fixed-length character buffer inside the string object itself and switch to a heap allocation only when the contents no longer fit. This is called the "small string optimization.")

Why is there a data() member function, you may ask, when there is a perfectly serviceable c_str() member function? I think it exists to simplify generic programming: std::array and std::vector also have data() member functions, and std::string is designed to act like a container.
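A sketch of the kind of generic code a shared data() member enables: one function template that reads any contiguous char container through data() and size(). count_char is a hypothetical name used only for illustration.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Works for any contiguous container of char that offers data() and size(),
// e.g. std::string, std::vector<char> and std::array<char, N>.
template <typename Contiguous>
std::size_t count_char(const Contiguous& c, char target) {
    const char* p = c.data();  // const container, so data() yields const char*
    std::size_t n = 0;
    for (std::size_t i = 0; i < c.size(); ++i) {
        if (p[i] == target) ++n;
    }
    return n;
}
```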