Why must C/C++ string literal declarations be single-line?

One has to consider that C was not written to be an "Applications" programming language but a systems programming language. It would not be inaccurate to say it was designed expressly to rewrite Unix. With that in mind, there was no EMACS or VIM and your user interfaces were serial terminals. Multiline string declarations would seem a bit pointless on a system that did not have a multiline text editor. Furthermore, string manipulation would not be a primary concern for someone looking to write an OS at that particular point in time. The traditional set of UNIX scripting tools such as AWK and SED (amongst MANY others) are a testament to the fact they weren't using C to do significant string manipulation.

Additional considerations: it was not uncommon in the early 70s (when C was written) to submit your programs on PUNCH CARDS and come back the next day to get them. Would it have eaten up extra processing time to compile a program with multiline strings literals? Not really. It can actually be less work for the compiler. But you were going to come back for it the next day anyhow in most cases. But nobody who was filling out a punch card was going to put large amounts of text that wasn't needed in their programs.

In a modern environment, there is probably no reason not to include multiline string literals other than designer's preference. Grammatically speaking, it's probably simpler because you don't have to take linefeeds into consideration when parsing the string literal.

In addition to the existing answers, you can work around this using C++11's raw string literals, e.g.:

#include <iostream>
#include <string>

int main() {
   std::string str = R"(a
b)";
   std::cout << str;
}

/* Output:
a
b
*/

Live demo.

[n3290: 2.14.5/4]: [ Note: A source-file new-line in a raw string literal results in a new-line in the resulting execution string-literal. Assuming no whitespace at the beginning of lines in the following example, the assert will succeed:
const char *p = R"(a\
b
c)";
assert(std::strcmp(p, "a\\\nb\nc") == 0);
—end note ]

Though non-normative, this note and the example that follows it in [n3290: 2.14.5/5] serve to complement the indication in the grammar that the production r-char-sequence may contain newlines (whereas the production s-char-sequence, used for normal string literals, may not).

The terse answer is "because the grammar prohibits multiline string literals." I don't know whether there is a good reason for this other than historical reasons.

There are, of course, ways around this. You can use line splicing:

const char* script = "\
      Some\n\
   Formatted\n\
 String Literal\n\
";

If the \ appears as the last character on the line, the newline will be removed during preprocessing.

Or, you can use string literal concatenation:

const char* script = 
"      Some\n"
"   Formatted\n"
" String Literal\n";

Adjacent string literals are concatenated during preprocessing, so these will end up as a single string literal at compile-time.

Using either technique, the string literal ends up as if it were written:

const char* script = "      Some\n   Formatted\n  String Literal\n";

Why must C/C++ string literal declarations be single-line?

Live demo.

Tags:

C++

C

String

Language Design

Programming Languages

Related

Recent Posts