Are C constant character strings always null terminated?

A string is only a string if it contains a null character.

A string is a contiguous sequence of characters terminated by and including the first null character. (C11 §7.1.1 ¶1)
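To make the definition concrete, here is a minimal sketch of measuring a string: count characters until the first null character, without counting the null itself. The function name `string_length` is hypothetical. The standard library's `strlen` does the same job.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts the characters before the first null
   character, as strlen() does. The caller must pass a real string; if no
   '\0' is reachable, this reads past the object (undefined behavior),
   exactly like strlen would. */
size_t string_length(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != '\0') {
        ++n;
    }
    return n;
}
```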

"abc" is a string literal. It also always contains a null character. A string literal may contain more than 1 null character.

"def\0ghi"  // 2 null characters.
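Here is a quick check of that example, assuming a C11 compiler. `sizeof` counts every element of the literal's array, but `strlen` stops at the first null character. The variable and function names are for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The literal "def\0ghi" is an array of 8 chars:
   'd','e','f','\0','g','h','i','\0'. */
static const char literal[] = "def\0ghi";

static_assert(sizeof "def\0ghi" == 8, "seven characters plus the implicit terminator");

/* Number of array elements, including both null characters. */
size_t literal_array_size(void) { return sizeof literal; }

/* strlen() stops at the first null character, so it only sees "def". */
size_t literal_string_length(void) { return strlen(literal); }
```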

In the following, though, x is not a string (it is an array of char without a null character). y and z are both arrays of char and both are strings.

char x[3] = "abc";
char y[4] = "abc";
char z[] = "abc";
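One way to see the difference, assuming C11, is to search each array for a null character within its own bounds. The helper `holds_string` is a hypothetical name. It uses `memchr` rather than `strlen` because calling `strlen` on `x` would read past the end of the array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

char x[3] = "abc"; /* 'a','b','c'       : no room for '\0'  */
char y[4] = "abc"; /* 'a','b','c','\0'                      */
char z[] = "abc";  /* size deduced as 4, '\0' included       */

/* Returns true if the first n bytes of buf contain a null character,
   i.e. if the array holds a string. memchr never looks beyond n bytes. */
bool holds_string(const char *buf, size_t n)
{
    assert(buf != NULL);
    return memchr(buf, '\0', n) != NULL;
}
```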

With the OP's code, s points to a string (the string literal "abc"), so *(s + 3) and s[3] both have the value 0. Attempting to modify s[3] is wrong for two reasons. First, s is a const char *, so an assignment such as s[3] = 'x' is a constraint violation that the compiler must diagnose. Second, the data pointed to by s is a string literal, and attempting to modify a string literal is undefined behavior even if the const is cast away.

const char* s = "abc";
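Here is a minimal sketch of the reads that are allowed through `s`. The forbidden writes appear only in comments. The function name `check_terminator` is illustrative and does not come from the original post.

```c
#include <assert.h>
#include <string.h>

const char *s = "abc";

/* Reading through s is fine: index 3 holds the terminating null
   character, and s[3] is defined as *(s + 3). */
void check_terminator(void)
{
    assert(s[3] == '\0');
    assert(*(s + 3) == s[3]);
    assert(strlen(s) == 3);
}

/* Writing is not:
     s[3] = 'x';            constraint violation: s points to const char
     ((char *)s)[3] = 'x';  compiles, but modifies a string literal (UB) */
```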

Deeper: C does not define "constant character strings".

The language defines a string literal, like "abc", to be a character array of size 4 holding the values 'a', 'b', 'c', '\0'. Attempting to modify these is UB. How this is used depends on context.

The standard C library defines the term string.

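With const char* s = "abc";, s is a pointer to data of type const char. Because s points to const char, using s to modify that data (for example, *s = 'x') is a constraint violation that the compiler must diagnose. Casting the const away and writing anyway is UB, because the data is a string literal. s is initialized to point to the string literal "abc". s itself is not a string, but the memory s initially points to is a string.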
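The pointer/string distinction can be checked directly. This sketch (the function name is hypothetical) shows that `sizeof s` is the size of a pointer, which has nothing to do with the length of the string. It also shows that `s` itself can be re-pointed, because only the characters it points to are `const`.

```c
#include <assert.h>
#include <string.h>

const char *s = "abc";

void check_pointer_vs_string(void)
{
    /* s is a pointer object; its size is the size of a pointer. */
    assert(sizeof s == sizeof(const char *));
    /* The literal it points to is a 4-element array: 'a','b','c','\0'. */
    assert(sizeof "abc" == 4);

    /* The pointer is not const, so it can point somewhere else;
       the string it used to point to is unaffected. */
    const char *start = s;
    s = "xyz";
    assert(strcmp(s, "xyz") == 0);
    assert(strcmp(start, "abc") == 0);
    s = start; /* restore the original target */
}
```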


In short, yes. A string constant is of course a string and a string is by definition 0-terminated.

If you use a string constant as an array initializer like this:

char x[5] = "hello";

you won't have a 0 terminator in x simply because there's no room for it.

But with

char x[] = "hello";

it will be there and the size of x is 6.
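Here is a sketch of both declarations side by side, using hypothetical names. `sizeof` confirms the sizes 5 and 6. A precision in `%.*s` is one way to print the unterminated array without reading past its end. This example formats into a buffer with `snprintf`, and the same format string works with `printf`.

```c
#include <assert.h>
#include <stdio.h>

char no_terminator[5] = "hello";  /* exactly 5 chars, no '\0'      */
char with_terminator[] = "hello"; /* size 6, ends with '\0'        */

/* Formats buf as "[...]" reading at most n chars. With a precision,
   printf-family functions do not need a '\0' inside the first n bytes.
   Returns what snprintf returns. */
int format_bounded(char *out, size_t outsize, const char *buf, size_t n)
{
    assert(out != NULL && buf != NULL);
    return snprintf(out, outsize, "[%.*s]", (int)n, buf);
}
```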


In C, there isn't really a "string" datatype like in C++ and Java.

An important principle that every competent computer science degree program should mention: information is symbols plus interpretation.

A "string" is defined conventionally as any sequence of characters ending in a null byte ('\0').

The "gotcha" that's being posted (character/byte arrays with the value 0 in the middle of them) is only a difference of interpretation. Treating a byte array as a string versus treating it as bytes (numbers in [0, 255]) has different applications. Obviously, if you're printing to the terminal, you might want to print characters until you reach a null byte. If you're saving a file or running an encryption algorithm on blocks of data, you will need to support 0s in byte arrays.
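Here is a small illustration of the two interpretations. The data and function names are made up for the example. `strlen` treats the first zero byte as the end. Byte-oriented code such as `memcpy` needs an explicit length and copies the zeros like any other value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A block of binary data: the zero bytes are data, not terminators. */
static const unsigned char block[] = { 0x41, 0x00, 0x42, 0x00, 0x43 };

/* Interpreted as a string, the data looks 1 character long ("A"). */
size_t length_as_string(void)
{
    return strlen((const char *)block);
}

/* Interpreted as bytes, the length is tracked separately (here via
   sizeof) and memcpy copies every byte, zeros included. */
size_t copy_as_bytes(unsigned char *dst, size_t dstsize)
{
    assert(dst != NULL);
    size_t n = sizeof block < dstsize ? sizeof block : dstsize;
    memcpy(dst, block, n);
    return n;
}
```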

It's also valid to take a "string" and optionally interpret it as a byte array.
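In the other direction, a string can be passed to byte-oriented code. The `byte_sum` checksum below is a hypothetical example, not a real library function. Passing `sizeof greeting` includes the terminating null byte, which adds 0 to the sum.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical checksum: sums every byte of an object, treating the
   bytes as plain numbers rather than characters. */
unsigned long byte_sum(const void *data, size_t n)
{
    assert(data != NULL || n == 0);
    const unsigned char *p = data;
    unsigned long sum = 0;
    for (size_t i = 0; i < n; ++i) {
        sum += p[i];
    }
    return sum;
}

const char greeting[] = "hi"; /* bytes: 'h', 'i', '\0' */
```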


A string is defined as a sequence of characters terminated by a null character. It does not matter whether the sequence is modifiable, that is, whether the corresponding declaration has the const qualifier.

For example, string literals in C have non-const character array types: "Hello world" has type char[12]. So you may write:

char *s = "Hello world";

In this declaration the identifier s points to the first character of the string.
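This sketch assumes a C11 compiler with default settings. GCC's optional `-Wwrite-strings`, for instance, gives literals the type `const char[N]` instead. Under that assumption, `_Generic` reports that the literal's elements are plain `char`. The function name is illustrative. The code only reads through `s`, because writing to a string literal is still undefined behavior.

```c
#include <assert.h>

/* Allowed in C: the literal has type char[12], which converts to char *. */
char *s = "Hello world";

static_assert(sizeof "Hello world" == 12, "11 characters plus the terminator");

/* Returns 1 if the literal's elements have type char (as in C),
   0 if they are const char (as in C++ or with -Wwrite-strings). */
int literal_elements_are_plain_char(void)
{
    return _Generic(&"Hello world"[0], char *: 1, const char *: 0, default: -1);
}
```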

You can also initialize a character array yourself with a string literal. For example:

char s[] = "Hello world";

This declaration is equivalent to

char s[] = { 'H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '\0' };
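The equivalence can be checked with `memcmp`, assuming C11 for `static_assert`. The array names here are illustrative.

```c
#include <assert.h>
#include <string.h>

char from_literal[] = "Hello world";
char from_list[] = { 'H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '\0' };

/* Both arrays get 12 elements: 11 characters plus the terminator. */
static_assert(sizeof from_literal == 12, "terminator included");
static_assert(sizeof from_list == sizeof from_literal, "same size");

/* Returns 1 if both arrays hold identical bytes, '\0' included. */
int arrays_identical(void)
{
    return memcmp(from_literal, from_list, sizeof from_literal) == 0;
}
```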

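However, in C you may leave the terminating zero out of a character array's initialization. If the array's declared size equals the number of characters before the null, the null is simply not stored.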

For example

char s[11] = "Hello world"; 

Although the string literal used as the initializer contains the terminating zero, the array has no room for it, so it is excluded from the initialization. As a result, the character array s does not contain a string.
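Because such an array is not a string, code has to handle it with an explicit length. One possible approach uses a hypothetical helper `copy_terminated`. It copies the characters into a larger buffer and appends the null character itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

char s[11] = "Hello world"; /* 11 chars, terminator dropped */

static_assert(sizeof s == 11, "no room for the terminator");

/* Hypothetical helper: copies n chars (which may lack a terminator) into
   dst and appends '\0', truncating to fit. Returns the number of chars
   copied. dst must have room for at least one char. */
size_t copy_terminated(char *dst, size_t dstsize, const char *src, size_t n)
{
    assert(dst != NULL && src != NULL && dstsize > 0);
    size_t len = n < dstsize - 1 ? n : dstsize - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}
```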
