How should character arrays be used as strings?

From the C Standard (7.1.1 Definitions of terms)

1 A string is a contiguous sequence of characters terminated by and including the first null character. The term multibyte string is sometimes used instead to emphasize special processing given to multibyte characters contained in the string or to avoid confusion with a wide string. A pointer to a string is a pointer to its initial (lowest addressed) character. The length of a string is the number of bytes preceding the null character and the value of a string is the sequence of the values of the contained characters, in order.

In this declaration

char str [5] = "hello";

the string literal "hello" has the internal representation like

{ 'h', 'e', 'l', 'l', 'o', '\0' }

so it has 6 characters including the terminating zero. Its elements are used to initialize the character array str which reserve space only for 5 characters.

The C Standard (opposite to the C++ Standard) allows such an initialization of a character array when the terminating zero of a string literal is not used as an initializer.

However as a result the character array str does not contain a string.

If you want that the array would contain a string you could write

char str [6] = "hello";

or just

char str [] = "hello";

In the last case the size of the character array is determined from the number of initializers of the string literal that is equal to 6.


A C string is a character array that ends with a null terminator.

All characters have a symbol table value. The null terminator is the symbol value 0 (zero). It is used to mark the end of a string. This is necessary since the size of the string isn't stored anywhere.

Therefore, every time you allocate room for a string, you must include sufficient space for the null terminator character. Your example does not do this, it only allocates room for the 5 characters of "hello". Correct code should be:

char str[6] = "hello";

Or equivalently, you can write self-documenting code for 5 characters plus 1 null terminator:

char str[5+1] = "hello";

But you can also use this and let the compiler do the counting and pick the size:

char str[] = "hello"; // Will allocate 6 bytes automatically

When allocating memory for a string dynamically in run-time, you also need to allocate room for the null terminator:

char input[n] = ... ;
...
char* str = malloc(strlen(input) + 1);

If you don't append a null terminator at the end of a string, then library functions expecting a string won't work properly and you will get "undefined behavior" bugs such as garbage output or program crashes.

The most common way to write a null terminator character in C is by using a so-called "octal escape sequence", looking like this: '\0'. This is 100% equivalent to writing 0, but the \ serves as self-documenting code to state that the zero is explicitly meant to be a null terminator. Code such as if(str[i] == '\0') will check if the specific character is the null terminator.

Please note that the term null terminator has nothing to do with null pointers or the NULL macro! This can be confusing - very similar names but very different meanings. This is why the null terminator is sometimes referred to as NUL with one L, not to be confused with NULL or null pointers. See answers to this SO question for further details.

The "hello" in your code is called a string literal. This is to be regarded as a read-only string. The "" syntax means that the compiler will append a null terminator in the end of the string literal automatically. So if you print out sizeof("hello") you will get 6, not 5, because you get the size of the array including a null terminator.


It compiles cleanly with gcc

Indeed, not even a warning. This is because of a subtle detail/flaw in the C language that allows character arrays to be initialized with a string literal that contains exactly as many characters as there is room in the array and then silently discard the null terminator (C17 6.7.9/15). The language is purposely behaving like this for historical reasons, see Inconsistent gcc diagnostic for string initialization for details. Also note that C++ is different here and does not allow this trick/flaw to be used.


Can all strings be considered an array of characters (Yes), can all character arrays be considered strings (No).

Why Not? and Why does it matter?

In addition to the other answers explaining that the length of a string is not stored anywhere as part of the string and the references to the standard where a string is defined, the flip-side is "How do the C library functions handle strings?"

While a character array can hold the same characters, it is simply an array of characters unless the last character is followed by the nul-terminating character. That nul-terminating character is what allows the array of characters to be considered (handled as) a string.

All functions in C that expect a string as an argument expect the sequence of characters to be nul-terminated. Why?

It has to do with the way all string functions work. Since the length isn't included as part of an array, string-functions, scan forward in the array until the nul-character (e.g. '\0' -- equivalent to decimal 0) is found. See ASCII Table and Description. Regardless whether you are using strcpy, strchr, strcspn, etc.. All string functions rely on the nul-terminating character being present to define where the end of that string is.

A comparison of two similar functions from string.h will emphasize the importance of the nul-terminating character. Take for example:

    char *strcpy(char *dest, const char *src);

The strcpy function simply copies bytes from src to dest until the nul-terminating character is found telling strcpy where to stop copying characters. Now take the similar function memcpy:

    void *memcpy(void *dest, const void *src, size_t n);

The function performs a similar operation, but does not consider or require the src parameter to be a string. Since memcpy cannot simply scan forward in src copying bytes to dest until a nul-terminating character is reached, it requires an explicit number of bytes to copy as a third parameter. This third parameter provides memcpy with the same size information strcpy is able to derive simply by scanning forward until a nul-terminating character is found.

(which also emphasizes what goes wrong in strcpy (or any function expecting a string) if you fail to provide the function with a nul-terminated string -- it has no idea where to stop and will happily race off across the rest of your memory segment invoking Undefined Behavior until a nul-character just happens to be found somewhere in memory -- or a Segmentation Fault occurs)

That is why functions expecting a nul-terminated string must be passed a nul-terminated string and why it matters.