What's the point in adding a new line to the end of a file?

It's not about adding an extra newline at the end of a file; it's about not removing the newline that should be there.

A text file, under Unix, consists of a series of lines, each of which ends with a newline character (\n). A file that is not empty and does not end with a newline is therefore not a text file.
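
To check whether a given file meets that definition, something like the following should work. It is only a sketch: ends_with_newline is a made-up helper name, not a standard utility, and it relies on command substitution stripping trailing newlines.

```shell
# Hypothetical helper: succeeds if the file is empty or its last byte is \n.
ends_with_newline() {
    # Command substitution strips trailing newlines, so the result is
    # empty exactly when the last byte is a newline (or the file is empty).
    [ -z "$(tail -c 1 -- "$1")" ]
}

tmpdir=$(mktemp -d)
printf 'one\ntwo\n' > "$tmpdir/complete"     # every line is terminated
printf 'one\ntwo'   > "$tmpdir/incomplete"   # last line is unterminated

for f in "$tmpdir/complete" "$tmpdir/incomplete"; do
    if ends_with_newline "$f"; then
        echo "${f##*/}: ends with a newline"
    else
        echo "${f##*/}: missing final newline"
    fi
done

rm -rf "$tmpdir"
```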

Utilities that are supposed to operate on text files may not cope well with files that don't end with a newline; historical Unix utilities might ignore the text after the last newline, for example. GNU utilities have a policy of behaving decently with non-text files, and so do most other modern utilities, but you may still encounter odd behavior with files that are missing a final newline¹.
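
Two everyday cases of such odd behavior, as a small sketch (the file contents are made up for the demonstration): wc -l counts newline characters rather than lines, and a plain shell read loop never runs its body for an unterminated last line.

```shell
# Two lines of text, but the second one has no terminating newline.
data=$(mktemp)
printf 'first\nsecond' > "$data"

# wc -l counts newline characters, so it reports 1 rather than 2.
newline_count=$(wc -l < "$data")
echo "wc -l sees: $newline_count"

# read returns non-zero when it reaches end-of-file before a newline,
# so the loop body is skipped for the unterminated line "second".
seen=""
while IFS= read -r line; do
    seen="$seen[$line]"
done < "$data"
echo "read loop saw: $seen"

rm -f "$data"
```

The usual workaround for the loop is `while IFS= read -r line || [ -n "$line" ]`, but it is simpler to keep the final newline in the file.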

With GNU diff, if one of the files being compared ends with a newline but the other does not, diff is careful to note that fact. Since diff is line-oriented, it can't indicate this by storing a newline for one file but not for the other: the newlines are necessary to indicate where each line in the diff output starts and ends. So diff uses the special text "\ No newline at end of file" to differentiate a file that didn't end in a newline from a file that did.
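
You can see the marker for yourself with two throwaway files (the names here are arbitrary). LC_ALL=C is set only so that the message is not translated by the locale.

```shell
workdir=$(mktemp -d)
printf 'alpha\nbeta\n' > "$workdir/with_nl"
printf 'alpha\nbeta'   > "$workdir/without_nl"

# diff exits with status 1 when the files differ; "|| true" keeps that
# from aborting a script that runs under "set -e".
diff_output=$(LC_ALL=C diff -u "$workdir/with_nl" "$workdir/without_nl" || true)
printf '%s\n' "$diff_output"

rm -rf "$workdir"
```

The hunk shows "-beta" and "+beta" as if the line had changed, and the "\ No newline at end of file" line directly after "+beta" is what tells you that the only change is the missing terminator.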

By the way, in a C context, a source file similarly consists of a series of lines. More precisely, a translation unit is viewed in an implementation-defined way as a series of lines, each of which must end with a newline character (n1256 §5.1.1.2). On Unix systems, the mapping is straightforward. On DOS and Windows, each CR LF sequence (\r\n) is mapped to a newline (\n; this is what always happens when reading a file opened as text on these OSes). There are a few OSes out there which don't have a newline character, but instead have fixed- or variable-sized records; on these systems, the mapping from files to C source introduces a \n at the end of each record. While this isn't directly relevant to Unix, it does mean that if you copy a C source file that's missing its final newline to a system with record-based text files, then copy it back, you'll end up with either the incomplete last line truncated in the initial conversion, or an extra newline tacked onto it during the reverse conversion.

¹ Example: the output of GNU sort always ends with a newline. So if the file foo is missing its final newline, you'll find that sort foo | wc -c reports one more character than cat foo | wc -c.
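
A quick way to reproduce the footnote, assuming a sort that terminates its last output line as GNU sort does (the file contents are arbitrary):

```shell
foo=$(mktemp)
printf 'pear\napple' > "$foo"      # 10 bytes, no final newline

raw_bytes=$(cat "$foo" | wc -c)
sorted_bytes=$(sort "$foo" | wc -c)

echo "cat:  $raw_bytes"      # 10
echo "sort: $sorted_bytes"   # 11: sort added the missing newline
rm -f "$foo"
```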


Not necessarily the reason, but a practical consequence of files not ending with a new line:

Consider what would happen if you wanted to process several files using cat. For instance, if you wanted to find the word foo at the start of a line across 3 files:

cat file1 file2 file3 | grep -e '^foo'

If the first line in file3 starts with foo, but file2 does not have a final \n after its last line, this occurrence would not be found by grep, because the last line in file2 and the first line in file3 would be seen by grep as a single line.
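
Here is a small reproduction of that scenario. The file contents are invented for the demonstration; only the missing newline in file2 matters.

```shell
dir=$(mktemp -d)
printf 'bar\n'      > "$dir/file1"
printf 'baz'        > "$dir/file2"   # no final newline
printf 'foo here\n' > "$dir/file3"

# file2's last line and file3's first line are glued into "bazfoo here",
# so the anchored pattern no longer matches. grep exits with status 1
# when nothing matches, hence the "|| true".
joined_matches=$(cat "$dir/file1" "$dir/file2" "$dir/file3" | grep -c -e '^foo' || true)

# Handing the files to grep directly keeps the line boundaries intact.
separate_matches=$(grep -e '^foo' "$dir/file1" "$dir/file2" "$dir/file3" | wc -l)

echo "through cat: $joined_matches match(es)"
echo "file by file: $separate_matches match(es)"

rm -rf "$dir"
```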

So, for consistency and in order to avoid surprises, I try to keep my files always ending with a newline.


There are two aspects:

  1. Some C compilers cannot (or historically could not) parse the last line if it does not end with a newline. The C standard specifies that a non-empty C source file shall end with a newline (C11, 5.1.1.2, 2.) and that a last line without a newline yields undefined behavior (C11, J.2, 2nd item). This is perhaps for historical reasons: some vendor of such a compiler may have been part of the committee when the first standard was written. Hence the warning issued by GCC.

  2. diff programs (such as those used by git diff, GitHub, etc.) show line-by-line differences between files. They usually print a message when only one of the files ends with a newline, because otherwise you would not see this difference. For example, if the only difference between two files is the presence of the final newline character, then without the hint it would look as if both files were the same, even though diff and cmp return a non-success exit code and the checksums of the files (e.g. via md5sum) don't match.
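
The second point can be checked with two files that differ only in the final newline. This is just a sketch: the file names are arbitrary, and cksum stands in for md5sum because it is specified by POSIX.

```shell
d=$(mktemp -d)
printf 'same text\n' > "$d/a"
printf 'same text'   > "$d/b"     # identical except for the final newline

# Capture the exit statuses without tripping "set -e".
cmp_status=0
cmp -s "$d/a" "$d/b" || cmp_status=$?
diff_status=0
diff "$d/a" "$d/b" > /dev/null || diff_status=$?

# Checksums computed from stdin, so the file names don't appear in the output.
sum_a=$(cksum < "$d/a")
sum_b=$(cksum < "$d/b")

echo "cmp exit status:  $cmp_status"    # 1: files differ
echo "diff exit status: $diff_status"   # 1: files differ
[ "$sum_a" = "$sum_b" ] || echo "checksums differ"

rm -rf "$d"
```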

Tags:

Files

Newlines