md5sum prepends '\' to the checksum

This is documented, for Coreutils’ md5sum:

If file contains a backslash or newline, the line is started with a backslash, and each problematic character in the file name is escaped with a backslash, making the output unambiguous even in the presence of arbitrary file names.

(file is the filename, not the file’s contents).

b2sum, sha1sum, and the various SHA-2 tools behave in the same way as md5sum. sum and cksum don’t; sum is only provided for backwards-compatibility (and its ancestors don’t produce quoted output), and cksum is specified by POSIX and doesn’t allow this type of output.

This behaviour was introduced in November 2015 and released in version 8.25 (January 2016), with the following NEWS entry:

md5sum now ensures a single line per file for status on standard output, by using a '\' at the start of the line, and replacing any newlines with '\n'. This also affects sha1sum, sha224sum, sha256sum, sha384sum and sha512sum.

The backslash at the start of the line serves as a flag: escapes in filenames are only processed if the line starts with a backslash. (Unescaping can’t be the default behaviour: it would break sums generated with older versions of Coreutils containing \\ or \n in the stored filenames.)
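As a rough illustration of the flag's role, here is a minimal Python sketch of how a reader of such lines might behave. The function names `unescape_name` and `parse_checksum_line` are hypothetical (they are not part of Coreutils), and the sketch assumes only the two escapes discussed here, `\\` and `\n`, plus the usual "digest, space, mode character, file name" layout.

```python
from typing import Tuple


def unescape_name(escaped: str) -> str:
    """Turn the two-character sequences \\\\ and \\n back into \\ and newline."""
    out = []
    i = 0
    while i < len(escaped):
        ch = escaped[i]
        if ch == "\\" and i + 1 < len(escaped):
            nxt = escaped[i + 1]
            if nxt == "n":
                out.append("\n")
                i += 2
                continue
            if nxt == "\\":
                out.append("\\")
                i += 2
                continue
        # Any other character is copied through unchanged.
        out.append(ch)
        i += 1
    return "".join(out)


def parse_checksum_line(line: str) -> Tuple[str, str]:
    """Split one checksum line into (digest, file name).

    Assumption: the line has no trailing newline and uses the
    'digest, space, mode character, name' layout.
    """
    flagged = line.startswith("\\")
    if flagged:
        line = line[1:]  # drop the leading flag character
    digest, rest = line.split(" ", 1)
    name = rest[1:]  # skip the mode character (' ' or '*')
    # Escapes are only interpreted when the flag is present.
    if flagged:
        name = unescape_name(name)
    return digest, name
```

With this sketch, a line without the leading backslash keeps `foo\nbar` as a literal backslash followed by `n`, while the same line with the flag yields a name containing a real newline.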


Stephen Kitt's answer covers the what and I will try to cover why this change was implemented. First, someone observed that a filename containing newlines [1] could result in ambiguous output. For example, consider this output:

d41d8cd98f00b204e9800998ecf8427e  foo
25af89c92254a806b2e93fffd8ac1814  bar

Does this mean there were two files foo and bar, or only one file whose filename is "foo\n25af89c92254a806b2e93fffd8ac1814  bar"? Granted, this latter possibility is highly unlikely, but it is possible. To resolve the ambiguity the developers chose to escape newlines, writing each one as a backslash followed by n (\n). The two cases then become distinguishable. However, this introduces a further ambiguity:

764efa883dda1e11db47671c4a3bbd9e  foo\nbar

Does this file's name contain a newline, or a backslash followed by an n? To resolve this we need to escape backslashes too, so that the latter case becomes:

764efa883dda1e11db47671c4a3bbd9e  foo\\nbar

Finally, they elected to prefix each output line that contains such escapes with a single \ so that a parser can easily detect whether escaping has been done. Presumably this was done to allow parsers to handle output both from escaping versions of md5sum and from non-escaping (non-GNU) versions. The flag also means that "costly" un-escaping does not need to be done when it is not necessary. You can see an example of this parsing in action in md5sum.c itself (line 382 in the linked version).
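The three steps above (escape newlines, escape backslashes, flag the line) can be sketched in a few lines of Python. This is only an illustration of the scheme as described, not the code from md5sum.c; `escape_name` and `format_checksum_line` are hypothetical names, and the digest is passed in as a ready-made string.

```python
from typing import Tuple


def escape_name(name: str) -> Tuple[str, bool]:
    """Return the escaped name and whether anything had to be escaped."""
    # Backslashes must be doubled first; otherwise the backslash
    # introduced for a newline would itself be doubled.
    escaped = name.replace("\\", "\\\\").replace("\n", "\\n")
    return escaped, escaped != name


def format_checksum_line(digest: str, name: str) -> str:
    """Build one output line, adding the leading flag only when needed."""
    escaped, changed = escape_name(name)
    flag = "\\" if changed else ""
    # Two spaces: the separator plus the text-mode character.
    return f"{flag}{digest}  {escaped}"
```

An ordinary name such as foo produces an unflagged line, so output for ordinary file names is unchanged; only names containing a backslash or a newline get the leading backslash.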


[1] By newline I mean the character \n which is sometimes also specifically referred to as a linefeed or LF; see md5sum.c.