shell: keep trailing newlines ('\n') in command substitution

POSIX shells

The usual trick to get the complete stdout of a command is to do:

output=$(cmd; ret=$?; echo .; exit "$ret")
ret=$?
output=${output%.}

The idea is to add an extra .\n. Command substitution will only strip that \n. And you strip the . with ${output%.}.
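As a quick check of the idiom above, here is a minimal sketch. `cmd` is a hypothetical stand-in function (not a real command) that prints `a` followed by two newlines and exits with status 3; everything else is the code from this answer.

```shell
# cmd is a hypothetical stand-in: prints "a\n\n" and exits with status 3.
cmd() { printf 'a\n\n'; return 3; }

plain=$(printf 'a\n\n')                      # ordinary substitution: both newlines are stripped
output=$(cmd; ret=$?; echo .; exit "$ret")   # the sentinel ".\n" is appended inside the subshell
ret=$?                                       # exit status of cmd, passed through by "exit"
output=${output%.}                           # drop the sentinel, keep the newlines

# Lengths of the two captures, then the exit status.
echo "${#plain} ${#output} $ret"             # prints: 1 3 3
```

The plain substitution keeps only `a`, while the sentinel version keeps all three bytes and the exit status.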

Note that in shells other than zsh, that will still not work if the output has NUL bytes. With yash, that won't work if the output is not text.

Also note that in some locales, it matters which character you insert at the end. . should generally be fine, but some others might not be. For instance x (as used in some other answers) or @ would not work in a locale using the BIG5, GB18030 or BIG5HKSCS charsets. In those charsets, the encoding of a number of characters ends in the same byte as the encoding of x or @ (0x78, 0x40).

For instance, ū in BIG5HKSCS is 0x88 0x78 (and x is 0x78 as in ASCII: all charsets on a system must use the same encoding for all the characters of the portable character set, which includes English letters, @ and .). So if cmd was printf '\x88' and we inserted x after it, ${output%x} would fail to strip that x, as $output would actually contain ū.

Using . instead could in theory lead to the same problem if there were any characters whose encoding ends in the same byte as the encoding of ., but having checked some time ago, I can tell that none of the charsets that may be available for use in a locale on Debian, FreeBSD or Solaris systems have such characters, which is good enough for me (and is why I've settled on ., which is also the symbol that marks the end of a sentence in English, so seems appropriate).

A more correct approach as discussed by @Isaac would be to change the locale to C only for the stripping of the last character (${output%.}) which would make sure only one byte is stripped, but that would complicate the code significantly and potentially introduce compatibility issues of its own.

bash/zsh alternatives

With bash and zsh, assuming the output has no NULs, you can also do:

IFS= read -rd '' output < <(cmd)

To get the exit status of cmd, you can do wait "$!"; ret=$? in bash but not in zsh.

rc/es/akanga

For completeness, note that rc/es/akanga have an operator for that. In them, command substitution, expressed as `cmd (or `{cmd} for more complex commands) returns a list (by splitting on $ifs, space-tab-newline by default). In those shells (as opposed to Bourne-like shells), the stripping of newline is only done as part of that $ifs splitting. So you can either empty $ifs or use the ``(seps){cmd} form where you specify the separators:

ifs = ''; output = `cmd

or:

output = ``()cmd

In any case, the exit status of the command is lost. You'd need to embed it in the output and extract it afterwards which would become ugly.

fish

In fish, command substitution is with (cmd) and doesn't involve a subshell.

set var (cmd)

Creates a $var array with all the lines in the output of cmd if $IFS is non-empty, or with the output of cmd stripped of up to one (as opposed to all in most other shells) newline character if $IFS is empty.

So there's still an issue in that (printf 'a\nb') and (printf 'a\nb\n') expand to the same thing even with an empty $IFS.

To work around that, the best I could come up with was:

function exact_output
  set -l IFS . # non-empty IFS
  set -l ret
  set -l lines (
    cmd
    set ret $status
    echo
  )
  set -g output ''
  set -l line
  test (count $lines) -le 1; or for line in $lines[1..-2]
    set output $output$line\n
  end
  set output $output$lines[-1]
  return $ret
end

An alternative is to do:

read -z output < (begin; cmd; set ret $status; end | psub)

Bourne shell

The Bourne shell did not support the $(...) form nor the ${var%pattern} operator, so it can be quite hard to achieve there. One approach is to use eval and quoting:

eval "
  output='`
    exec 4>&1
    ret=\`
      exec 3>&1 >&4 4>&-
      (cmd 3>&-; echo \"\$?\" >&3; printf \"'\") |
        awk 3>&- -v RS=\\\\' -v ORS= -v b='\\\\\\\\' '
          NR > 1 {print RS b RS RS}; {print}; END {print RS}'
    \`
    echo \";ret=\$ret\"
  `"

Here, we're generating a

output='output of cmd
with the single quotes escaped as '\''
';ret=X

to be passed to eval. As with the POSIX approach, if ' was one of those characters whose encoding can be found at the end of other characters, we'd have a problem (a much worse one, as it would become a command injection vulnerability). Thankfully, like ., it's not one of those, and that quoting technique is generally the one used by anything that quotes shell code. Note that \ does have the issue, so it shouldn't be used for quoting (which also excludes "...", inside which you need backslashes for some characters). Here, we're only using it after a ', which is OK.
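The '\'' quoting technique on its own can be sketched as follows. This is only an illustration, assuming a POSIX shell (it uses $(...), which the Bourne shell lacks) and a hypothetical helper name, shquote, that does not appear in the code above.

```shell
# shquote is a hypothetical helper name used for illustration only.
# It prints its argument wrapped in single quotes, with every embedded '
# rewritten as '\'' (close the quote, escaped quote, reopen the quote).
shquote() {
  printf '%s\n' "$1" | sed "s/'/'\\\\''/g; 1s/^/'/; \$s/\$/'/"
}

s="it's
"                                  # contains a quote and ends in a newline
eval "copy=$(shquote "$s")"        # the closing ' shields the trailing newline from stripping
[ "$copy" = "$s" ] && echo "round trip OK"
```

Because the quoted text ends in ', command substitution only strips the newline that follows the closing quote, so trailing newlines inside the value survive the eval.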

tcsh

See tcsh preserve newlines in command substitution `...`

(That does not take care of the exit status, which you could address by saving it in a temporary file, with echo $status > $tempfile:q after the command.)


For the new question, this script works:

#!/bin/bash

f()           { for i in $(seq "$((RANDOM % 3 ))"); do
                    echo;
                done; return $((RANDOM % 256));
              }

exact_output(){ out=$( $1; ret=$?; echo x; exit "$ret" ); ret=$?  # save the status before it is overwritten
                unset OldLC_ALL ; [ "${LC_ALL+set}" ] && OldLC_ALL=$LC_ALL
                LC_ALL=C ; out=${out%x};
                unset LC_ALL ; [ "${OldLC_ALL+set}" ] && LC_ALL=$OldLC_ALL
                 printf 'Output:%10q\nExit :%2s\n' "${out}" "$ret"
               }

exact_output f
echo Done

On execution:

Output:$'\n\n\n'
Exit :25
Done

The longer description

The usual wisdom for POSIX shells to deal with the removal of \n is:

add an x

s=$(printf "%s" "${1}x"); s=${s%?}

That is required because the trailing newline(s) are removed by command substitution, per the POSIX specification:

removing sequences of one or more <newline> characters at the end of the substitution.


About a trailing x.

It has been said in this question that an x could be confused with the trailing byte of some character in some encoding. But guessing which character is safest in every language and every possible encoding is a difficult proposition, to say the least.

However, that is simply incorrect.

The only rule that we need to follow is to add exactly what we remove.

It should be easy to understand that if we add something to an existing string (or byte sequence) and later we remove exactly the same something, the original string (or byte sequence) must be the same.

Where do we go wrong? When we mix characters and bytes.

If we add a byte, we must remove a byte, if we add a character we must remove the exact same character.

The second option, adding a character (and later removing the exact same character) may become convoluted and complex, and, yes, code pages and encodings may get in the way.

However, the first option is quite possible, and, after explaining it, it will become plain simple.

Let's add a byte, an ASCII byte (<127), and to keep things as simple as possible, let's say an ASCII character in the range a-z. Or, as we should be saying, a byte in the hex range 0x61 - 0x7a. Let's choose any of those, maybe an x (really a byte of value 0x78). We can add such a byte by concatenating an x to a string (let's assume an é):

$ a=é
$ b=${a}x

If we look at the string as a sequence of bytes, we see:

$ printf '%s' "$b" | od -vAn -tx1c
  c3  a9  78
 303 251   x

A byte sequence that ends in an x.

If we remove that x (byte value 0x78), we get:

$ printf '%s' "${b%x}" | od -vAn -tx1c
  c3  a9
 303 251

It works without a problem.

A little more difficult example.

Let's say that the string we are interested in ends in byte 0xc3:

$ a=$'\x61\x20\x74\x65\x73\x74\x20\x73\x74\x72\x69\x6e\x67\x20\xc3'

And let's add a byte of value 0xa9:

$ b=$a$'\xa9'

The string has become this now:

$ echo "$b"
a test string é

Exactly what I wanted: the last two bytes are one character in UTF-8 (so anyone can reproduce these results in their UTF-8 console).

If we remove a character, the original string will be changed. But that is not what we added, we added a byte value, which happens to be written as an x, but a byte anyway.

What we need is to avoid misinterpreting bytes as characters: an action that removes the byte we added, 0xa9. In fact, ash, bash, lksh and mksh all seem to do exactly that:

$ c=$'\xa9'
$ echo ${b%$c} | od -vAn -tx1c
 61  20  74  65  73  74  20  73  74  72  69  6e  67  20  c3  0a
  a       t   e   s   t       s   t   r   i   n   g     303  \n

But not ksh or zsh.

However, that is very easy to solve: let's tell all those shells to do byte removal:

$ LC_ALL=C; echo ${b%$c} | od -vAn -tx1c 

That's it, all shells tested work (except yash). Shown for the last part of the string:

ash             :    s   t   r   i   n   g     303  \n
dash            :    s   t   r   i   n   g     303  \n
zsh/sh          :    s   t   r   i   n   g     303  \n
b203sh          :    s   t   r   i   n   g     303  \n
b204sh          :    s   t   r   i   n   g     303  \n
b205sh          :    s   t   r   i   n   g     303  \n
b30sh           :    s   t   r   i   n   g     303  \n
b32sh           :    s   t   r   i   n   g     303  \n
b41sh           :    s   t   r   i   n   g     303  \n
b42sh           :    s   t   r   i   n   g     303  \n
b43sh           :    s   t   r   i   n   g     303  \n
b44sh           :    s   t   r   i   n   g     303  \n
lksh            :    s   t   r   i   n   g     303  \n
mksh            :    s   t   r   i   n   g     303  \n
ksh93           :    s   t   r   i   n   g     303  \n
attsh           :    s   t   r   i   n   g     303  \n
zsh/ksh         :    s   t   r   i   n   g     303  \n
zsh             :    s   t   r   i   n   g     303  \n

It is just that simple: tell the shell to remove a character in the LC_ALL=C locale, which is exactly one byte for all byte values from 0x00 to 0xff.
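The byte-removal step can be wrapped in a small function. This is a minimal sketch under assumptions: strip_last_byte, stripped, old_lc_all and had_lc_all are hypothetical names chosen here, and the code relies on the shell honouring a change of LC_ALL at run time (which, per the table above, yash did not).

```shell
# strip_last_byte is a hypothetical helper: it removes exactly one trailing
# byte from its argument and leaves the result in $stripped.
strip_last_byte() {
  # Remember whether LC_ALL was set, so it can be restored afterwards.
  if [ "${LC_ALL+set}" ]; then old_lc_all=$LC_ALL; had_lc_all=1; else had_lc_all=0; fi
  LC_ALL=C
  stripped=${1%?}              # in the C locale, ? matches exactly one byte
  if [ "$had_lc_all" -eq 1 ]; then LC_ALL=$old_lc_all; else unset LC_ALL; fi
}

b=$(printf '\303\251x')        # the two UTF-8 bytes of é, followed by x
strip_last_byte "$b"
printf '%s' "$stripped" | od -An -tx1   # hex dump of what is left (c3 a9)
```

The same call on a string ending in é would leave the lone 0xc3 byte behind, which is the point: what is removed is a byte, never a character.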

Solution for comments:

For the example discussed in the comments, one possible solution (which fails in zsh) is:

#!/bin/bash

LC_ALL=zh_HK.big5hkscs

a=$(printf '\210\170');
b=$(printf '\170');

unset OldLC_ALL ; [ "${LC_ALL+set}" ] && OldLC_ALL=$LC_ALL
LC_ALL=C ; a=${a%"$b"};
unset LC_ALL ; [ "${OldLC_ALL+set}" ] && LC_ALL=$OldLC_ALL

printf '%s' "$a" | od -vAn -c

That will remove the problem of encoding.


You can output a character after the normal output and then strip it:

#capture the output of "$@" (arguments run as a command)
#into the exact_output variable
exact_output() 
{
    exact_output=$( "$@" && printf X ) && 
    exact_output=${exact_output%X}
}

This is a POSIX-compliant solution.
