unexpected behaviour in shell command substitution

Your command substitution will generate a string.

In the case of

$(echo '"hello " "hi and bye"')

this string will be "hello " "hi and bye".

The string then undergoes word splitting (and filename globbing, though that doesn't affect this example). Word splitting happens at every character that matches one of the characters in $IFS (by default, space, tab and newline).

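With the default value of IFS, this produces five words:

  1. "hello
  2. "
  3. "hi
  4. and
  5. bye"

The double quotes are ordinary characters here. They came out of the substitution, so they don't group anything.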

These are then given as separate arguments to your script.

In your second command, the command substitution is

$(echo "hello " "hi and bye")

This generates the string hello  hi and bye (echo joins its two arguments with a space), and word splitting then yields the four words hello, hi, and, and bye.

In your last example, you pass the two arguments "hello " and "hi and bye" directly to your script. These don't undergo word splitting because they are quoted.
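The three cases can be reproduced without a separate script. The sketch below assumes a hypothetical helper, count_args, standing in for the script from the question. It only reports how many arguments it received, and it assumes IFS still has its default value.

```shell
# count_args is a hypothetical stand-in for the script:
# it prints the number of arguments it was called with.
count_args() {
    echo "$#"
}

# Unquoted substitution; the inner double quotes are literal characters.
n_single=$(count_args $(echo '"hello " "hi and bye"'))

# Unquoted substitution; the shell removed the quotes before echo ran.
n_double=$(count_args $(echo "hello " "hi and bye"))

# Arguments given directly; quoted strings are not word-split.
n_direct=$(count_args "hello " "hi and bye")

echo "$n_single $n_double $n_direct"
```

This should print 5 4 2, matching the five, four and two arguments described above.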


TL;DR: the output of echo falls victim to word splitting on the $() result, whereas sh test.sh "hello " "hi and bye" is handled by a different rule.

Let's take a peek at what's actually happening when you add set -x for debugging:

> set -x
> ./main.sh $(echo '"hello " "hi and bye"')
++ echo '"hello " "hi and bye"'
+ ./main.sh '"hello' '"' '"hi' and 'bye"'
"hello
"
"hi
and
bye"

echo receives "hello " "hi and bye" as a single argument (yes, including all the spaces and double quotes) because of the single quotes in the command itself. Command substitution then replaces $(...) with that output, "hello " "hi and bye". To quote the Bash manual:

If the substitution appears within double quotes, word splitting and pathname expansion are not performed on the results.

However, in your case the command substitution is unquoted, which allows word splitting to occur at each space (one of the three characters in the default $IFS, the variable that controls word splitting). So if you break the resulting string at each space, what do you get?

  1. "hello, then space, after which we have next token
  2. " - that one character by itself, isolated by spaces from both sides.
  3. "hi, again with space on left and right
  4. and
  5. bye"

All in all, you get 5 tokens, broken at whitespace. It's important to realize that word splitting, in this case, occurs precisely as a result of command substitution. Afterwards, all of these are treated as individual tokens that have already been parsed; the quote characters inside them are not interpreted again. As you can see in the set -x output, the shell script is called with exactly those tokens.
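The manual's point about double quotes can be checked directly. This is a minimal sketch that assumes a hypothetical helper, print_args, which wraps each argument in angle brackets so that argument boundaries are visible. It is not the script from the question.

```shell
# print_args is a hypothetical helper: one line per argument,
# wrapped in <...> so that leading and trailing spaces show up.
print_args() {
    for arg in "$@"; do
        printf '<%s>\n' "$arg"
    done
}

# Unquoted substitution: the result is split at each space.
unquoted=$(print_args $(echo '"hello " "hi and bye"'))

# Quoted substitution: no word splitting, so one argument arrives.
quoted=$(print_args "$(echo '"hello " "hi and bye"')")

printf '%s\n' "$unquoted" "---" "$quoted"
```

The unquoted call yields the five tokens listed above, one per line. The quoted call yields the single line <"hello " "hi and bye">, with the double quotes still present as literal characters.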

As for why sh test.sh "hello " "hi and bye" behaves differently: a different rule applies. Your command line is parsed as you enter it, and each quoted string is treated as a single unit, so the script receives exactly two arguments, "hello " and "hi and bye".


You could change IFS to something other than space (since you want to protect the space in hi and bye), and use that something to separate the two strings:

$ IFS=:
$ sh test.sh $(echo "hello ":"hi and bye")
hello 
hi and bye
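One caveat with the commands above: IFS=: on a line of its own stays in effect for the rest of the shell session. The sketch below confines the change to a subshell. It assumes a hypothetical print_args function in place of test.sh, which is not shown here.

```shell
# print_args is a hypothetical stand-in for test.sh:
# it prints each argument on its own line, wrapped in <...>.
print_args() {
    for arg in "$@"; do
        printf '<%s>\n' "$arg"
    done
}

# $( ... ) runs its commands in a subshell, so the IFS assignment
# below does not leak into the surrounding shell.
result=$(
    IFS=:
    print_args $(echo "hello ":"hi and bye")
)

printf '%s\n' "$result"
```

This should print <hello > followed by <hi and bye>, and IFS in the calling shell keeps whatever value it had before.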

The order of operations (those relevant in this example, anyway) is somewhat like this:

  1. substitutions
  2. word splitting
  3. quote removal

Note that quotes created as a result of substitutions aren't special in word splitting or quote removal, so a quote in the output of (1) passes through (2) and (3) like any other character. So:

  1. In echo "hello " "hi and bye", the shell removes these quotes, so echo gets the two arguments hello (with a trailing space) and hi and bye, and outputs them joined by a space: hello  hi and bye.
  2. In echo '"hello " "hi and bye"', the shell removes the outer single quotes, so echo gets the single argument "hello " "hi and bye" and outputs it unchanged.
  3. In sh test.sh $(echo '"hello " "hi and bye"'), the command substitution is replaced with "hello " "hi and bye", but these quotes are a result of a substitution, and so aren't involved in word splitting or quote removal.

    So the shell splits the result at each space into five words ("hello, then a lone ", then "hi, then and, then bye"), hence the output you get.

  4. In sh test.sh $(echo "hello " "hi and bye"), the command substitution is replaced with hello  hi and bye, which gets split into the four words hello, hi, and, and bye.
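The four cases can be traced step by step. As a sketch, the snippet below captures each echo output in a variable and then word-splits it with set --. Going through a variable is an assumption made for illustration: an unquoted variable expansion is split the same way as an unquoted command substitution. The variable names are hypothetical, and IFS is assumed to have its default value.

```shell
# Cases 1 and 2: what echo outputs in each form.
out_plain=$(echo "hello " "hi and bye")       # quotes removed before echo runs
out_literal=$(echo '"hello " "hi and bye"')   # inner quotes are plain data

# Case 3: split the literal-quote output; the quotes survive as characters.
set -- $out_literal
count_literal=$#
first_literal=$1

# Case 4: split the plain output.
set -- $out_plain
count_plain=$#

printf '%s\n' "$count_literal" "$first_literal" "$count_plain"
```

This should print 5, then "hello (with its leading double quote intact), then 4.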

Tags:

Shell