Bash: Parse multi-line into single-line commands

Just before Shellshock, I answered a question on Stack Overflow about eliminating comments in bash scripts. My answer used the simple trick of creating a function by enclosing the contents of the script file inside tmp_() { ... }, and then using declare -f tmp_ to pretty-print the function. In the pretty-printed output, there are no comments, and lines continued with a backslash-newline have been joined into single lines (except inside backticked command substitutions).

Some other reformatting is also done. For example, compound commands are split into several lines. And some forms of line continuation are not reformatted; for example, a line ending with a pipe symbol is not altered. But it should satisfy the use-case in this question. (See example output below.)
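To see the effect in isolation, here is a minimal sketch that defines the wrapper function directly in the current shell instead of building it from a file. The function body is made up for illustration.

```shell
#!/usr/bin/env bash
# Wrapper function with a made-up body: one comment and one continued line.
tmp_() {
    # this comment is discarded by the parser
    echo one \
         two
}

# Print the function as bash regenerates it from its parsed form.
declare -f tmp_
```

The printed body contains the single line echo one two and no comment, because bash prints the function from what it parsed rather than from the original text.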

Of course, the function definition needs to be evaluated, which means that the script being pretty-printed might include an injection attack. In the code I suggested, the function definition is evaluated by way of the bash feature which allows functions to be exported and shared with a child process. At the time I wrote this little hack, I believed that mechanism to be safer than calling eval, but as it turns out I was wrong.

Since shellshock, there have been a number of improvements to the code bash uses to import function definitions, closing the door on at least some injection attacks, but there is clearly no assurance that the procedure is completely safe.

If you are going to run the script being analyzed, then using this procedure to pretty-print it probably does not increase your vulnerability; an attacker could simply insert the dangerous code directly in the script and there would be no need to jump through hoops to hide the attack in a way which might bypass the safety checks in the function import code.

All the same, you should think carefully about security issues, both with this little program and with whatever plans you might have to execute arbitrary scripts.

Here is the version of the pretty-printer which works with a post-shellshock-patched bash (and will not work with previous bash versions):

env "BASH_FUNC_tmp_%%=() {
$(<script_name)
}" bash -c 'declare -f tmp_' | tail -n+2

Substitute the name of the file containing the script for script_name, in the second line. You might want to adjust the tail command; it removes the wrapper function name, but does not remove the braces which surround the script body.
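As a rough sketch of the adjustment mentioned above, the command can be wrapped in a function that takes the file name as an argument and also drops the brace lines. The name pretty_print is hypothetical, and the sed expression assumes that declare -f prints the function name, the opening brace and the closing brace each on a line of their own, as in the sample output.

```shell
#!/usr/bin/env bash
# pretty_print is a hypothetical helper name, not part of the original answer.
# Usage: pretty_print path/to/script
pretty_print() {
    local script=$1
    # Import the file's contents as the body of tmp_ in a child bash,
    # print it, then drop the "tmp_ ()" line and the two brace lines.
    env "BASH_FUNC_tmp_%%=() {
$(<"$script")
}" bash -c 'declare -f tmp_' | sed '1,2d; $d'
}
```

The body lines keep the indentation added by declare -f. The security caveats discussed above apply unchanged, since the file is still parsed by bash as a function definition.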

The original version, which will work on pre-shellshock versions of bash, can be found in the referenced SO answer.


Sample output

Tested against the input provided by Stéphane Chazelas:

{ 
    echo \\;
    echo a#b;
    echo 'foo\
bar';
    cat  <<EOF
thisis joined
this 'aswell'
$(ls -l)
EOF

    cat  <<'EOF'
this is\
not joined
EOF

    echo "$(ls -l)";
    echo `ls \\
-l`
}

This differs from Stéphane's suggested output:

  • Lines have been indented, and many have been terminated with semicolons. Whitespace has been added and/or deleted in many lines.
  • cat << E\OF has been changed to cat <<'EOF', which is semantically identical.
  • The nested continuation line in the backticked command substitution at the end has not been modified. (The continuation line in the $(...) command substitution is eliminated.)
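The equivalence of the two here-document forms can be checked with a small sketch. Quoting any part of the delimiter, whether with a backslash or with single quotes, turns off expansion and line joining in the body. The function names and body text are made up for illustration.

```shell
#!/usr/bin/env bash
# Two hypothetical functions that differ only in how the delimiter is quoted.
backslash_delim() {
cat << E\OF
$HOME is\
not joined
EOF
}

quoted_delim() {
cat <<'EOF'
$HOME is\
not joined
EOF
}

# Both print the body verbatim: no expansion, no line joining.
backslash_delim
quoted_delim
```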

This now works in more cases; see if it does what you expect:

sed ':loop /^[^#].*[^\\]\\$/N; s/\\\n//; t loop' input

sed prints every line by default. If a line does not begin with a hash mark and ends ($) with a backslash (written \\ because the backslash is special in a regular expression) that is not itself preceded by another backslash, the N command appends the next line to the pattern space. The s command then deletes the backslash and the newline that follows it. If that substitution changed anything, t branches back to the "loop" label and the test is repeated, so a run of continued lines is joined one line at a time.

Input:

# My comment \
ls \
-al

# leading comment
echo some \
long \
text
# trailing comment

ls -al

Output:

# My comment \
ls -al

# leading comment
echo some long text
# trailing comment

ls -al
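The same sed script can also be written with one -e expression per command, which keeps the :loop label separate from the command that follows it. This is only a sketch: the wrapper name join_continuations is hypothetical, and it reads standard input instead of a file named input.

```shell
#!/usr/bin/env bash
# join_continuations is a hypothetical wrapper name; it filters stdin.
join_continuations() {
    # :loop        label to branch back to
    # /.../N       non-comment line ending in a backslash: append next line
    # s/\\\n//     delete the backslash-newline pair
    # t loop       if something was deleted, test the joined line again
    sed -e ':loop' \
        -e '/^[^#].*[^\\]\\$/N' \
        -e 's/\\\n//' \
        -e 't loop'
}

# First three lines of the sample input above.
printf '%s\n' '# My comment \' 'ls \' '-al' | join_continuations
```

Like the one-liner, this looks only at line endings and a leading hash mark, so it knows nothing about quoting or here-documents.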

This is not really an answer, just a note on the things to consider for a solution to work in the general case. Consider this script:

#! /bin/sh
echo \\
echo a#\
b
echo 'foo\
bar'
cat << EOF
this\
is joined
this 'as\
well'
$(ls \
-l)
EOF
cat << E\OF
this is\
not joined
EOF
echo "$(ls \
-l)"
echo `ls \\
-l`

As I understand the intent of the question, that script should be transformed to:

#! /bin/sh
echo \\
echo a#b
echo 'foo\
bar'
cat << EOF
thisis joined
this 'aswell'
$(ls -l)
EOF
cat << E\OF
this is\
not joined
EOF
echo "$(ls -l)"
echo `ls -l`
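The last line is the least obvious case. Inside backticks, \\ is reduced to a single backslash before the inner command is parsed, so the doubled backslash followed by a newline still acts as a line continuation there. A minimal sketch, with a made-up function name and echo standing in for ls:

```shell
#!/usr/bin/env bash
# backtick_demo is a hypothetical name used only for this illustration.
backtick_demo() {
    # The inner command text becomes: echo a\<newline>b
    echo `echo a\\
b`
}

backtick_demo
```

The inner command runs as echo ab, which is why the backticked ls \\ followed by -l is expected to become ls -l.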