Security implications of forgetting to quote a variable in bash/POSIX shells

Preamble

First, I'd say it's not the right way to address the problem. It's a bit like saying "you should not murder people because otherwise you'll go to jail".

Similarly, you don't quote your variable because otherwise you're introducing security vulnerabilities. You quote your variables because it is wrong not to (but if the fear of the jail can help, why not).

A little summary for those who've just jumped on the train.

In most shells, leaving a variable expansion unquoted has a very special meaning. That, like the rest of this answer, also applies to command substitution (`...` or $(...)) and arithmetic expansion ($((...)) or $[...]). The best way to describe it is that it is like invoking some sort of implicit split+glob operator¹.

cmd $var

in another language would be written something like:

cmd(glob(split($var)))

$var is first split into a list of words according to complex rules involving the $IFS special parameter (the split part) and then each word resulting from that splitting is considered as a pattern which is expanded to a list of files that match it (the glob part).

As an example, if $var contains *.txt,/var/*.xml and $IFS contains ,, cmd would be called with a number of arguments, the first one being cmd and the next ones being the txt files in the current directory and the xml files in /var.

If you wanted to call cmd with just the two literal arguments cmd and *.txt,/var/*.xml, you'd write:

cmd "$var"

which would be in your other more familiar language:

cmd($var)
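To see the operator at work, here is a minimal sketch. The count_args helper, the file names and the temporary directory are made up for the illustration, and it assumes mktemp -d is available (it is widespread but not specified by POSIX).

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

demo_dir=$(mktemp -d) || exit
touch "$demo_dir/a.txt" "$demo_dir/b.txt" "$demo_dir/c.xml"

var='*.txt,*.xml'

# Each command substitution runs in a subshell, so neither the cd nor the
# IFS change leaks into the rest of the script.
unquoted_count=$(cd "$demo_dir" && IFS=, && count_args $var)   # split on ",", then glob
quoted_count=$(cd "$demo_dir" && IFS=, && count_args "$var")   # one literal argument

echo "unquoted: $unquoted_count argument(s), quoted: $quoted_count argument(s)"
rm -rf "$demo_dir"
```

In sh or bash the unquoted form should end up passing three arguments (a.txt, b.txt and c.xml) and the quoted form exactly one.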

What do we mean by vulnerability in a shell?

After all, it's been known since the dawn of time that shell scripts should not be used in security-sensitive contexts. Surely, OK, leaving a variable unquoted is a bug but that can't do that much harm, can it?

Well, despite the fact that anybody would tell you that shell scripts should never be used for web CGIs, or that thankfully most systems don't allow setuid/setgid shell scripts nowadays, one thing that shellshock (the remotely exploitable bash bug that made the headlines in September 2014) revealed is that shells are still extensively used where they probably shouldn't be: in CGIs, in DHCP client hook scripts, in sudoers commands, invoked by (if not as) setuid commands...

Sometimes unknowingly. For instance system('cmd $PATH_INFO') in a php/perl/python CGI script does invoke a shell to interpret that command line (not to mention the fact that cmd itself may be a shell script and its author may have never expected it to be called from a CGI).

You've got a vulnerability when there's a path for privilege escalation, that is when someone (let's call him the attacker) is able to do something he is not meant to.

Invariably that means the attacker providing data, that data being processed by a privileged user/process which inadvertently does something it shouldn't be doing, in most cases because of a bug.

Basically, you've got a problem when your buggy code processes data under the control of the attacker.

Now, it's not always obvious where that data may come from, and it's often hard to tell if your code will ever get to process untrusted data.

As far as variables are concerned, in the case of a CGI script it's quite obvious: the data are the CGI GET/POST parameters and things like cookies, path, host... parameters.

For a setuid script (running as one user when invoked by another), it's the arguments or environment variables.

Another very common vector is file names. If you're getting a file list from a directory, it's possible that files have been planted there by the attacker.

In that regard, even at the prompt of an interactive shell, you could be vulnerable (when processing files in /tmp or ~/tmp for instance).

Even a ~/.bashrc can be vulnerable (for instance, bash will interpret it when invoked over ssh to run a ForcedCommand like in git server deployments with some variables under the control of the client).

Now, a script may not be called directly to process untrusted data, but it may be called by another command that does. Or your incorrect code may be copy-pasted into scripts that do (by you 3 years down the line or one of your colleagues). One place where it's particularly critical is in answers in Q&A sites as you'll never know where copies of your code may end up.

Down to business; how bad is it?

Leaving a variable (or command substitution) unquoted is by far the number one source of security vulnerabilities associated with shell code. Partly because those bugs often translate to vulnerabilities but also because it's so common to see unquoted variables.

Actually, when looking for vulnerabilities in shell code, the first thing to do is look for unquoted variables. They're easy to spot, often good candidates, and generally easy to trace back to attacker-controlled data.

There's an infinite number of ways an unquoted variable can turn into a vulnerability. I'll just give a few common trends here.

Information disclosure

Most people will bump into bugs associated with unquoted variables because of the split part (for instance, it's common for files to have spaces in their names nowadays and space is in the default value of IFS). Many people will overlook the glob part. The glob part is at least as dangerous as the split part.

Globbing done upon unsanitised external input means the attacker can make you read the content of any directory.

In:

echo You entered: $unsanitised_external_input

if $unsanitised_external_input contains /*, that means the attacker can see the content of /. No big deal. It becomes more interesting though with /home/* which gives you a list of user names on the machine, /tmp/*, /home/*/.forward for hints at other dangerous practices, /etc/rc*/* for enabled services... No need to name them individually. A value of /* /*/* /*/*/*... will just list the whole file system.
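A small sandboxed sketch of that leak, and of the quoted form that avoids it. The directory and file name are invented for the demo, mktemp -d is assumed to exist, and the "attacker" only gets to list the temporary directory.

```shell
demo_dir=$(mktemp -d) || exit
touch "$demo_dir/secret-report.txt"

# Assumed attacker-controlled value; here it only "attacks" the demo directory.
unsanitised_external_input='*'

# Unquoted: the shell globs the value and echo prints the matching file names.
leaky=$(cd "$demo_dir" && echo You entered: $unsanitised_external_input)

# Quoted, and with printf rather than echo: the value is printed verbatim.
safe=$(cd "$demo_dir" && printf 'You entered: %s\n' "$unsanitised_external_input")

printf '%s\n' "$leaky" "$safe"
rm -rf "$demo_dir"
```

The first line printed should name secret-report.txt, the second should just show the literal *.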

Denial of service vulnerabilities

Take the previous case a bit too far and we've got a DoS.

Actually, any unquoted variable in list context with unsanitized input is at least a DoS vulnerability.

Even expert shell scripters commonly forget to quote things like:

#! /bin/sh -
: ${QUERYSTRING=$1}

: is the no-op command. What could possibly go wrong?

That's meant to assign $1 to $QUERYSTRING if $QUERYSTRING was unset. That's a quick way to make a CGI script callable from the command line as well.

That $QUERYSTRING is still expanded though and because it's not quoted, the split+glob operator is invoked.

Now, there are some globs that are particularly expensive to expand. The /*/*/*/* one is bad enough as it means listing directories up to 4 levels down. In addition to the disk and CPU activity, that means storing tens of thousands of file paths (40k here on a minimal server VM, 10k of which are directories).

Now /*/*/*/*/../../../../*/*/*/* means 40k x 10k and /*/*/*/*/../../../../*/*/*/*/../../../../*/*/*/* is enough to bring even the mightiest machine to its knees.

Try it for yourself (though be prepared for your machine to crash or hang):

a='/*/*/*/*/../../../../*/*/*/*/../../../../*/*/*/*' sh -c ': ${a=foo}'

Of course, if the code is:

echo $QUERYSTRING > /some/file

Then you can fill up the disk.
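For comparison, a minimal sketch of the quoted form of that assignment. The init_query wrapper is a made-up name used only to give the snippet a $1; the expensive glob from above is passed in as the value, and nothing gets expanded.

```shell
# Hypothetical wrapper around the assignment discussed above.
init_query() {
  # $1: fallback value, used only when QUERYSTRING is unset.
  # Inside double quotes the expansion yields exactly one word: no split, no glob.
  : "${QUERYSTRING=$1}"
}

unset QUERYSTRING
init_query '/*/*/*/*/../../../../*/*/*/*'
printf '%s\n' "$QUERYSTRING"   # printed verbatim; the file system is never walked
```

The same goes for the redirection case: printf '%s\n' "$QUERYSTRING" > /some/file writes the value itself, however many files the pattern would have matched.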

Just do a google search on shell cgi or bash cgi or ksh cgi, and you'll find a few pages that show you how to write CGIs in shells. Notice how half of those that process parameters are vulnerable.

Even David Korn's own one is vulnerable (look at the cookie handling).

Up to arbitrary code execution vulnerabilities

Arbitrary code execution is the worst type of vulnerability, since if the attacker can run any command, there's no limit on what he may do.

It's generally the split part that leads to those. Splitting results in several arguments being passed to commands when only one is expected. While the first of those will be used in the expected context, the others will be in a different context so potentially interpreted differently. Better with an example:

awk -v foo=$external_input '$2 == foo'

Here, the intention was to assign the content of the $external_input shell variable to the foo awk variable.

Now:

$ external_input='x BEGIN{system("uname")}'
$ awk -v foo=$external_input '$2 == foo'
Linux

The second word resulting from the splitting of $external_input is not assigned to foo but considered as awk code (here that executes an arbitrary command: uname).
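Quoting the expansion (awk -v foo="$external_input" ...) already keeps the value in one argument. The sketch below goes one step further and passes it through the environment, since awk -v also interprets backslash escape sequences in the value. match_second_field and needle are names invented for this example.

```shell
# Hypothetical helper: prints the input lines whose second field equals $1.
match_second_field() {
  # The value travels through the environment and is read back with ENVIRON,
  # so awk never parses it as code or as a var=value operand.
  needle=$1 awk '$2 == ENVIRON["needle"]'
}

external_input='x BEGIN{system("uname")}'
hostile_out=$(printf 'a x\nb y\n' | match_second_field "$external_input")
benign_out=$(printf 'a x\nb y\n' | match_second_field x)

printf 'hostile input matched: <%s>\nbenign input matched: <%s>\n' \
  "$hostile_out" "$benign_out"
```

With the hostile value nothing matches and uname is never run; with the plain x the line "a x" is selected.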

That's especially a problem for commands that can execute other commands (awk, env, sed (GNU one), perl, find...), and even more so with the GNU variants (which accept options after arguments). Sometimes you wouldn't suspect a command of being able to execute others, as with the [ or printf of ksh, bash or zsh...

for file in *; do
  [ -f $file ] || continue
  something-that-would-be-dangerous-if-$file-were-a-directory
done

If we create a directory called x -o yes, then the test becomes positive, because it's a completely different conditional expression we're evaluating.

Worse, if we create a file called x -a a[0$(uname>&2)] -gt 1, with all ksh implementations at least (which includes the sh of most commercial Unices and some BSDs), that executes uname because those shells perform arithmetic evaluation on the numerical comparison operators of the [ command.

$ touch x 'x -a a[0$(uname>&2)] -gt 1'
$ ksh -c 'for f in *; do [ -f $f ]; done'
Linux

Same with bash for a filename like x -a -v a[0$(uname>&2)].
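The first of those tricks is easy to reproduce safely. This sketch assumes mktemp -d is available and uses a throwaway directory; it runs the loop once unquoted and once quoted (with ./* so that no name can start with -).

```shell
demo_dir=$(mktemp -d) || exit
mkdir "$demo_dir/x -o yes"          # a directory with a hostile name

unquoted_hits=$(
  cd "$demo_dir" || exit
  for file in *; do
    # Expands to: [ -f x -o yes ]
    if [ -f $file ]; then echo "treated as a regular file: $file"; fi
  done
)
quoted_hits=$(
  cd "$demo_dir" || exit
  for file in ./*; do
    if [ -f "$file" ]; then echo "treated as a regular file: $file"; fi
  done
)

printf 'unquoted: <%s>\nquoted: <%s>\n' "$unquoted_hits" "$quoted_hits"
rm -rf "$demo_dir"
```

Only the unquoted loop should report the directory as a regular file.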

Of course, if they can't get arbitrary execution, the attacker may settle for lesser damage (which may help to get arbitrary execution). Any command that can write files or change permissions, ownership or have any main or side effect could be exploited.

All sorts of things can be done with file names.

$ touch -- '-R ..'
$ for file in *; do [ -f "$file" ] && chmod +w $file; done

And you end up making .. writable (recursively with GNU chmod).
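A more defensive version of that loop might look like the sketch below. It is only one way to do it: it combines three precautions (a ./ prefix, quotes, and the -- end-of-options marker) and, again, assumes mktemp -d for the sandbox.

```shell
demo_dir=$(mktemp -d) || exit
touch -- "$demo_dir/-R .."          # the hostile file name from above
chmod a-w -- "$demo_dir/-R .."

(
  cd "$demo_dir" || exit
  # ./* keeps every name from starting with "-", the quotes keep it in one
  # piece, and -- marks the end of options for chmod.
  for file in ./*; do
    if [ -f "$file" ]; then chmod u+w -- "$file"; fi
  done
)

if [ -w "$demo_dir/-R .." ]; then file_state=writable; else file_state=read-only; fi
echo "the planted file is now $file_state"
rm -rf "$demo_dir"
```

Here chmod only ever sees ./-R .. as a single file operand, so the file itself becomes writable and the parent directory is not touched.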

Scripts doing automatic processing of files in publicly writable areas like /tmp are to be written very carefully.

What about [ $# -gt 1 ]

That's something I find exasperating. Some people go to all the trouble of wondering whether a particular expansion may be problematic to decide if they can omit the quotes.

It's like saying: "Hey, it looks like $# cannot be subject to the split+glob operator, let's ask the shell to split+glob it." Or: "Hey, let's write incorrect code just because the bug is unlikely to be hit."

Now how unlikely is it? OK, $# (or $!, $? or any arithmetic substitution) may only contain digits (or - for some²) so the glob part is out. For the split part to do something though, all we need is for $IFS to contain digits (or -).

With some shells, $IFS may be inherited from the environment, but if the environment is not safe, it's game over anyway.

Now if you write a function like:

my_function() {
  [ $# -eq 2 ] || return
  ...
}

What that means is that the behaviour of your function depends on the context in which it is called. Or in other words, $IFS becomes one of the inputs to it. Strictly speaking, when you write the API documentation for your function, it should be something like:

# my_function
#   inputs:
#     $1: source directory
#     $2: destination directory
#   $IFS: used to split $#, expected not to contain digits...

And code calling your function needs to make sure $IFS doesn't contain digits. All that because you didn't feel like typing those 2 double-quote characters.

Now, for that [ $# -eq 2 ] bug to become a vulnerability, you'd need the value of $IFS to somehow come under the control of the attacker. Conceivably, that would not normally happen unless the attacker managed to exploit another bug.

That's not unheard of though. A common case is when people forget to sanitize data before using it in an arithmetic expression. We've already seen above that it can allow arbitrary code execution in some shells, but in all of them, it allows the attacker to give any variable an integer value.

For instance:

n=$(($1 + 1))
if [ $# -gt 2 ]; then
  echo >&2 "Too many arguments"
  exit 1
fi

With a $1 whose value is (IFS=-1234567890), that arithmetic evaluation has the side effect of setting IFS, and the next [ command fails, which means the check for too many args is bypassed.
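One way to close that hole is to check that the value is made of digits only before it gets anywhere near $((...)). A sketch, with is_uint and increment as made-up names; note that the shell's arithmetic may still read a value with a leading 0 as octal, which this check does not address.

```shell
# Hypothetical validator: succeeds only for a non-empty string of decimal digits.
is_uint() {
  case "$1" in
    '' | *[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Hypothetical wrapper around the arithmetic from the snippet above.
increment() {
  is_uint "$1" || { echo >&2 "not a number: $1"; return 1; }
  echo "$(($1 + 1))"
}

if result=$(increment '(IFS=-1234567890)' 2>/dev/null); then
  verdict=accepted
else
  verdict=rejected
fi
echo "hostile value $verdict; increment 41 gives $(increment 41)"
```

The hostile value is rejected before any arithmetic evaluation takes place, so IFS is left alone.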

What about when the split+glob operator is not invoked?

There's another case where quotes are needed around variables and other expansions: when they're used as patterns.

[[ $a = $b ]]   # a `ksh` construct also supported by `bash`
case $a in ($b) ...; esac

do not test whether $a and $b are the same (except with zsh) but if $a matches the pattern in $b. And you need to quote $b if you want to compare as strings (same thing in "${a#$b}" or "${a%$b}" or "${a##*$b*}" where $b should be quoted if it's not to be taken as a pattern).

What that means is that [[ $a = $b ]] may return true in cases where $a is different from $b (for instance when $a is anything and $b is *) or may return false when they are identical (for instance when both $a and $b are [a]).

Can that make for a security vulnerability? Yes, like any bug. Here, the attacker can alter your script's logical code flow and/or break the assumptions that your script is making. For instance, with code like:

if [[ $1 = $2 ]]; then
   echo >&2 '$1 and $2 cannot be the same or damage will incur'
   exit 1
fi

The attacker can bypass the check by passing '[a]' '[a]'.
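The same difference can be shown with case, which behaves the same way and is available in every POSIX shell. matches_pattern and same_string are names made up for this sketch.

```shell
# Unquoted $2, as in [[ $1 = $2 ]]: $2 is used as a pattern.
matches_pattern() {
  case "$1" in
    ($2) return 0 ;;
    (*) return 1 ;;
  esac
}

# Quoted "$2": $2 is compared as a literal string.
same_string() {
  case "$1" in
    ("$2") return 0 ;;
    (*) return 1 ;;
  esac
}

if matches_pattern '[a]' '[a]'; then by_pattern=same; else by_pattern=different; fi
if same_string '[a]' '[a]'; then by_string=same; else by_string=different; fi
echo "[a] vs [a]: unquoted says $by_pattern, quoted says $by_string"
```

The unquoted comparison should call the two identical strings different (the pattern [a] only matches a), which is exactly how the check above gets bypassed.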

Now, if neither that pattern matching nor the split+glob operator apply, what's the danger of leaving a variable unquoted?

I have to admit that I do write:

a=$b
case $a in...

There, quoting does no harm but is not strictly necessary.

However, one side effect of omitting quotes in those cases (for instance in Q&A answers) is that it can send a wrong message to beginners: that it may be all right not to quote variables.

For instance, they may start thinking that if a=$b is OK, then export a=$b would be as well (it is not in many shells, where a=$b is an ordinary argument to the export command and so is in list context), and likewise env a=$b.

What about zsh?

zsh did fix most of those design awkwardnesses. In zsh (at least when not in sh/ksh emulation mode), if you want splitting, or globbing, or pattern matching, you have to request it explicitly: $=var to split, and $~var to glob or for the content of the variable to be treated as a pattern.

However, splitting (but not globbing) is still done implicitly upon unquoted command substitution (as in echo $(cmd)).

Also, a sometimes unwanted side effect of not quoting a variable is empties removal. The zsh behaviour is similar to what you can achieve in other shells by disabling globbing altogether (with set -f) and splitting (with IFS=''). Still, in:

cmd $var

There will be no split+glob, but if $var is empty, instead of receiving one empty argument, cmd will receive no argument at all.

That can cause bugs (like the obvious [ -n $var ]). That can possibly break a script's expectations and assumptions and cause vulnerabilities.
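The [ -n $var ] case is short enough to try directly; this sketch only uses an empty variable and two tests.

```shell
var=''

# Unquoted: the empty expansion is removed, leaving [ -n ], which tests
# whether the literal string "-n" is non-empty.
if [ -n $var ]; then unquoted_verdict='non-empty'; else unquoted_verdict='empty'; fi

# Quoted: [ receives -n and an empty string.
if [ -n "$var" ]; then quoted_verdict='non-empty'; else quoted_verdict='empty'; fi

echo "unquoted says $unquoted_verdict, quoted says $quoted_verdict"
```

The unquoted test should claim that the empty variable is non-empty.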

As the empty variable can cause an argument to be just removed, that means the next argument could be interpreted in the wrong context.

As an example,

printf '[%d] <%s>\n' 1 $attacker_supplied1 2 $attacker_supplied2

If $attacker_supplied1 is empty, then $attacker_supplied2 will be interpreted as an arithmetic expression (for %d) instead of a string (for %s) and any unsanitized data used in an arithmetic expression is a command injection vulnerability in Korn-like shells such as zsh.

$ attacker_supplied1='x y' attacker_supplied2='*'
$ printf '[%d] <%s>\n' 1 $attacker_supplied1 2 $attacker_supplied2
[1] <x y>
[2] <*>

Fine, but:

$ attacker_supplied1='' attacker_supplied2='psvar[$(uname>&2)0]'
$ printf '[%d] <%s>\n' 1 $attacker_supplied1 2 $attacker_supplied2
Linux
[1] <2>
[0] <>

The uname arbitrary command was run.

What about when you do need the split+glob operator?

Yes, that's typically when you do want to leave your variable unquoted. But then you need to make sure you tune your split and glob operators correctly before using it. If you only want the split part and not the glob part (which is the case most of the time), then you do need to disable globbing (set -o noglob/set -f) and fix $IFS. Otherwise you'll cause vulnerabilities as well (like David Korn's CGI example mentioned above).
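As a sketch of what "tuning the operator" can look like: the hypothetical print_fields helper below switches globbing off and sets IFS before leaving its argument unquoted, and does so in a subshell so that the caller's settings are untouched.

```shell
# Hypothetical helper: prints each field of $1, split on the characters in $2.
# The ( ... ) body runs in a subshell, so the noglob and IFS settings do not
# leak into the caller.
print_fields() (
  set -o noglob      # glob part off
  IFS=$2             # split part tuned
  for field in $1; do
    printf '<%s>\n' "$field"
  done
)

fields=$(print_fields '/bin:*:/opt/my tools' :)
printf '%s\n' "$fields"
```

That should print </bin>, <*> and </opt/my tools> on three lines: the * is left alone and the space does not split anything, because only : is in IFS.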

Conclusion

In short, leaving a variable (or command substitution or arithmetic expansion) unquoted in shells can be very dangerous indeed, especially when done in the wrong contexts, and it's very hard to know which contexts are the wrong ones.

That's one of the reasons why it is considered bad practice.

Thanks for reading so far. If it goes over your head, don't worry. One can't expect everyone to understand all the implications of writing their code the way they write it. That's why we have good practice recommendations, so they can be followed without necessarily understanding why.

(And in case that's not obvious yet, please avoid writing security-sensitive code in shells.)

And please quote your variables in your answers on this site!


¹In ksh93 and pdksh and derivatives, brace expansion is also performed unless globbing is disabled (in the case of ksh93 versions up to ksh93u+, even when the braceexpand option is disabled).

² In ksh93 and yash, arithmetic expansions can also include things like 1,2, 1e+66, inf, nan. There are even more in zsh, including # which is a glob operator with extendedglob, but zsh never does split+glob upon arithmetic expansion, even in sh emulation.


[Inspired by this answer by cas.]

But what if …?

But what if my script sets a variable to a known value before using it?  In particular, what if it sets a variable to one of two or more possible values (but it always sets it to something known), and none of the values contain space or glob characters?  Isn’t it safe to use it without quotes in that case?

And what if one of the possible values is the empty string, and I’m depending on “empties removal”?  I.e., if the variable contains the empty string, I don’t want to get the empty string in my command; I want to get nothing.  For example,

if some_condition
then
    ignorecase="-i"
else
    ignorecase=""
fi
# Note that the quotes in the above commands are not strictly needed.
grep  $ignorecase  other_grep_args

I can’t say grep "$ignorecase" other_grep_args; that will fail if $ignorecase is the empty string.

Response:

As discussed in the other answer, this will still fail if IFS contains a - or an i.  If you have ensured that IFS doesn’t contain any character in your variable (and you are sure that your variable doesn’t contain any glob characters), then this is probably safe.

But there is a way that is safer (although it’s somewhat ugly and quite unintuitive): use ${ignorecase:+"$ignorecase"}.  From the POSIX Shell Command Language specification, under 2.6.2 Parameter Expansion,

${parameter:+[word]}

    Use Alternative Value.  If parameter is unset or null, null shall be substituted; otherwise, the expansion of word (or an empty string if word is omitted) shall be substituted.

The trick here, such as it is, is that we are using ignorecase as the parameter and "$ignorecase" as the word.  So ${ignorecase:+"$ignorecase"} means

If $ignorecase is unset or null (i.e., empty), null (i.e., unquoted nothing) shall be substituted; otherwise, the expansion of "$ignorecase" shall be substituted.

This gets us where we want to go: if the variable is set to the empty string, it will be “removed” (this entire, convoluted expression will evaluate to nothing — not even an empty string), and if the variable has a non-empty value, we get that value, quoted.
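A quick way to check that claim is to count the arguments a command actually receives. count_args is a throwaway helper invented for this sketch.

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

ignorecase=''
with_plain_quotes=$(count_args "$ignorecase" pattern)                  # "" and pattern
with_alternative=$(count_args ${ignorecase:+"$ignorecase"} pattern)    # pattern only

ignorecase='-i'
when_set=$(count_args ${ignorecase:+"$ignorecase"} pattern)            # -i and pattern

echo "$with_plain_quotes $with_alternative $when_set"
```

The three counts should come out as 2 1 2: plain quotes keep the unwanted empty argument, while the :+ form drops it when the variable is empty and passes it when it is not.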


But what if …?

But what if I have a variable that I want/need to be split into words?  (This is otherwise like the first case; my script has set the variable, and I’m sure it doesn’t contain any glob characters.  But it might contain space(s), and I want it split into separate arguments at the space boundaries.
P.S. I still want empties removal.)

For example,

if some_condition
then
    criteria="-type f"
else
    criteria=""
fi
if some_other_condition
then
    criteria="$criteria -mtime +42"
fi
find "$start_directory"  $criteria  other_find_args

Response:

You might think that this is a case for using eval.  No!  Resist the temptation to even think about using eval here.

Again, if you have ensured that IFS doesn’t contain any character in your variable (except for the spaces, which you want to be honored), and you are sure that your variable doesn’t contain any glob characters, then the above is probably safe.

But, if you’re using bash (or ksh, zsh or yash), there is a way that is safer: use an array:

if some_condition
then
    criteria=(-type f)  # You could say `criteria=("-type" "f")`, but it’s really unnecessary.
                        # But do not say `criteria=("-type f")`.
else
    criteria=()         # Do not use any quotes on this command!
fi
if some_other_condition
then
    criteria+=(-mtime +42)      # Note: not `=`, but `+=`, to add (append) to an array.
fi
find "$start_directory"  "${criteria[@]}"  other_find_args

From bash(1),

Any element of an array may be referenced using ${name[subscript]}.  …  If subscript is @ or *, the word expands to all members of name.  These subscripts differ only when the word appears within double quotes.  If the word is double-quoted, … ${name[@]} expands each element of name to a separate word.

So "${criteria[@]}" expands to (in the above example) the zero, two, or four elements of the criteria array, each quoted.  In particular, if neither of the conditions is true, the criteria array has no contents (as set by the criteria=() statement), and "${criteria[@]}" evaluates to nothing (not even an inconvenient empty string).


This gets especially interesting and complicated when you are dealing with multiple words, some of which are dynamic (user) input, which you don’t know in advance, and may contain space(s) or other special character(s).  Consider:

printf "Enter file name to look for: "
read fname
if [ "$fname" != "" ]
then
    criteria+=(-name "$fname")
fi

Note that $fname is quoted each time it is used.  This works even if the user enters something like foo bar or foo*; "${criteria[@]}" evaluates to -name "foo bar" or -name "foo*".  (Remember that each element of the array is quoted.)

Arrays don’t work in all POSIX shells; arrays are a ksh / bash / zsh / yash-ism.  Except … there’s one array that all shells support: the argument list, a.k.a. "$@".  If you are done with the argument list that you were invoked with (e.g., you’ve copied all the “positional parameters”  (arguments) into variables, or otherwise processed them), you can use the arg list as an array:

if some_condition
then
    set -- -type f      # You could say `set -- "-type" "f"`, but it’s really unnecessary.
else
    set --
fi
if some_other_condition
then
    set -- "$@" -mtime +42
fi
# Similarly:    set -- "$@" -name "$fname"
find "$start_directory"  "$@"  other_find_args

The "$@" construct (which, historically, came first) has the same semantics as "${name[@]}" — it expands each argument (i.e., each element of the argument list) to a separate word, as if you had typed "$1" "$2" "$3" ….
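Here is a small sketch of the argument list being used that way. count_args and build_criteria are made-up names, and the work is done inside a function so that the script's own arguments are not disturbed.

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

# Hypothetical demo function; a function has its own positional parameters,
# so the caller's argument list is left alone.
build_criteria() {
  set --                                # start with an empty list
  empty_count=$(count_args "$@")
  fname='foo bar*'                      # stands in for user input
  set -- "$@" -name "$fname"            # append two elements
  filled_count=$(count_args "$@")
  last_element=$2
}

build_criteria
echo "$empty_count element(s) at first, then $filled_count; last one is <$last_element>"
```

The empty list expands to zero arguments, and the value with a space and a * in it arrives as a single, unexpanded element.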

Excerpting from the POSIX Shell Command Language specification, under 2.5.2 Special Parameters,

@

    Expands to the positional parameters, starting from one, initially producing one field for each positional parameter that is set.  …, the initial fields shall be retained as separate fields, ….  If there are no positional parameters, the expansion of @ shall generate zero fields, even when @ is within double-quotes; …

The full text is somewhat cryptic; the key point is that it specifies that "$@" shall generate zero fields when there are no positional parameters.  Historical note: when "$@" was first introduced in the Bourne shell (predecessor to bash) in 1979, it had a bug whereby "$@" was replaced by a single empty string when there were no positional parameters; see: What does ${1+"$@"} mean in a shell script, and how does it differ from "$@"?; The Traditional Bourne Shell Family; What does ${1+"$@"} mean ...and where is it necessary?; and "$@" versus ${1+"$@"}.


Arrays help with the first situation, too:

if some_condition
then
    ignorecase=(-i)     # You could say `ignorecase=("-i")`, but it’s really unnecessary.
else
    ignorecase=()       # Do not use any quotes on this command!
fi
grep  "${ignorecase[@]}"  other_grep_args

____________________

P.S. (csh)

This should go without saying, but, for the benefit of folks who’re new here: csh, tcsh, etc., are not Bourne/POSIX shells.  They’re a whole different family.  A horse of a different color.  A whole other ball game.  A different breed of cat.  Birds of another feather.  And, most particularly, a different can of worms.

Some of what’s been said on this page applies to csh; such as: it’s a good idea to quote all your variables unless you have a good reason not to, and you’re sure you know what you’re doing.  But, in csh, every variable is an array — it just so happens that almost every variable is an array of only one element, and acts pretty similar to an ordinary shell variable in Bourne/POSIX shells.  And the syntax is awfully different (and I do mean awfully).  So we won't say anything more about csh-family shells here.


I was skeptical of Stéphane’s answer; however, it is possible to abuse $#:

$ set `seq 101`

$ IFS=0

$ echo $#
1 1

or $?:

$ IFS=0

$ awk 'BEGIN {exit 101}'

$ echo $?
1 1

These are contrived examples, but the potential does exist.
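To tie this back to an actual check, here is a sketch of a "too many arguments" test evaluated under IFS=0, once unquoted and once quoted. The two function names are invented, and ten arguments are used so that $# contains a 0.

```shell
# Both checks run in a subshell (the ( ... ) function body) so that the
# hostile IFS does not outlive the call.
too_many_unquoted() ( IFS=0; [ $# -gt 2 ] )
too_many_quoted()   ( IFS=0; [ "$#" -gt 2 ] )

if too_many_unquoted 1 2 3 4 5 6 7 8 9 10; then unquoted_verdict=caught; else unquoted_verdict=missed; fi
if too_many_quoted   1 2 3 4 5 6 7 8 9 10; then quoted_verdict=caught; else quoted_verdict=missed; fi
echo "unquoted check: $unquoted_verdict, quoted check: $quoted_verdict"
```

Unquoted, the 10 is split on 0 and the test becomes [ 1 -gt 2 ], so the ten arguments go unnoticed; quoted, the check still works.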