What's the purpose of adding a prefix on both sides of a shell variable comparison to a string literal?

The important thing to understand here is that in most shells¹, [ is just an ordinary command parsed by the shell like any other ordinary command.

Then the shell invokes that [ (aka test) command with a list of arguments, and then it's up to [ to interpret them as a conditional expression.

At that point, those are just a list of strings and the information about which ones resulted from some form of expansion is lost, even in those shells where [ is built-in (all Bourne-like ones these days).

The [ utility used to have a hard time telling which ones of its arguments were operators and which ones were operands (the thing operators work on). It didn't help that the syntax was intrinsically ambiguous. For instance:

[ -t ] used to test (and still does in some shells/[s) whether stdout is a terminal.
[ x ] is short for [ -n x ]: test whether x is a non-empty string (so you can see there's a conflict with the above).
In some shells/[s, -a and -o can be both unary operators ([ -a file ] for "is the file accessible?", now replaced by [ -e file ], and [ -o option ] for "is the option enabled?") and binary operators (and and or). Again, ! -a x can be either and(nonempty("!"), nonempty("x")) or not(isaccessible("x")).
(, ) and ! add more problems.

In normal programming languages like C or perl, in:

if ($a eq $b) {...}

There's no way the content of $a or $b will be taken as operators because the conditional expression is parsed before those $a and $b are expanded. But in shells, in:

[ "$a" = "$b" ]

The shell expands the variables first². For instance, if $a contains ( and $b contains ), all the [ command sees is the [, (, =, ) and ] arguments. So does that mean "(" = ")" (are ( and ) lexically equal?) or ( -n = ) (is = a non-empty string?)?
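A minimal sketch of what [ ends up receiving, using a stand-in function that prints each of its arguments on its own line (show_args is a made-up name for illustration, not a standard utility):

```shell
# show_args is a hypothetical helper: it prints every argument it receives
# on its own line, wrapped in <>, i.e. the list a command like [ gets to see.
show_args() {
  for arg in "$@"; do
    printf '<%s>\n' "$arg"
  done
}

a='(' b=')'
# By the time show_args runs, the expansions have already happened.
seen=$(show_args "$a" = "$b")
printf '%s\n' "$seen"
```

Nothing in that list records that ( and ) came from variables rather than being typed literally.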

Historical implementations (test appeared in Unix V7 in the late 70s) used to fail even in cases where it was not ambiguous just because of the order in which they were processing their arguments.

Here with version 7 Unix in a PDP11 emulator:

$ ls -l /bin/[
-rwxr-xr-x 2 bin      2876 Jun  8  1979 /bin/[
$ [ ! = x ]
test: argument expected
$ [ "(" = x ]
test: argument expected

Most shell and [ implementations have or have had problems with those or variants thereof. With bash 4.4 today:

bash-4.4$ a='(' b=-o c=x
bash-4.4$ [ "$a" = "$b" -o "$a" = "$c" ]
bash: [: `)' expected, found =

POSIX.2 (published in the early 90s) devised an algorithm that would make ['s behaviour unambiguous and deterministic when passed at most 4 arguments (besides [ and ]) in the most common usage patterns ([ -f "$a" -o "$b" ] is still unspecified, for instance). It deprecated (, ), -a and -o, and dropped -t without operand. bash implemented that algorithm (or at least tried to) in bash 2.0.

So, in POSIX compliant [ implementations, [ "$a" = "$b" ] is guaranteed to compare the content of $a and $b for equality, whatever they are. Without -o, we would write:

[ "$a" = "$b" ] || [ "$a" = "$c" ]

That is, call [ twice, each time with fewer than 5 arguments.
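The same idea extends to any number of candidates. As a sketch only (matches_any is a hypothetical helper name, and its needle and candidate variables are global since POSIX sh has no local), a loop can call [ once per candidate with three arguments each time:

```shell
# matches_any is a hypothetical helper, not a standard utility.
# Usage: matches_any STRING CANDIDATE...
# Succeeds if STRING is equal to at least one CANDIDATE.
matches_any() {
  needle=$1
  shift
  for candidate in "$@"; do
    # Only three arguments reach [, whatever the values are.
    [ "$needle" = "$candidate" ] && return 0
  done
  return 1
}

a='(' b=-o c=x
if matches_any "$a" "$b" "$c"; then first=match; else first='no match'; fi
if matches_any "$a" "$b" "$a"; then second=match; else second='no match'; fi
printf '%s\n' "$first" "$second"
```

With the values from the failing bash example, the first call reports no match without any parsing error, and the second one matches because $a is among the candidates.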

But it took quite a while for all [ implementations to become compliant. bash's was not compliant until 4.4 (though the last problem was with [ '(' ! "$var" ')' ], which nobody would use in real life).

The /bin/sh of Solaris 10 and older, which is not a POSIX shell but a Bourne shell, still has problems with [ "$a" = "$b" ]:

$ a='!' b='!'
$ [ "$a" = "$b" ]
test: argument expected

Using [ "x$a" = "x$b" ] works around the problem as there is no [ operator that starts with x. Another option is to use case instead:

case "$a" in
  "$b") echo same;;
     *) echo different;;
esac

(quoting is necessary around $b, as it would otherwise be taken as a pattern; it is not needed around $a).
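Wrapped in a function, that could look like the sketch below (str_eq is a hypothetical name). It never calls [ at all, so no value can be mistaken for an operator:

```shell
# str_eq is a hypothetical helper name.
# Usage: str_eq STRING1 STRING2
# Succeeds if both arguments are the same string.
str_eq() {
  case $1 in
    "$2") return 0 ;;  # quoted, so $2 is compared literally, not as a pattern
    *)    return 1 ;;
  esac
}

if str_eq '!' '!'; then r1=same; else r1=different; fi
# Had "$2" been left unquoted, * would match anything.
if str_eq 'abc' '*'; then r2=same; else r2=different; fi
printf '%s\n' "$r1" "$r2"
```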

In any case, it is not and never has been about empty values. People have problems with empty values in [ when they forget to quote their variables, but that's not a problem with [ then.

$ a= b='-o x'
[ $a = $b ]

with the default value of $IFS becomes:

[ = -o x ]

Which is a test of whether = or x is a non-empty string. No amount of prefixing will help³, as [ x$a = x$b ] will still be [ x = x-o x ], which causes an error. It could get a lot worse with other values, including DoS and arbitrary command injection, like in bash:

bash-4.4$ a= b='x -o -v a[`uname>&2`]'
bash-4.4$ [ x$a = x$b ]
Linux

The correct solution is to always quote:

[ "$a" = "$b" ]   # OK in POSIX compliant [ / shells
[ "x$a" = "x$b" ] # OK in all Bourne-like shells
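To see what quoting changes, the argument lists can be compared directly. This is a sketch assuming the default value of $IFS and a shell that does split+glob on unquoted expansions (so not zsh); show_joined is a made-up helper that prints each argument wrapped in <>:

```shell
# show_joined is a hypothetical helper: prints every argument wrapped in <>
# on a single line, so argument boundaries are visible.
show_joined() {
  printf '<%s>' "$@"
  echo
}

a= b='-o x'
# Unquoted: the empty $a disappears and $b is split in two.
unquoted=$(show_joined $a = $b)
# Unquoted with a prefix: the prefix does not stop the splitting of $b.
prefixed=$(show_joined x$a = x$b)
# Quoted: always exactly three arguments.
quoted=$(show_joined "$a" = "$b")
printf '%s\n' "$unquoted" "$prefixed" "$quoted"
```

Only the quoted form hands over the two values intact, as one argument each.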

Note that expr has similar (and even worse) problems.

expr also has a = operator, though it's for testing whether the two operands are equal integers when they look like decimal integer numbers, or sort the same when not.

In many implementations, expr + = +, expr '(' = ')' or expr index = index don't do an equality comparison. expr "x$a" = "x$b" would work around it for string comparison, but prefixing with an x could affect the sorting (in locales that have collating elements starting with x, for instance) and obviously can't be used for number comparison. expr "0$a" = "0$b" doesn't work for comparing negative integers. expr " $a" = " $b" works for integer comparison in some implementations, but not others (for a=01 b=1, some would return true, some false).

¹ ksh93 is an exception. In ksh93, [ can be seen as a reserved word in that [ -t ] is actually different from var=-t; [ "$var" ], or from ""[ -t ] or cmd='['; "$cmd" -t ]. That's to preserve backward compatibility and still be POSIX compliant in cases where it matters. The -t is only taken as an operator here if it's literal, and ksh93 detects that you're calling the [ command.

² ksh added a [[...]] conditional expression operator with its own syntax parsing rules (and some problems of its own) to address that (also found in some other shells, with some differences).

³ except in zsh where split+glob is not invoked upon parameter expansion, but empty removal still is, or in other shells when disabling split+glob globally with set -o noglob; IFS=

People often ascribe the prefix to problems with empty strings, but that is not the reason for it. The problem is a very simple one: the expansion of the variable could be one of test's operators, suddenly turning a binary equality test into a different expression.

Recent implementations of the command on most platforms avoid the pitfall with look-ahead in the expression parser. As long as there are enough tokens for a binary operator to be possible, the parser will not recognize the first operand of that binary operator as anything other than an operand:

% a=-n
% /bin/test "$a" = -n ; echo $?
0
% /bin/test "$a" = ; echo $?
0
% /bin/test x"$a" = ; echo $?
test: =: argument expected
2
% a='('
% /bin/test "$a" = "(" ; echo $?
0
% /bin/test "$a" = ; echo $?
test: closing paren expected
2
%
