How can I safely ensure a variable contains only a valid filename?

This answer assumes that $1 is allowed to include subdirectories. If you are interested in the simpler case where $1 should be a simple directory name, then see one of the other answers.


Wildcards are not expanded when in double-quotes. Since $1 is in double-quotes, wildcards are not a problem.
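As a quick illustration of the quoting point, here is a minimal sketch. The count_args helper is hypothetical and only prints how many arguments it received.

```shell
# count_args is a hypothetical helper: it prints the number of arguments it got.
count_args() { echo "$#"; }

pattern='*'
count_args "$pattern"   # quoted: prints 1, the literal "*" is passed through
count_args $pattern     # unquoted: the shell expands "*" against the current directory
```

The second call prints however many names the wildcard matched, which is why the variable should always stay inside double quotes.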

Both ../ and symlinks can obscure the real location of a file. Shown below are tests to determine if the file is really, not just seemingly, under the path we want.

Newer systems: using realpath

To find out whether the file is really under /home/charlesingalls/, you can use realpath:

realpath --relative-base=/home/charlesingalls/ "/home/charlesingalls/$1" | grep -q '^/' && exit 1

The above runs exit 1 if the file specified by $1 is anywhere other than under the directory /home/charlesingalls/. realpath canonicalizes the whole path, eliminating both symlinks and ../.

realpath is part of GNU coreutils, version 8.15 (Jan 2012) or later, so it should be available on any reasonably current Linux system that uses GNU coreutils.
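The same check can be wrapped in a function so that it is easier to reuse. This is only a sketch: the name is_under_base is made up, it uses case instead of grep to look at the first character, and, unlike the one-liner, it also rejects the path when realpath itself fails.

```shell
# Hypothetical helper: succeeds only if "$base/$name" resolves to a
# location under $base. Requires GNU realpath (coreutils 8.15 or later).
is_under_base() {
    base=$1
    name=$2
    # With --relative-base, paths under $base are printed as relative
    # paths; anything else is printed as an absolute path.
    resolved=$(realpath --relative-base="$base" -- "$base/$name") || return 1
    case $resolved in
        /*) return 1 ;;   # resolved outside the base directory
        *)  return 0 ;;
    esac
}
```

It could be used as: is_under_base /home/charlesingalls "$1" || exit 1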

Examples

To demonstrate how realpath follows ../ to determine the real location of a file (in these examples, the -q option to grep is omitted so that grep's output is visible):

$ touch /tmp/test
$ realpath --relative-base=$HOME "$HOME/../../tmp/test" | grep '^/' && echo FAIL
/tmp/test
FAIL

To demonstrate how it follows symlinks:

$ ln -s /tmp/test ~/test
$ realpath --relative-base=$HOME "$HOME/test" | grep '^/' && echo FAIL
/tmp/test
FAIL

Older systems: using readlink -e

readlink is also capable of canonicalizing a path, following both symlinks and ../:

readlink -e "$HOME/test" | grep -q "^$HOME" || exit 1

Using the same example files:

$ readlink -e "$HOME/../../tmp/test" | grep "^$HOME" || echo FAIL
FAIL
$ readlink -e "$HOME/test" | grep "^$HOME" || echo FAIL
FAIL

In addition to being available on older GNU systems, versions of readlink are available on BSD.
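One caveat with the grep test is that $HOME is treated as a regular expression and as a bare prefix, so a sibling directory such as /home/charlesingalls2 would also match. A sketch of a stricter variant follows. The function name is_inside_readlink is made up, and it assumes a readlink that supports -e.

```shell
# Hypothetical helper: succeeds only if "$base/$name" exists and its
# canonical path lies strictly inside the canonical $base.
is_inside_readlink() {
    base=$(readlink -e -- "$1") || return 1
    target=$(readlink -e -- "$1/$2") || return 1
    case $target in
        "$base"/*) return 0 ;;   # quoted, so $base is matched literally
        *)         return 1 ;;
    esac
}
```

It could be used as: is_inside_readlink "$HOME" "$1" || exit 1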


If you only want to delete a file in /home/charlesingalls (and not a file in a subdirectory) then it's easy: just check that the argument doesn't contain a /.

case "$1" in
  */*) echo 1>&2 "Refusing to remove a file in another directory"; exit 2;;
  *) rm -f /home/charlesingalls/"$1";;
esac

This runs rm even if the argument is . or .. or empty, but in that case rm will harmlessly fail to delete a directory.

Wildcards are not relevant here since no wildcard expansion is performed.

This is safe even in the presence of symbolic links: if the file is a symbolic link, the symlink (which is in /home/charlesingalls) gets removed, and the target of that link is not affected.

Note that this assumes that /home/charlesingalls cannot be moved or changed. That should be ok if the directory is hard-coded in the script, but if it's determined from variables then the determination might no longer be valid by the time the rm command runs.
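If you would rather not rely on rm failing for the empty, . and .. cases, the test can be made explicit. A minimal sketch, with is_plain_name as a made-up name:

```shell
# Hypothetical helper: succeeds only for a single, non-empty path
# component that is neither "." nor "..".
is_plain_name() {
    case $1 in
        ''|.|..|*/*) return 1 ;;
        *)           return 0 ;;
    esac
}
```

It could be used as: is_plain_name "$1" && rm -f /home/charlesingalls/"$1"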

Based on the additional information that the argument is a virtual host name, you should do whitelisting rather than blacklisting: check that the name is a sensible virtual host name, rather than just banning slashes. I check that the name starts with a lowercase letter or digit and that it does not contain characters other than lowercase letters, digits, dots and dashes.

LC_CTYPE=C LC_COLLATE=C
case "$1" in
  *[!-.0-9a-z]*|[!0-9a-z]*) echo >&2 "Invalid host name"; exit 2;;
  *) rm -f /home/charlesingalls/"$1";;
esac
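To try the pattern against a few names without deleting anything, the check can be separated from the rm call. The sketch below uses a made-up function name, is_valid_vhost, sets LC_ALL=C rather than the two separate variables, and additionally rejects the empty string.

```shell
# Hypothetical helper: succeeds only if the name starts with a lowercase
# letter or digit and contains only lowercase letters, digits, dots and
# dashes. The body is a subshell, so LC_ALL does not leak to the caller.
is_valid_vhost() (
    LC_ALL=C
    case $1 in
        '')             exit 1 ;;   # empty name
        *[!-.0-9a-z]*)  exit 1 ;;   # contains a forbidden character
        [!0-9a-z]*)     exit 1 ;;   # starts with a dot or a dash
    esac
    exit 0
)
```

With this, example.com is accepted, while ../etc, .hidden, -rf and Example.com are all rejected.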

If you want to forbid paths completely, the simplest way is to test if the variable contains a slash (/). In bash:

if [[ "$1" = */* ]] ; then...

This will block all paths, though, including foo/bar. You could test for .. instead, but that would leave the possibility of symlinks pointing to directories outside the target path.

If you only want to allow deleting a single file, I don't think you should be using rm -r.


Also, depending on what you are doing, you could use the system's file permissions to only allow deleting files the user could delete themselves. Something like this:

su charlesingalls -c "rm /home/charlesingalls/'$1'"

Though, as @Gilles commented, this has a quoting issue: the command breaks if $1 contains a single quote, and such a value can be crafted to run other commands. So either test the variable for that first (e.g. if [[ "$1" = *\'* ]] ; then fail..., or better, whitelist a sensible set of characters), or pass the file name through an environment variable, e.g.

file="$1" su charlesingalls -c 'rm "/home/charlesingalls/$file"'
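The difference between the two forms can be tried out safely with sh -c standing in for su ... -c and printf standing in for rm. This is only an illustration; the function names and the dir/ prefix are made up.

```shell
# Splices the value into the command text, like the first su example.
splice_demo() {
    sh -c "printf '%s\n' 'dir/$1'" 2>/dev/null
}

# Passes the value through the environment, like the second su example.
env_demo() {
    file=$1 sh -c 'printf "%s\n" "dir/$file"'
}

splice_demo "it's" || echo "splice failed"   # the stray quote is a syntax error
env_demo "it's"                              # prints dir/it's
```

In the second form the inner shell only ever sees the value as data, so quotes in the file name cannot change the command.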