What's a safe and portable way to split a string in shell programming?

The obvious solution would be to use the shell's word splitting, but beware of a few gotchas:

IFS=:
set -o noglob
for dir in $PATH''; do
    dir=${dir:-.}
    [ -x "${dir%/}/$1" ] && printf "%s\n" "$dir"
done

You need set -o noglob because when a variable is left unquoted, the shell performs both word splitting and filename generation (globbing) on it, and here you only want word splitting. For instance, in the unlikely event that $PATH contains /usr/local/*bin*, you want it to look in the /usr/local/*bin* folder, not in /usr/local/bin and /usr/local/sbin...; and if $PATH contains /*/*/*/../../../*/*/*/*/../../../*/*/*/*, you don't want the glob expansion to bring your machine down.
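
A quick way to see the difference, as a sketch that assumes a mktemp supporting -d (common, though not POSIX) and uses a throwaway directory holding two made-up files, bin and sbin:

```shell
# Throwaway directory with two files matching the pattern below.
tmp=$(mktemp -d) || exit 1
cd "$tmp" || exit 1
touch bin sbin

pattern='*bin*'

# Globbing on (the default): the unquoted expansion is also
# matched against file names, giving two words: bin and sbin.
set +o noglob
set -- $pattern
globbed=$#

# Globbing off: only word splitting happens, so the pattern
# survives as the single literal word '*bin*'.
set -o noglob
set -- $pattern
literal=$1

# Clean up and go back to the default.
set +o noglob
cd / && rm -rf "$tmp"
```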

An empty $PATH component means the current directory (.), not /, so $dir/$1 would be wrong in that case (it would give /$1). The workaround is either to write $dir${dir:+/}$1, or to change $dir to . in that case as done above (which also gives more useful output when displayed with printf '%s\n' "$dir").
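
To make the difference concrete, a small sketch with a hypothetical empty component and ls as the command name:

```shell
cmd=ls
dir=                         # an empty $PATH component

naive=$dir/$cmd              # "/ls": wrongly looks in the root directory
opt1=$dir${dir:+/}$cmd       # "ls": relative, so resolved from the current directory
opt2=${dir:-.}/$cmd          # "./ls": same file, clearer when printed

# A non-empty component is unaffected by either workaround.
dir=/usr/bin
opt1_full=$dir${dir:+/}$cmd  # "/usr/bin/ls"
opt2_full=${dir:-.}/$cmd     # "/usr/bin/ls"
```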

//foo is not necessarily the same as /foo, so if / is in $PATH, you don't want $dir/$1, which would be //$1. Hence the ${dir%/} to strip a trailing slash.
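
A sketch of what ${dir%/} does for a few hypothetical values of $dir (it removes at most one trailing slash, and leaves $dir alone when there is none):

```shell
cmd=ls

dir=/
naive=$dir/$cmd                # "//ls": a leading // may have implementation-defined meaning
fixed=${dir%/}/$cmd            # "/ls"

dir=/usr/bin/
fixed_trailing=${dir%/}/$cmd   # "/usr/bin/ls"

dir=/usr/bin
fixed_plain=${dir%/}/$cmd      # "/usr/bin/ls": nothing to strip
```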

Then, there are a few other problems:

For $PATH, ":" is a field separator, while for $IFS it is a field terminator (yes, I know, the S is for Separator; blame ksh and POSIX for standardizing the ksh behaviour).

So if $PATH is /usr/bin:/bin: (which is bad practice but still commonly found), it means "/usr/bin", "/bin" and "" (that is, the current directory), while shell word splitting (in all POSIX shells except zsh) splits it into /usr/bin and /bin only.

If $PATH is set but empty, it means "look in the current directory only", whereas shells (including those that treat $IFS as a separator) expand it to an empty list.

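Appending '' to $PATH as above ($PATH'') works around both issues: the quoted empty string is joined to whatever follows the last :, so a trailing : still produces a final empty field, and an empty $PATH produces one empty field rather than none.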
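
A sketch that counts the fields produced with and without the trailing '', using made-up values of a variable p rather than the real $PATH. The counts in the comments are those described above for POSIX shells other than zsh (bash, dash, ksh...):

```shell
IFS=:
set -o noglob

p=/usr/bin:/bin:                 # trailing : means a final empty component
set -- $p;   trailing_plain=$#   # 2: the empty field is dropped
set -- $p''; trailing_fixed=$#   # 3: /usr/bin, /bin and ""

p=                               # set but empty: current directory only
set -- $p;   empty_plain=$#      # 0: expands to nothing at all
set -- $p''; empty_fixed=$#      # 1: a single "" field

# Back to default splitting and globbing.
unset IFS
set +o noglob
```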

Last but not least, if $PATH is unset, that has a special meaning: look in the system's default search path, which unfortunately means something different depending on whom (which command) you ask.

$ env -u PATH bash -c 'type usbipd'
usbipd is /usr/local/sbin/usbipd
$ env -u PATH ksh -c 'type usbipd'
ksh: whence: usbipd: not found

And basically, in your script, you'd have to guess what that default search path is in the context that matters to you.
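
If you do have to guess, one possible starting point is getconf PATH, which POSIX specifies as printing a value of PATH under which all the standard utilities can be found. Treat this as an assumption: it is not necessarily the list that bash or ksh fall back to. A sketch that applies it only when $PATH is actually unset, using command -p so that getconf itself is looked up in a default path:

```shell
# Assumption: getconf PATH is a reasonable stand-in for "the system
# default search path"; individual shells may use a different list.
# command -p searches a default path, so this works even with PATH unset.
default_path=$(command -p getconf PATH) || default_path=

# Only fill in PATH when it is unset; a set-but-empty PATH usually has
# its own meaning (current directory only) and is left alone.
if [ -z "${PATH+set}" ] && [ -n "$default_path" ]; then
  PATH=$default_path
fi

printf 'getconf PATH: %s\n' "$default_path"
```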

Note that POSIX leaves the behaviour unspecified when $PATH is unset or empty, so it won't help you there. That also means that what I said above may not apply to some past, current or future POSIX/Unix systems.

In short, parsing $PATH to try and find out where a command would be run from is a tricky business.

There is a standard command for that, namely command -v:

ls_path=$(command -v ls)
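
A sketch of using it in a script: command -v returns a non-zero exit status when nothing is found, and for builtins and functions it prints just the name rather than a path, so the result is not always a file:

```shell
# command -v exits non-zero when the command cannot be found.
if ls_path=$(command -v ls); then
  printf 'ls would be run as: %s\n' "$ls_path"
else
  printf 'ls: not found\n' >&2
fi

# Builtins such as cd are not looked up in $PATH, so only the name is printed.
cd_path=$(command -v cd)
```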

But one may ask: why do you want to know?

Now, on to restoring IFS to its original value:

oldIFS=$IFS
IFS=:
...
IFS=$oldIFS

will work in most cases in practice, but POSIX does not guarantee that it works.

The reason is that if $IFS was previously unset, which means default splitting behaviour (in POSIX shells, splitting on space, tab or newline), then after those commands it ends up set but empty, which means no splitting at all.
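
A sketch of a save/restore that also remembers whether IFS was set at all, using the ${IFS+set} expansion (empty when IFS is unset). The variable names ifs_was_set and saved_IFS are just illustrative:

```shell
# Start from the tricky case: IFS unset (default splitting).
unset IFS

# Save: record whether IFS was set, and its value if so.
if [ -n "${IFS+set}" ]; then
  ifs_was_set=yes saved_IFS=$IFS
else
  ifs_was_set=no
fi

IFS=:
# ... code that splits on ':' ...

# Restore: put the old value back, or unset IFS again.
if [ "$ifs_was_set" = yes ]; then
  IFS=$saved_IFS
else
  unset IFS
fi
```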

Another potential problem arises if you generalise that approach and use it in many different functions: if, in the ... part above, you call a function that does the same thing (copies $IFS into $oldIFS), that function overwrites your $oldIFS, and you end up restoring the wrong $IFS.
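
A minimal sketch reproducing that problem with two hypothetical functions, outer and inner, that both use the same global oldIFS:

```shell
inner() {
  oldIFS=$IFS     # overwrites the caller's saved copy
  IFS=,
  # ... splitting on ',' ...
  IFS=$oldIFS
}

outer() {
  oldIFS=$IFS
  IFS=:
  # ... splitting on ':' ...
  inner
  IFS=$oldIFS     # restores ':' (inner's copy), not the original value
}

IFS=' '
outer
# IFS is now ':' instead of ' '
```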

Instead, you can use subshells where possible:

(
  IFS=:
  ...
)
# only the subshell's IFS was affected, the parent still has its own IFS
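
The same idea can be packaged in a function whose body is itself a subshell (name() ( ... ) rather than name() { ... }), so that changing IFS and noglob inside it never leaks out to the caller. A sketch with a hypothetical split_on_colon helper that prints one field per line, keeping empty fields thanks to the trailing '' trick described earlier:

```shell
# Hypothetical helper: prints each ':'-separated field of $1 on its own line.
# The ( ... ) body runs in a subshell, so IFS and noglob are only changed there.
split_on_colon() (
  IFS=:
  set -o noglob
  # The trailing '' keeps a final empty field (and gives one field for "").
  for field in $1''; do
    printf '%s\n' "$field"
  done
)

IFS=' '
split_on_colon '/usr/bin:/bin:'
# The caller's IFS is still ' ' afterwards.
```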

My approach is to set $IFS (and turn noglob on or off) every time I need word splitting (which is rare) and not bother restoring the previous value. Of course, that doesn't work if your script calls someone else's code that doesn't follow that practice and assumes the default word-splitting behaviour.


Just set IFS according to your needs and let the shell perform word splitting:

IFS=':'
set -o noglob   # keep globbing away from the unquoted $PATH
for dir in $PATH; do
    [ -x "$dir"/"$1" ] && printf '%s\n' "$dir"
done

This works in bash, dash and ksh, though I have only tested it with their latest versions.