Understanding shell builtin commands

Commands are often built into the shell for performance: running a built-in avoids starting a new process. Calling the external printf, for example, is slower than using the built-in printf.
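
To get a rough feel for the difference, here is a minimal timing sketch for bash. It assumes an external printf is installed somewhere in $PATH; the variable name ext_printf and the loop count are only illustrative, and the timings will vary from system to system.

```bash
#!/usr/bin/env bash
# Path of the external printf; type -P searches $PATH and ignores the builtin.
ext_printf=$(type -P printf)

# 500 calls of the builtin: no new process is started.
time for i in {1..500}; do printf '%s' "$i" >/dev/null; done

# 500 calls of the external utility: each call starts a new process.
time for i in {1..500}; do "$ext_printf" '%s' "$i" >/dev/null; done
```

The second loop should report a noticeably larger elapsed time, since every iteration has to create a process and execute the utility.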

Most utilities do not strictly need to be built in (cd is one of the exceptions), so they are also provided as external utilities. That way, scripts won't break if they are interpreted by a shell that does not provide a built-in equivalent.

Some shells' built-ins also provide extensions over the equivalent external command. Bash's printf, for example, can print to a variable:

$ printf -v message 'Hello %s' "world"
$ echo "$message"
Hello world

The external /usr/bin/printf simply can't do this: it runs as a separate process, so it has no access to the shell variables in the current shell session (and can't change them).

Built-in utilities are also not subject to the restriction that their expanded command line has to be shorter than a certain length. Doing

printf '%s\n' *

is therefore safe if printf is a shell built-in command. The restriction on the length of the command line comes from the execve() system call used to execute an external command. If the command line and the current environment together are larger than ARG_MAX bytes (see getconf ARG_MAX in the shell), the call to execve() fails. If the utility is built into the shell, execve() does not have to be called.
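
The following bash sketch tries to make that limit visible. It assumes that 500000 short arguments are enough to exceed the limit on your system (that depends on the platform and on resource limits) and that an external printf exists in $PATH; the variable names are only illustrative.

```bash
#!/usr/bin/env bash
# The limit that applies when executing external commands on this system.
getconf ARG_MAX

# Build a long argument list: the numbers 1 to 500000 as positional parameters.
set -- {1..500000}

# Builtin: execve() is never called, so the size of the list does not matter.
builtin_lines=$(builtin printf '%s\n' "$@" | wc -l)
echo "builtin printf wrote $builtin_lines lines"

# External: the shell must call execve(), which is expected to fail with an
# "Argument list too long" error if the list exceeds the limit.
ext_printf=$(type -P printf)
"$ext_printf" '%s\n' "$@" >/dev/null || echo "external printf could not be run"
```

If the external call happens to succeed on your system, the limit there is simply higher than the list built here.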

Built-in utilities take precedence over utilities found in $PATH. To disable a built-in command in bash, use, for example,

enable -n printf
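
A short sketch of what this looks like in a bash session, using type -t to see how the name resolves at each step. It assumes that no alias or function called printf is defined and that an external printf is installed.

```bash
#!/usr/bin/env bash
type -t printf      # "builtin" in a default bash session

enable -n printf    # disable the builtin; printf is now looked up in $PATH
type -t printf      # "file", provided an external printf is installed

enable printf       # re-enable the builtin
type -t printf      # "builtin" again
```

The change only applies to the current shell session.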

There's a short list of utilities that need to be built into a shell (taken from the POSIX standard's list of special built-ins):

break
colon (:)
continue
dot (.)
eval
exec
exit
export
readonly
return
set
shift
times
trap
unset

These need to be built in since they directly manipulate the environment and program flow of the current shell session. An external utility would not be able to do that.
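
As a small illustration, using export as the example: a child process only gets a copy of the environment, so nothing it does can reach back into the shell that started it. The variable name greeting is made up for this sketch.

```bash
#!/usr/bin/env bash
unset greeting

# A separate process: the assignment and export happen in the child only.
sh -c 'greeting=hello; export greeting'
echo "after child process: ${greeting-unset}"    # prints "after child process: unset"

# The builtin runs inside the current shell, so the change persists.
export greeting=hello
echo "after builtin export: ${greeting-unset}"   # prints "after builtin export: hello"
```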

Interestingly, cd is not part of this list, but POSIX says the following about that:

Since cd affects the current shell execution environment, it is always provided as a shell regular built-in. If it is called in a subshell or separate utility execution environment, such as one of the following:

(cd /tmp)
nohup cd
find . -exec cd {} \;

it does not affect the working directory of the caller's environment.

I'm therefore assuming that the "special" built-ins can't have external counterparts, while cd in theory could have (but it wouldn't do very much).
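
A quick way to convince yourself of the "wouldn't do very much" part is the sketch below. Here sh -c 'cd /tmp' merely stands in for a hypothetical external cd; it is not how any particular system implements one.

```bash
#!/usr/bin/env bash
start=$PWD

sh -c 'cd /tmp'    # separate process, standing in for an external cd
(cd /tmp)          # subshell

# Neither of the two changed the working directory of this shell.
[ "$PWD" = "$start" ] && echo "working directory unchanged"
```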


You are (very understandably) confused by the fact that some builtins exist both as builtins and as external commands. So, while you're right that, for example, there is a /bin/[ command, that doesn't mean that its "actual location" is in /bin.

An easy way to test this is to run type with the -a switch, which will show all available instances of a command. On my Arch system, that shows:

$ type -a [
[ is a shell builtin
[ is /sbin/[
[ is /usr/sbin/[
[ is /usr/bin/[

Note that /sbin, /usr/sbin and /bin are all symlinks pointing to /usr/bin, so there is only one external [:

$ readlink -f /usr/sbin /sbin /bin/
/usr/bin
/usr/bin
/usr/bin

As you can see, [ exists both as a builtin and as an external command, and the same is true of various other shell builtins. The existence of the external copy does not change the fact that the builtin is compiled into the shell itself, and the builtin is what the shell uses by default.
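
If you ever want to pick one of the two explicitly, something along these lines should work in bash. This is only a sketch: it assumes an external [ exists in $PATH, and the variable name ext_test is made up.

```bash
#!/usr/bin/env bash
# What a plain [ resolves to in this shell.
type -t [

# Force the builtin.
builtin [ -n "text" ] && echo "builtin [ returned true"

# Force the external command by calling it through its path, if there is one.
ext_test=$(type -P [) && "$ext_test" -n "text" ] && echo "external [ returned true"
```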