Symbolic link recursion - what makes it "reset"?

Patrice identified the source of the problem in his answer, but if you want to know how you get from that to the behaviour you're seeing, here's the long story.

The current working directory of a process is nothing very complicated. It is an attribute of the process: a handle to a file of type directory from which relative paths (in system calls made by the process) are resolved. When resolving a relative path, the kernel doesn't need to know the full path (or any full path) to that current directory; it just reads the directory entries in that directory file to find the first component of the relative path (and .. is like any other file in that regard) and continues from there.

Now, as a user, you sometimes like to know where that directory lies in the directory tree. With most Unices, the directory tree is a tree, with no loop. That is, there's only one path from the root of the tree (/) to any given file. That path is generally called the canonical path.

To get the path of the current working directory, what a process has to do is just walk up (well down if you like to see a tree with its root at the bottom) the tree back to the root, finding the names of the nodes on the way.

For instance, a process trying to find out that its current directory is /a/b/c would open the .. directory (relative path, so .. is the entry in the current directory) and look for a file of type directory with the same inode number as ., find out that c matches, then open ../.. and so on until it finds /. There's no ambiguity there.

That's what the getwd() or getcwd() C functions do or at least used to do.

On some systems like modern Linux, there's a system call to return the canonical path to the current directory which does that lookup in kernel space (and allows you to find your current directory even if you don't have read access to all its components), and that's what getcwd() calls there. On modern Linux, you can also find the path to the current directory via a readlink() on /proc/self/cwd.

That's what most languages and early shells do when returning the path to the current directory.

In your case, you can call cd a as many times as you want: because it's a symlink to ., the current directory doesn't change, so all of getcwd(), pwd -P, python -c 'import os; print os.getcwd()', perl -MPOSIX -le 'print getcwd' would return your ${HOME}.
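A small way to check that claim without touching ${HOME}, assuming a scratch directory from mktemp -d stands in for it (mktemp is not in POSIX but is widely available). The variable names are only for this sketch. pwd -L shows the shell's own bookkeeping, which is explained further down.

```shell
# Scratch directory so nothing in $HOME is touched.
demo=$(mktemp -d) || exit 1
cd -P -- "$demo" || exit 1     # -P so that $PWD starts out canonical
start=$(pwd -P)

ln -s . a                      # a -> . (the directory itself)
cd a && cd a && cd a

logical=$(pwd -L)              # what the shell remembers
physical=$(pwd -P)             # what getcwd() reports
printf 'logical:  %s\nphysical: %s\n' "$logical" "$physical"
```

The physical path is the same before and after the three cd a calls; only the logical one grows.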

Now, symlinks complicate all that.

Symlinks allow jumps in the directory tree. In /a/b/c, if /a or /a/b or /a/b/c is a symlink, then the canonical path of /a/b/c would be something completely different. In particular, the .. entry in /a/b/c is not necessarily /a/b.

In the Bourne shell, if you do:

cd /a/b/c
cd ..

Or even:

cd /a/b/c/..

There's no guarantee you'll end up in /a/b.

Just like:

vi /a/b/c/../d

is not necessarily the same as:

vi /a/b/d
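To make that concrete, here is a sketch that builds a small tree where c is a symlink and compares the two paths. It uses a scratch directory and relative paths instead of /a/b/c, and cat instead of vi so it can run unattended; the names target and inner are made up for the example.

```shell
demo=$(mktemp -d) || exit 1
cd -P -- "$demo" || exit 1

mkdir -p a/b target/inner
echo 'from a/b'    > a/b/d
echo 'from target' > target/d
ln -s ../../target/inner a/b/c   # a/b/c is a symlink into target/

via_dotdot=$(cat a/b/c/../d)     # the kernel follows c, then takes its real parent
direct=$(cat a/b/d)
printf '%s\n%s\n' "$via_dotdot" "$direct"
```

The first line printed comes from target/d, the second from a/b/d.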

ksh introduced a concept of a logical current working directory to somehow work around that. People got used to it and POSIX ended up specifying that behaviour which means most shells nowadays do it as well:

For the cd and pwd builtin commands, and only for them (plus popd/pushd on shells that have them), the shell maintains its own idea of the current working directory. It's stored in the $PWD special variable.

When you do:

cd c/d

even if c or c/d are symlinks, while $PWD contains /a/b, it appends c/d to the end so $PWD becomes /a/b/c/d. And when you do:

cd ../e

Instead of doing chdir("../e"), it does chdir("/a/b/c/e").

And the pwd command only returns the content of the $PWD variable.

That's useful in interactive shells because pwd outputs a path to the current directory that gives information on how you got there and as long as you only use .. in arguments to cd and not other commands, it's less likely to surprise you, because cd a; cd .. or cd a/.. would generally get you back to where you were.
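The difference between the logical handling and a plain chdir("..") can be observed with a sketch like the following. It assumes a POSIX shell where cd defaults to logical mode, and reuses a made-up layout in which a/b/c is a symlink to target/inner.

```shell
demo=$(mktemp -d) || exit 1
cd -P -- "$demo" || exit 1
base=$(pwd -P)

mkdir -p a/b target/inner
ln -s ../../target/inner a/b/c

cd a/b/c && cd ..               # logical: strips the last component of $PWD
after_logical=$(pwd -P)

cd "$base/a/b/c" && cd -P ..    # physical: a real chdir("..")
after_physical=$(pwd -P)

printf 'cd ..    -> %s\ncd -P .. -> %s\n' "$after_logical" "$after_physical"
```

The logical cd .. lands in a/b, back where you came from, while cd -P .. lands in target, the real parent of the directory you were in.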

Now, $PWD is not modified unless you do a cd. Until the next time you call cd or pwd, a lot of things could happen: any of the components of $PWD could be renamed, for instance. The current directory never changes (it's always the same inode, though it could be deleted), but its path in the directory tree could change completely. getcwd() computes the current directory each time it's called by walking down the directory tree so its information is always accurate, but for the logical directory implemented by POSIX shells, the information in $PWD might become stale. So upon running cd or pwd, some shells may want to guard against that.

In that particular instance, you see different behaviours with different shells.

Some like ksh93 ignore the problem completely, so will return incorrect information even after you call cd (and you wouldn't see the behaviour that you're seeing with bash there).

Some like bash or zsh do check that $PWD is still a path to the current directory upon cd, but not upon pwd.

pdksh does check upon both pwd and cd (but upon pwd, does not update $PWD).

ash (at least the one found on Debian) does not check, and when you do cd a, it actually does cd "$PWD/a", so if the current directory has changed and $PWD no longer points to the current directory, it will actually not change to the a directory in the current directory, but the one in $PWD (and return an error if it doesn't exist).

If you want to play with it, you can do:

cd
mkdir -p a/b
cd a
pwd
mv ~/a ~/b 
pwd
echo "$PWD"
cd b
pwd; echo "$PWD"; pwd -P # (and notice the bug in ksh93)

in various shells.

In your case, since you're using bash, after a cd a, bash checks that $PWD still points to the current directory. To do that, it calls stat() on the value of $PWD to check its inode number and compare it with that of . (the current directory).

But when looking up the $PWD path involves resolving too many symlinks, that stat() returns with an error, so the shell cannot check whether $PWD still corresponds to the current directory, so it computes it again with getcwd() and updates $PWD accordingly.
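If you want to find the point at which that happens on your system, a loop like this one can help. It is only a sketch: the cap of 100 iterations is arbitrary, the step at which something happens depends on the kernel, and what happens depends on the shell (some recompute $PWD as bash does, others may simply fail the cd).

```shell
demo=$(mktemp -d) || exit 1
cd -P -- "$demo" || exit 1
start=$(pwd -P)
ln -s . a

i=0
while [ "$i" -lt 100 ]; do
  before=${#PWD}
  cd a 2>/dev/null || { echo "cd a failed at step $((i + 1))"; break; }
  i=$((i + 1))
  # a shorter (or equal) $PWD after cd a means the shell threw the
  # logical path away and asked getcwd() instead
  if [ "${#PWD}" -le "$before" ]; then
    echo "\$PWD was recomputed at step $i: $PWD"
    break
  fi
done
physical=$(pwd -P)
```

Whatever the shell does with $PWD, the physical directory stays the scratch directory throughout.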

Now, to clarify Patrice's answer, that check of number of symlinks encountered while looking up a path is to guard against symlink loops. The simplest loop can be made with

rm -f a b
ln -s a b
ln -s b a

Without that safeguard, upon a cd a/x, the system would have to find where a links to, find that it's b, which is itself a symlink that links back to a, and that would go on indefinitely. The simplest way to guard against that is to give up after resolving more than an arbitrary number of symlinks.
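The loop can be tried safely in a scratch directory. This sketch only checks that the lookup fails; on Linux the error is typically ELOOP ("Too many levels of symbolic links"), but the exact message is system-dependent.

```shell
demo=$(mktemp -d) || exit 1
cd -P -- "$demo" || exit 1
ln -s a b      # b -> a
ln -s b a      # a -> b, closing the loop

if cd a/x 2>/dev/null; then
  loop_detected=no
else
  loop_detected=yes   # the kernel gave up resolving a
fi
echo "loop detected: $loop_detected"
```

Both a and b still exist as symlinks; it is only following them that fails.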

Now back to the logical current working directory and why it's not so good a feature. It's important to realise that it's only for cd in the shell and not other commands.

For instance:

cd -- "$dir" &&  vi -- "$file"

is not always the same as:

vi -- "$dir/$file"

That's why you'll sometimes find that people recommend always using cd -P in scripts to avoid confusion (you don't want your software to handle an argument of ../x differently from other commands just because it's written in shell instead of another language).

The -P option is to disable the logical directory handling so cd -P -- "$var" actually does call chdir() on the content of $var (at least as long as $CDPATH is not set, and except when $var is - (or possibly -2, +3... in some shells) but that's another story). And after a cd -P, $PWD will contain a canonical path.
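Here is a sketch of the cd -- "$dir" && vi -- "$file" discrepancy and of how cd -P removes it. It uses cat in place of vi so that it can run unattended, a made-up layout where c is a symlink to target/inner, and assumes $CDPATH is not set.

```shell
demo=$(mktemp -d) || exit 1
cd -P -- "$demo" || exit 1
mkdir -p a/b target/inner
echo 'from a/b'    > a/b/d
echo 'from target' > target/d
ln -s ../../target/inner a/b/c
cd a/b || exit 1

dir=c/.. file=d
one_step=$(cat -- "$dir/$file")                # path resolved by the kernel
two_steps=$(cd -- "$dir" && cat -- "$file")    # logical cd, then open
with_P=$(cd -P -- "$dir" && cat -- "$file")    # physical cd, then open

printf '%s\n' "$one_step" "$two_steps" "$with_P"
```

The logical cd treats c/.. as a no-op and reads a/b/d, while the other two forms both read target/d.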


This is the result of a hard-coded limit in the Linux kernel source; to prevent denial-of-service, the limit on the number of nested symlinks is 40 (found in the follow_link() function inside fs/namei.c, called by nested_symlink() in the kernel source).

You would probably get a similar behaviour (and possibly another limit than 40) with other kernels supporting symlinks.
