Why does bind mounting a file after unlink fail with ENOENT?

The mount(2) system call fully resolves its paths through mounts and symlinks but, unlike open(2), it will not accept a path to a deleted file, i.e. a path which resolves to an unlinked directory entry.

(Similar to the "<filename> (deleted)" paths shown for /proc/PID/fd/FD, procfs displays unlinked dentries as "<filename>//deleted" in /proc/PID/mountinfo.)

# unshare -m
# echo foo > foo; touch bar baz quux
# mount -B foo bar
# mount -B bar baz
# grep foo /proc/self/mountinfo
56 38 8:7 /tmp/foo /tmp/bar ...
57 38 8:7 /tmp/foo /tmp/baz ...

# rm foo
# grep foo /proc/self/mountinfo
56 38 8:7 /tmp/foo//deleted /tmp/bar ...
57 38 8:7 /tmp/foo//deleted /tmp/baz ...
# mount -B baz quux
mount: mount(2) failed: /tmp/quux: No such file or directory

All of this used to work in older kernels, but it has not worked since v4.19. The behavior was first introduced by this change:

commit 1064f874abc0d05eeed8993815f584d847b72486
Author: Eric W. Biederman <[email protected]>
Date:   Fri Jan 20 18:28:35 2017 +1300

    mnt: Tuck mounts under others instead of creating shadow/side mounts.
...
+       /* Preallocate a mountpoint in case the new mounts need
+        * to be tucked under other mounts.
+        */
+       smp = get_mountpoint(source_mnt->mnt.mnt_root);
+       if (IS_ERR(smp))
+               return PTR_ERR(smp);
+

It looks like this effect was not intended by the change. Since then, other unrelated changes have piled on top of it, making the code even more confusing.

One consequence is that it also prevents pinning a deleted file somewhere else in the namespace via an open fd to it:

# exec 7>foo; touch bar
# rm foo
# mount -B /proc/self/fd/7 bar
mount: mount(2) failed: /tmp/bar: No such file or directory

The last command fails because of the same condition as the OP's.

You can even re-create a hard link pointing to the exact same inode, but you get the same result.

It's the same as with the /proc/PID/fd/FD "symlinks": the kernel is smart enough to follow a file through plain renames, but not through ln + rm (link(2) + unlink(2)):

# unshare -m
# echo foo > foo; touch bar baz
# mount -B foo bar
# mount -B bar baz
# grep foo /proc/self/mountinfo
56 38 8:7 /tmp/foo /tmp/bar ...
57 38 8:7 /tmp/foo /tmp/baz ...

# mv foo quux
# grep bar /proc/self/mountinfo
56 38 8:7 /tmp/quux /tmp/bar ...

# ln quux foo; rm quux
# grep bar /proc/self/mountinfo
56 38 8:7 /tmp/quux//deleted /tmp/bar ...

Walking through the source code, I found exactly one relevant ENOENT, i.e. one returned for an unlinked directory entry:

static int attach_recursive_mnt(struct mount *source_mnt,
            struct mount *dest_mnt,
            struct mountpoint *dest_mp,
            struct path *parent_path)
{
    [...]

    /* Preallocate a mountpoint in case the new mounts need
     * to be tucked under other mounts.
     */
    smp = get_mountpoint(source_mnt->mnt.mnt_root);
    if (IS_ERR(smp))
        return PTR_ERR(smp);
    [...]
}

static struct mountpoint *get_mountpoint(struct dentry *dentry)
{
    struct mountpoint *mp, *new = NULL;
    int ret;

    if (d_mountpoint(dentry)) {
        /* might be worth a WARN_ON() */
        if (d_unlinked(dentry))
            return ERR_PTR(-ENOENT);
        [...]
    }
    [...]
}

https://elixir.bootlin.com/linux/v5.2/source/fs/namespace.c#L3100

get_mountpoint() is normally applied to the mount target, not the source. In attach_recursive_mnt(), it is called on the source root because of mount propagation: during propagation, the kernel must enforce the rule that you cannot add mounts on top of a deleted file. But the enforcement happens eagerly, even when no mount propagation takes place that would require it. I think it is good that the checking is consistent like this; it is just coded a bit more obscurely than I would ideally prefer.

Whichever way I look at it, I think it is reasonable to enforce this, so long as it helps reduce the number of weird cases to analyze and no one has an especially compelling counter-argument.
