Why am I missing /var/run/sshd after every boot?

Solution 1:

One mistake you made was trying to start sshd by hand.

If you instead start sshd through your distribution's service management, it should just work. The service command knows the correct way to start a service on your distribution, so this should work:

service ssh start

With SysV init scripts, that is everything you need to do. The reason the directory is missing is that /var/run is a symlink to /run, and /run is a tmpfs mount point. That means /var/run starts out empty on each boot. When you use the service command, the /etc/init.d/ssh script is used to start sshd, but before doing that the script creates /var/run/sshd if it doesn't exist.
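The check described above can be sketched roughly like this. It is a simplified illustration, not the actual contents of /etc/init.d/ssh, and the function name ensure_run_dir is made up for the example:

```shell
#!/bin/sh
# Simplified sketch of the check an init script performs before starting sshd.
# ensure_run_dir is a hypothetical helper name, not taken from /etc/init.d/ssh.
ensure_run_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        # /var/run lives on a tmpfs, so the directory is gone after each boot
        mkdir -p "$dir" && chmod 0755 "$dir"
    fi
}

# Intended use (as root), before launching the daemon:
#   ensure_run_dir /var/run/sshd
```

Starting sshd by hand skips this step, which is why the directory is missing in that case.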

With systemd, things work a bit differently. There will be a file called /usr/lib/tmpfiles.d/sshd.conf with this content:

d /var/run/sshd 0755 root root

During boot, this should cause the /var/run/sshd directory to be created. You need to verify that the file exists and has the correct contents. If the /var/run/sshd directory is still missing, you can check whether it gets created when you run systemd-tmpfiles --create manually.
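A minimal sketch of that verification follows. The helper name has_tmpfiles_dir_entry is hypothetical, and the commands in the trailing comments assume you are root on the affected host:

```shell
#!/bin/sh
# has_tmpfiles_dir_entry CONF PATH
# Succeeds if CONF is readable and contains a "d PATH ..." line.
# (Hypothetical helper name, introduced only for this example.)
has_tmpfiles_dir_entry() {
    conf="$1"
    path="$2"
    [ -r "$conf" ] || return 1
    awk -v p="$path" '$1 == "d" && $2 == p { found = 1 } END { exit !found }' "$conf"
}

# Intended use on the affected host (as root):
#   has_tmpfiles_dir_entry /usr/lib/tmpfiles.d/sshd.conf /var/run/sshd && echo "entry present"
#   systemd-tmpfiles --create /usr/lib/tmpfiles.d/sshd.conf
#   ls -ld /var/run/sshd
```

If the entry is present but the directory still does not appear, the messages printed by systemd-tmpfiles --create are the next thing to look at.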

Solution 2:

So /run (and /var/run, which is symlinked to it) gets recreated on every reboot, except that systemd-tmpfiles fails to create some of its entries, including (/var)/run/sshd.

Apparently, this is fixed by an OpenVZ kernel upgrade. To fix it right away, edit /usr/lib/tmpfiles.d/sshd.conf and remove /var from the line d /var/run/sshd 0755 root root, so that it reads: d /run/sshd 0755 root root

And that's it!

When openssh-server gets upgraded, we can only hope that this bug will have been fixed (or is it really a bug in systemd, or in OpenVZ?). Otherwise the upgrade could replace the edited file and you would run into the same problem again.
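The edit can also be scripted. The sketch below uses a hypothetical helper, strip_var_prefix, that writes a rewritten copy instead of editing in place. Per tmpfiles.d(5), a file in /etc/tmpfiles.d overrides a file of the same name in /usr/lib/tmpfiles.d, so writing the copy there should survive package upgrades. I have not tested that variant on an OpenVZ container.

```shell
#!/bin/sh
# strip_var_prefix SRC DEST
# Copy a tmpfiles.d file, rewriting paths that start with /var/run/ to /run/.
# (Hypothetical helper name, introduced only for this example.)
strip_var_prefix() {
    src="$1"
    dest="$2"
    # Match "<type><whitespace>/var/run/" at the start of a line and drop "/var".
    sed -E 's#^([^[:space:]]+[[:space:]]+)/var/run/#\1/run/#' "$src" > "$dest"
}

# Intended use (as root):
#   strip_var_prefix /usr/lib/tmpfiles.d/sshd.conf /etc/tmpfiles.d/sshd.conf
#   systemd-tmpfiles --create /etc/tmpfiles.d/sshd.conf
```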


Solution 3:

Apparently this is resolved by running OpenVZ kernel 2.6.32-042stab134.7 or newer. I find it strange that no fix seems to be possible on the systemd side. An ugly hack, such as automatically creating /run/sshd/ at startup and only then starting sshd, would probably work.
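One way such a workaround might look is a unit drop-in. Note that this sketch swaps the hand-written mkdir for systemd's RuntimeDirectory= setting, which asks systemd to create the directory under /run before the service starts. The helper name write_sshd_dropin, the drop-in file name, and the unit name ssh.service are assumptions, and I have not verified this on an OpenVZ container.

```shell
#!/bin/sh
# write_sshd_dropin DIR
# Write a drop-in asking systemd to create /run/sshd before the service starts.
# (Hypothetical helper and file name; untested on OpenVZ.)
write_sshd_dropin() {
    dropin_dir="$1"
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/runtime-dir.conf" <<'EOF'
[Service]
RuntimeDirectory=sshd
RuntimeDirectoryMode=0755
EOF
}

# Intended use (as root); the unit name ssh.service is an assumption:
#   write_sshd_dropin /etc/systemd/system/ssh.service.d
#   systemctl daemon-reload
#   systemctl restart ssh
```

Keep in mind that systemd removes a RuntimeDirectory= directory again when the service stops.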

The output of my systemd-tmpfiles --create:

[/usr/lib/tmpfiles.d/var.conf:14] Duplicate line for path "/var/log", ignoring.
fchownat() of /run/named failed: Invalid argument
Failed to openat(/dev/simfs): Operation not permitted
Failed to validate path /var/run/screen: Too many levels of symbolic links
Failed to validate path /var/run/sshd: Too many levels of symbolic links
Failed to validate path /var/run/sudo: Too many levels of symbolic links
Failed to validate path /var/run/sudo/ts: Too many levels of symbolic links
fchownat() of /run/systemd/netif failed: Invalid argument
fchownat() of /run/systemd/netif/links failed: Invalid argument
fchownat() of /run/systemd/netif/leases failed: Invalid argument
fchownat() of /run/log/journal failed: Invalid argument
fchownat() of /run/log/journal/e9e1d08bc42c48999865b96c250f40cc failed: Invalid argument
fchownat() of /run/log/journal/e9e1d08bc42c48999865b96c250f40cc/system.journal failed: Invalid argument

The changelog of OpenVZ 2.6.32-042stab134.7 says this:

Running Ubuntu containers with systemd 229-4ubuntu21.9 could result in services failing to start because systemd-tmpfiles was unable to validate path due to symlinking issues. (PSBM-90038)


Solution 4:

For as much trouble as I've had with systemd over the years, I must admit this issue stems instead from the Ansible synchronize directive.

For some reason, after provisioning this host with our Ansible scripts, the / directory (as well as /etc, /opt and others) was left owned by an admin user instead of root. After running chown to correct things, /var/run/sshd is created on boot again.
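To check for this kind of damage, something like the following could be used. It is a sketch: the helper name list_not_owned_by_uid is made up, and stat -c assumes GNU coreutils.

```shell
#!/bin/sh
# list_not_owned_by_uid UID DIR...
# Print every existing DIR whose owning uid differs from UID.
# (Hypothetical helper name; stat -c assumes GNU coreutils.)
list_not_owned_by_uid() {
    expected="$1"
    shift
    for d in "$@"; do
        [ -e "$d" ] || continue
        owner=$(stat -c '%u' "$d")
        [ "$owner" = "$expected" ] || printf '%s (uid: %s)\n' "$d" "$owner"
    done
}

# Intended use: report top-level directories not owned by root (uid 0)
#   list_not_owned_by_uid 0 / /etc /opt /var /run
# and then fix each reported directory, for example:
#   chown root:root /
```

No output means every directory that was checked is owned by the expected user.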

I really appreciate all the input, but there is no bug here, at least not in systemd or OpenSSH: applying inappropriate ownership to the root directories caused undefined system behavior.