Is it feasible to have home folder hosted with NFS?

Solution 1:

I use NFS for my home directories in our production environment. There are a couple of tricks.

  1. Don't NFS-mount to /home; that way you can keep a local user that can still log in if the NFS server goes down. We mount to /mnt/nfs/home.

  2. Use soft mounts and a very short timeout - this will prevent processes from blocking forever.

  3. Use the automounter. This keeps resource usage down and also means you don't need to worry about restarting services when the NFS server comes back up after going down for some reason.

    auto.master:
      +auto.master
      /mnt/nfs /etc/auto.home --timeout=300
    
    auto.home:
      home -rw,soft,timeo=5,intr      home.bzzprod.lan:/home
    
  4. Use a single sign-on system so you don't run into permission-related issues. I have an OpenLDAP server. (A minimal client-side sketch follows this list.)
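
On the client side, the LDAP piece mostly comes down to letting the name service switch resolve accounts from the directory. This is only a rough sketch assuming the classic nss_ldap/pam_ldap setup; an sssd-based setup would use "sss" instead of "ldap", and the PAM wiring varies by distribution:

    # /etc/nsswitch.conf - look up accounts in local files first, then LDAP
    passwd: files ldap
    group:  files ldap
    shadow: files ldap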

Solution 2:

HowtoForge posted an article titled Creating An NFS-Like Standalone Storage Server With GlusterFS On Debian Lenny; you may want to check it out.

Here is a short description of why it's a good "feasible" alternative to NFS, from the GlusterFS project page:

GlusterFS self-heals on the fly. There is no fsck. The storage backend is accessible directly as regular files and folders (NFS style). With replication enabled, GlusterFS can withstand hardware failures.

More information can be found in the project documentation.

Another nice thing about GlusterFS is that if you need more space, you just add another storage brick (server node), so you can scale your storage out in parallel as the need arises.
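
To make that concrete, here is a rough sketch using the gluster command-line tool; the volume name, hostnames, and brick paths are made up for illustration:

    # create a two-way replicated volume for home directories
    gluster volume create homes replica 2 gfs1:/export/brick1 gfs2:/export/brick1
    gluster volume start homes

    # later, grow the volume by adding another replicated pair of bricks
    gluster volume add-brick homes replica 2 gfs3:/export/brick1 gfs4:/export/brick1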


Solution 3:

Be careful with soft mounts! Soft-mounting an NFS filesystem means I/O will fail once a timeout occurs. Be very sure that is what you want on users' home directories! My guess is you don't. Using a hard mount on home directories, in combination with the intr option, feels a lot safer here.

A hard mount will not time out: I/O operations are retried indefinitely. The intr option makes it possible to interrupt those operations, so if the export becomes unreachable and the hard mount locks your session, you can still break out of it. The combination is pretty safe and ensures you will not easily lose a user's data.
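
For reference, a hard,intr entry in /etc/fstab might look like the following; the server name and mount point are reused from the earlier example, so treat them as placeholders:

    # hard = retry I/O indefinitely, intr = allow signals to interrupt blocked I/O
    home.bzzprod.lan:/home  /mnt/nfs/home  nfs  rw,hard,intr  0  0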

Anyway, autofs makes this all even easier.


Solution 4:

The one thing to note is that when the NFS server is down, your mounts will freeze. A soft mount will not block, so the "freeze" itself can be avoided, but that does not fix the problem for home directories: without a home directory, the user is stuck anyway.

Even when the NFS server recovers, the freeze problem will remain unless you do something about it: you'll have to kill the blocked processes on the mounting machine and remount. The reason for this is that when the NFS server comes back up, it may assign a different fsid, so you can at least address this by hard-coding the fsids on the NFS server, for example...

#. Home Directories
/usr/users \
  192.168.16.0/22(rw,sync,no_root_squash,fsid=1) \
  192.168.80.0/22(rw,sync,no_root_squash,fsid=1)

#. Scratch Space
/var/ftp/scratch \
  192.168.16.0/22(rw,async,no_root_squash,fsid=3) \
  192.168.80.0/22(rw,async,no_root_squash,fsid=3) \
  172.28.24.151(rw,async,root_squash,fsid=3)
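
After editing /etc/exports on the server, something along these lines re-exports the filesystems and lets you verify the options (standard nfs-utils commands):

    # apply changes to /etc/exports and show the active export list
    exportfs -ra
    exportfs -v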

The exports(5) man page states...

fsid=num
          This option forces the filesystem identification portion of the file handle
          and  file attributes used on the wire to be num instead of a number derived
          from the major and minor number of the block device on which the filesystem
          is  mounted.   Any 32 bit number can be used, but it must be unique amongst
          all the exported filesystems.

          This can be useful for NFS failover, to ensure that  both  servers  of  the
          failover  pair use the same NFS file handles for the shared filesystem thus
          avoiding stale file handles after failover.

...While that indicates that the handles should stay stable as long as the major/minor numbers do not change (which they usually don't, except when you're exporting SAN/multipath volumes, where they may change), I've found that hard-coding the fsid completely removed the problem: if the NFS server comes back, the connection is restored quickly. I still don't really know why this has made a difference for devices such as /dev/sdaX, for example.

I should point out that my argument is largely anecdotal: it doesn't actually make sense why this fixed the problem, but it "seems" to have fixed it, somehow. There are probably other variables at play here that I've not yet discovered. =)
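
As for the client-side recovery mentioned above (killing the blocked processes and remounting), it usually boils down to something like this; the mount point is just an example, and the second command assumes the share is listed in /etc/fstab:

    # force/lazy-unmount the stale NFS mount, then mount it again
    umount -f -l /mnt/nfs/home
    mount /mnt/nfs/home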


Solution 5:

Some general advice that will apply no matter which network filesystem you adopt: many programs cache data in the user's home directory, which usually does more harm than good when the home directory is accessed over a network.

These days, you can tell many programs to store their caches elsewhere (e.g., on a local disk) by setting the XDG_CACHE_HOME environment variable in a login script. Lots of programs (e.g., Firefox) still require manual configuration, however, so you will probably have to do some extra work to identify and configure them in a uniform manner for all your users.
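
As a starting point, a system-wide login script along these lines redirects the XDG cache to local storage. The script path and cache location are only examples; the target directory (here /var/cache/users) has to exist and be writable by users, e.g. mode 1777 like /tmp:

    # /etc/profile.d/local-cache.sh - keep per-user caches off the NFS home
    export XDG_CACHE_HOME="/var/cache/users/$USER"
    mkdir -p "$XDG_CACHE_HOME"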