Fault-tolerant NFS?

Solution 1:

You could buy a system that can tolerate a CPU failure, or you could run more than one server. You can create an NFS failover cluster fairly easily on Linux (I'm sure Sun et al. have a mechanism for this too).

A fairly common and well-supported way to do it is to use heartbeat to manage the cluster (search for "NFS heartbeat"; it was the first link I found on Google) and to share the storage between the servers. For a transparent failover, the important thing with NFS is to also share the NFS state information, which usually lives in /var/lib/nfs. You can do that by putting it on the shared storage.

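Edit: Also, setting the fsid option to the same value on the NFS export on each server will prevent stale file handles when the cluster fails over. Without it, each server may identify the exported filesystem differently (for example by device number), so handles issued by one node would not be valid on the other.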
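
As a rough illustration of the fsid point, an /etc/exports entry along these lines would be kept identical on every node in the cluster. The path /srv/nfs/data, the client network 192.168.1.0/24, and the value 1 are made-up placeholders, not values from a real setup; the only part that matters here is that the fsid option is present and matches across nodes.

```
# /etc/exports (keep this line identical on every cluster node)
# Path, client network, and fsid value below are hypothetical placeholders.
/srv/nfs/data  192.168.1.0/24(rw,sync,no_subtree_check,fsid=1)
```

After changing the file, running `exportfs -ra` on each node re-reads the export table.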

Solution 2:

NFS 4.1 supports pNFS (parallel NFS), which is designed for clustered storage. http://www.pnfs.com/