rsync over NFS - inconsistent speed

Unfortunately, just about the worst thing you can do is to use rsync across NFS (or across any other remote filesystem that's mounted into the local system). This switches off almost all of the efficiency enhancements for which rsync is known.

For this much data one of the fastest ways to transfer it between systems may be to dump it across an unencrypted connection without any consideration for what was already on the target system.

Once you have at least a partial copy, the best option is to use rsync between the two hosts. This allows rsync to run one process on each host to compare the differences. (By default rsync completely skips files that have the same size and modification time. For other files the client and server components use a rolling checksum to determine which blocks still need to be transferred.)
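The size-and-modification-time test described above can be sketched in shell. This is only an illustration of the idea, not rsync's actual code: needs_transfer is a hypothetical helper name, and the stat -c format is an assumption that GNU coreutils stat is available.

```shell
# Hypothetical helper that mimics the size + modification time check.
# Returns 0 (true) when the file would still need to be looked at.
needs_transfer() {
    src=$1
    dst=$2
    # A missing destination file always needs transferring.
    [ -e "$dst" ] || return 0
    # GNU stat (assumption): %s is the size in bytes, %Y is the
    # modification time in seconds since the epoch.
    [ "$(stat -c '%s %Y' "$src")" != "$(stat -c '%s %Y' "$dst")" ]
}
```

For example, needs_transfer /path/to/source/file /path/to/destination/file succeeds only when the two files differ in size or modification time, or when the destination file does not exist.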

  1. Fast dump. This example uses no authentication or encryption at all. It does apply compression, though, which you can remove by omitting the -z flag from both pax commands:

    Run this on the destination machine to start a listening server:

    cd /path/to/destination && nc -l 50505 | pax -zrv -pe
    

    Run this on the source machine to start the sending client:

    cd /path/to/source && pax -wz . | nc destination_server 50505
    

    Some versions of nc -l may require the port to be specified with a flag, i.e. nc -l -p 50505. The OpenBSD version on Debian (nc.openbsd, linked via /etc/alternatives to /bin/nc) does not.

  2. Slower transfer. This example uses rsync over ssh, which provides authentication and encryption. Don't omit the trailing slash (/) on the source path; without it rsync creates a "source" directory inside the destination. Omit the -z flag if you don't want compression:

    rsync -avzP /path/to/source/ destination_server:/path/to/destination
    

You may need to set up SSH keys to allow login to destination_server as root. Add the -H flag if you need to handle hard links.


It is far better to use rsync directly between two hosts if possible. Remember that rsync is built to optimise network IO at the cost of increased disk IO; when using rsync on an NFS filesystem, disk IO translates into network IO, so this is a very suboptimal solution. Also, if rsync thinks that both source and destination are local, it switches off its optimisations and transfers complete files every time, instead of using the delta algorithm that sends only the differences.

Say you have a 5GB file that only differs in 1% of the data between source and destination.

  • When transferring between hosts, rsync will checksum the source and destination files, and only transfer the difference; on the destination the file is recreated using the old file and the new data from the source, and then the old file is replaced.
  • When transferring locally, checksumming each file makes no sense: rsync would have to read 2 x 5GB and write 1 x 5GB for the example file. By switching to whole-file mode, rsync only needs to read 1 x 5GB and write 1 x 5GB. On local disks this makes complete sense, but when one side is NFS all of that data crosses the network and bandwidth use shoots through the roof.
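The arithmetic behind the two cases above can be sketched roughly as follows. The helper name estimate_network_mb is hypothetical, the 5GB file is taken as 5120 MB, and the estimate deliberately ignores the checksum traffic that rsync exchanges in delta mode, so treat the numbers as an order-of-magnitude illustration only.

```shell
# Hypothetical helper: rough estimate of the data (in MB) that crosses
# the network for one file.
#   $1 = file size in MB
#   $2 = percentage of the file that has changed
#   $3 = "whole" (rsync onto an NFS mount, one side on NFS)
#        or "delta" (rsync directly between the two hosts)
estimate_network_mb() {
    size_mb=$1
    changed_pct=$2
    mode=$3
    case $mode in
        # Whole-file mode: the entire file travels over the NFS link.
        whole) echo "$size_mb" ;;
        # Delta mode: only the changed part is sent (checksum overhead ignored).
        delta) echo $(( size_mb * changed_pct / 100 )) ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

estimate_network_mb 5120 1 whole   # prints 5120
estimate_network_mb 5120 1 delta   # prints 51
```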

If you can run rsync directly against the host serving the NFS filesystem, then do that; you will see a big improvement in performance.
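One way to catch this situation before starting a long transfer is to check the filesystem type of the path first. The following is a minimal sketch under some assumptions: both function names are hypothetical, the list of network filesystem types is illustrative rather than complete, and findmnt (from util-linux) is assumed to be available, which limits it to Linux.

```shell
# Hypothetical helper: succeeds when the given filesystem type is a
# network filesystem. The list is illustrative, not exhaustive.
is_network_fstype() {
    case $1 in
        nfs|nfs4|cifs|smb3|fuse.sshfs) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper: warn when a path lives on a network mount.
# Assumes findmnt from util-linux (Linux only).
check_rsync_path() {
    fstype=$(findmnt -n -o FSTYPE -T "$1") || return 2
    if is_network_fstype "$fstype"; then
        echo "$1 is on $fstype: consider running rsync against the file server directly" >&2
        return 1
    fi
    return 0
}
```

Calling check_rsync_path /path/to/destination before the transfer would then print a warning if the destination turns out to be an NFS mount.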
