Is there a faster alternative to cp for copying large files (~20 GB)?

%CPU should be low during a copy. The CPU tells the disk controller "grab data from sectors X–Y into memory buffer at Z", then goes off and does something else (or sleeps, if there is nothing else to do). The hardware raises an interrupt when the data is in memory. Then the CPU has to copy it a few times and tells the network card "transmit the packets at memory locations A, B, and C". Then it goes back to doing something else.

You're pushing ~240 Mbps. On a gigabit LAN, you ought to be able to get at least 800 Mbps (the rough arithmetic after the list below shows what that difference means for a 20 GB file), but:

  1. That's shared among everyone using the file server (and possibly a connection between switches, etc.)
  2. That's limited by how quickly the file server can handle the write, keeping in mind its disk I/O bandwidth is shared by everyone using it.
  3. You didn't specify how you're accessing the file server (NFS, CIFS (Samba), AFS, etc.). You may need to tune your network mount, but on anything half-recent the defaults are usually pretty sane.
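
For a sense of scale, here's that rough back-of-the-envelope for a single 20 GB file at those two rates (treating 20 GB as roughly 160,000 Mbit; exact numbers will vary with protocol overhead):

echo '20 * 8000 / 240 / 60' | bc -l    # ≈ 11 minutes at ~240 Mbps
echo '20 * 8000 / 800 / 60' | bc -l    # ≈ 3.3 minutes at ~800 Mbps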

For tracking down the bottleneck, iostat -kx 10 is going to be a useful command. It'll show you the utilization on your local hard disks. If you can run that on the file server, it'll tell you how busy the file server is.
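
In case you haven't used it before, here's a quick sketch of how I'd run it (-k reports sizes in kilobytes, -x adds extended per-device statistics):

iostat -kx 10    # prints a new report every 10 seconds

# Watch the %util column (how busy each disk is) and await (average time an
# I/O request spends waiting); a disk sitting near 100% util is the bottleneck.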

The general solution is going to be to speed up that bottleneck, which of course you don't have the budget for. But, there are a couple of special cases where you can find a faster approach:

  • If the files are compressible, and you have a fast CPU, doing a minimal compress on-the-fly might be quicker. Something like lzop or maybe gzip --fast (gzip's lowest compression level); there's a sketch of the idea after this list.
  • If you are only changing a few bits here and there, and then sending the file back, only sending deltas will be much faster. Unfortunately, rsync won't really help here, as it will need to read the file on both sides to find the delta. Instead, you need something that keeps track of the deltas as you change the file... Most approaches here are app-specific. But it's possible that you could rig something up with, e.g., device-mapper (see the brand-new dm-era target) or btrfs.
  • If you're copying the same data to multiple machines, you can use something like udpcast to send it to all the machines at once.
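
For the compress-on-the-fly idea in the first bullet, a minimal sketch, assuming you have shell (SSH) access to the file server and lzop installed on both ends (the hostname and path are placeholders):

ssh you@fileserver 'lzop -c /path/to/yourfile' | lzop -dc > yourfile

# lzop -c compresses to stdout on the server; lzop -dc decompresses the stream
# locally. If the data doesn't compress well, this just burns CPU for nothing.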

And, since you note you're not the sysadmin, I'm guessing that means you have a sysadmin, or at least someone responsible for the file server and network. You should probably ask them; they'll be much more familiar with the specifics of your setup. They should at least be able to tell you what transfer rate you can reasonably expect.


This could, possibly, be a faster alternative, and you won't clog the network for two days: take one or two large USB (USB 3 if you have it) or FireWire disks, connect them to the server, and copy the files onto them. Carry the disks to your local machine and copy the files off.
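
If you go that route, a minimal sketch, assuming the USB disk ends up mounted at /mnt/usbdisk (the paths are placeholders):

rsync -a --progress /path/to/yourfiles/ /mnt/usbdisk/

# cp -a works just as well; rsync's advantage is that rerunning it after an
# interruption skips the files that already made it across.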


If you have direct SSH (or SFTP) access (ask your sysadmin), you can use scp with compression (-C):

scp -C you@server:/path/to/yourfile .

Of course, that's only useful if the file is compressible, and it will use more CPU time, since it's encrypting (because it's over SSH) as well as compressing.
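
If you're not sure whether the file compresses, one way to estimate is to compress a sample of it and compare sizes (the 100 MB sample size is arbitrary):

head -c 100M yourfile | gzip -1 | wc -c    # compare against 104857600 (100M)

# If the result is close to the sample size, skip -C; it'll only add CPU cost.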
