How to copy a large number of files quickly between two servers

Solution 1:

I would recommend tar. When the file trees are already similar, rsync performs very well. However, because rsync examines each file to work out what has changed and then copies only the changes, it is much slower than tar for the initial copy. This command will likely do what you want: it copies the files between the machines and preserves both permissions and user/group ownership (ownership is only restored when the remote tar runs as root).

tar -cf - /path/to/dir | ssh remote_server 'tar -xpvf - -C /absolute/path/to/remotedir'
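If you want to see what the tar pipe preserves before involving the network, the same stream can be run locally. This is a minimal sketch assuming GNU tar; `tar_pipe_copy` and its arguments are hypothetical names, and in the real command `ssh remote_server '...'` simply sits between the two tar processes.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: copy directory NAME from SRC_PARENT into DEST_DIR by
# streaming a tar archive through a pipe, as the ssh version does but
# without the ssh hop in the middle.
tar_pipe_copy() {
  local src_parent=$1 name=$2 dest_dir=$3
  mkdir -p "$dest_dir"
  # -C: change directory first, so the archive stores "name/..." paths.
  # -cf -: create an archive and write it to stdout.
  # -xpf -: extract from stdin, keeping permission bits instead of
  #         applying the umask.
  tar -C "$src_parent" -cf - "$name" | tar -C "$dest_dir" -xpf -
}
```

For example, `tar_pipe_copy /path/to dir /tmp/copytest` should produce `/tmp/copytest/dir` with the original permission bits. Ownership is only restored when the extracting tar runs as root; otherwise the files belong to whoever ran it.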

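As per Mackintosh's comment below, this is the command you would use for rsync. The -W (--whole-file) flag turns off rsync's delta-transfer algorithm, which only helps when older copies of the files already exist on the destination: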

rsync -avW -e ssh /path/to/dir/ remote_server:/path/to/remotedir

Solution 2:

External hard drive and same-day courier delivery.


Solution 3:

I'd use rsync.

If the files are served over HTTP with directory listings enabled, you could also use wget with the --mirror option.

You're already seeing that HTTP is faster than SCP because SCP encrypts everything (and thus bottlenecks on the CPU). HTTP and rsync in daemon mode will move faster because they don't encrypt; rsync tunneled over SSH is still encrypted.

Here are some docs on setting up rsync on Ubuntu: https://help.ubuntu.com/community/rsync

Those docs talk about tunneling rsync over SSH, but if you're just moving data around on a private LAN you don't need SSH. (I'm assuming you are on a private LAN. If you're getting 9-10MB/sec over the Internet then I want to know what kind of connections you have!)

Here are some other very basic docs that will allow you to set up a relatively insecure rsync server (with no dependence on SSH): http://transamrit.net/docs/rsync/
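A minimal sketch of that kind of SSH-free setup, assuming a trusted private LAN. The module name `files`, the helper `write_rsyncd_conf`, and the example paths and address range are placeholders, not taken from the linked docs; check `man rsyncd.conf` for your version.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write a one-module rsyncd.conf.
#   $1 = config file to create
#   $2 = directory the module exposes (where pushed files land)
#   $3 = address range allowed to connect (placeholder, e.g. your LAN)
write_rsyncd_conf() {
  local conf=$1 module_path=$2 allow=$3
  cat > "$conf" <<EOF
# No SSH, no encryption, no authentication: trusted LAN only.
[files]
path = $module_path
read only = no
hosts allow = $allow
EOF
}
```

On the receiving server you would then run something like `write_rsyncd_conf /tmp/rsyncd.conf /absolute/path/to/remotedir 192.168.1.0/24` followed by `rsync --daemon --config=/tmp/rsyncd.conf`, and push from the sending side with `rsync -avW /path/to/dir/ rsync://remote_server/files/`. The daemon listens on port 873 by default, which normally needs root; depending on which user the daemon runs as, you may also need `uid`/`gid` settings so it can write into the module path.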


Solution 4:

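Without much discussion, use netcat, the network Swiss Army knife. There is no protocol overhead (and no encryption): you're copying directly to the network socket. Start the listener on srv2 first, then run the sender on srv1. For example: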

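srv2$ nc -l -p 4321 | tar xfv -

srv1$ tar cfv - *mp3 | nc -w1 remote.server.net 4321

(`nc -l -p 4321` is the traditional netcat syntax; on some variants, such as OpenBSD's netcat, the listening port is given directly as `nc -l 4321`.)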
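netcat does no checking of its own beyond what TCP provides, so you may want to confirm the copy afterwards. One way, sketched here with a hypothetical `checksum_manifest` helper and assuming GNU findutils and coreutils (`sort -z`, `xargs -r`, `sha256sum`), is to build a sorted checksum list on each server and diff the two.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print "sha256  ./relative/path" for every regular file
# under DIR, sorted by path so lists from two machines can be compared.
checksum_manifest() {
  local dir=$1
  (
    cd "$dir"
    # NUL-separated names cope with spaces; LC_ALL=C gives the same sort
    # order on both servers regardless of locale; -r skips sha256sum when
    # there are no files at all.
    find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r sha256sum
  )
}
```

Run it on the source tree on srv1 and on the extracted tree on srv2, copy one list across, and `diff` the two; no output means the files match.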

Solution 5:

If you do go with rsync for lots of files, I would try to get version 3 or above on both ends, because older versions enumerate every file before starting the transfer. The new feature is called incremental recursion; the rsync 3.0.0 release notes describe it like this:

A new incremental-recursion algorithm is now used when rsync is talking to another 3.x version. This starts the transfer going more quickly (before all the files have been found), and requires much less memory. See the --recursive option in the manpage for some restrictions.
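To check that both ends qualify, compare the output of `rsync --version` on each machine. A minimal sketch with hypothetical function names, assuming the usual `rsync  version X.Y.Z  protocol version N` banner on the first line:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the major version number found in
# `rsync --version` output, e.g. "rsync  version 3.2.7  protocol version 31" -> 3.
rsync_major_version() {
  local output=$1
  # Only the banner line starts with "rsync", spaces, "version", digits.
  sed -n 's/^rsync *version \([0-9][0-9]*\)\..*/\1/p' <<<"$output"
}

# Hypothetical helper: succeed only if both outputs report rsync 3 or newer,
# i.e. incremental recursion is available on both ends.
both_support_incremental_recursion() {
  local local_major remote_major
  local_major=$(rsync_major_version "$1")
  remote_major=$(rsync_major_version "$2")
  [[ -n $local_major && -n $remote_major ]] || return 1
  (( local_major >= 3 && remote_major >= 3 ))
}
```

Usage: `both_support_incremental_recursion "$(rsync --version)" "$(ssh remote_server rsync --version)" && echo "incremental recursion available"`.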