Quickest way to transfer 55GB of images to new server

Solution 1:

Instead of using tar to write to your local disk, you can write directly to the remote server over the network using ssh.

server1$ tar -zc ./path | ssh server2 "cat > ~/file.tar.gz"

Any string that follows your "ssh" command will be run on the remote server instead of starting an interactive login session. You can pipe input/output to and from those remote commands through SSH as if they were local. Putting the command in quotes avoids any confusion, especially when using redirection.
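
To see why the quotes matter, here is a small illustrative pair (with /tmp/foo and /tmp/copy as placeholder paths):

server1$ ssh server2 "cat /tmp/foo > /tmp/copy"    # redirection happens on server2
server1$ ssh server2 cat /tmp/foo > /tmp/copy      # redirection happens locally on server1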

Or, you can extract the tar file on the other server directly:

server1$ tar -zc ./path | ssh server2 "tar -zx -C /destination"

Note the seldom-used -C option. It means "change to this directory first before doing anything."
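
To make the equivalence concrete, here is a small sketch (file.tar.gz and /destination are placeholders):

server2$ tar -zx -C /destination < file.tar.gz
server2$ cd /destination && tar -zx < file.tar.gz    # same effect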

Or, perhaps you want to "pull" from the destination server:

server2$ tar -zx -C /destination < <(ssh server1 "tar -zc -C /srcdir ./path")

Note that the <(cmd) construct (process substitution) is a bash feature and doesn't work in older or non-bash shells. It runs a program, sends its output to a pipe, and substitutes that pipe into the command as if it were a file.
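
If you haven't run into it before, here is an unrelated one-liner showing the same construct (any two directories will do):

server2$ diff <(ls /dir1) <(ls /dir2)    # compares the two listings as if they were files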

I could just as easily have written the above as follows:

server2$ tar -zx -C /destination -f <(ssh server1 "tar -zc -C /srcdir ./path")

Or as follows:

server2$ ssh server1 "tar -zc -C /srcdir ./path" | tar -zx -C /destination

Or, you can save yourself some grief and just use rsync:

server1$ rsync -az ./path server2:/destination/

Finally, remember that compressing the data before transfer will reduce the amount of bandwidth you use, but on a very fast connection it may actually make the operation take more time. This is because your computer may not be able to compress fast enough to keep up with the network: if compressing 100MB takes longer than sending 100MB uncompressed, then it's faster to send it uncompressed.
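
If you're not sure which side of that trade-off you're on, one rough way to check is to time a sample both ways and throw the output away (a sketch only, reusing the paths from above):

server1$ time tar -c ./path | ssh server2 "cat > /dev/null"     # uncompressed
server1$ time tar -zc ./path | ssh server2 "cat > /dev/null"    # compressed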

Alternatively, you may want to consider piping to gzip yourself (rather than using the -z option) so that you can specify a compression level. In my experience, on fast network connections with compressible data, using gzip at level 2 or 3 (the default is 6) gives the best overall throughput in most cases. Like so:

server1$ tar -c ./path | gzip -2 | ssh server2 "cat > ~/file.tar.gz"

Solution 2:

I'd be tempted to rsync it over myself - it does compression and handles link loss well.
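
A rough sketch of what that could look like (the destination path is a placeholder; --partial keeps partially transferred files so a dropped link doesn't force a full restart):

server1$ rsync -az --partial --progress ./path server2:/destination/

If the connection dies, re-running the same command picks up more or less where it left off instead of starting over.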


Solution 3:

If you just tar the files up and do nothing else, you will waste a lot of time for only a minimal speed gain.

Simply tarring up the files with the cvf switches will cost you the time it takes to read all 55GB of images and write them back to disk (and in practice even more, since there is considerable overhead).

There is only one advantage to be gained here: the per-file overhead of uploading many files is reduced. You might get faster transfer times if you compress the images, but since I believe they are already in a compressed format this won't help much; it's just more wasted computing time.

The biggest disadvantage of transferring one huge tar archive over the wire is that if something goes wrong, it could mean you have to start over.

I would do it this way:

md5sum /images/* > md5sum.txt
scp -r /images/* user@host:/images/

On the new server

md5sum /images/* > md5sum_new.txt

And then just diff the two checksum files. And since scp supports compression on the fly, there is no need for separate archives.
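
For example, something like this would do the comparison (assuming md5sum_new.txt was written to the remote home directory; adjust the path if not):

server1$ scp user@host:md5sum_new.txt .
server1$ diff md5sum.txt md5sum_new.txt    # no output means every checksum matched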

Edit

I'll keep the MD5 information since it was useful to the OP. But one of the comments gave me new insight, so a bit of searching turned up this useful piece of information. Please note that the subject here is SFTP, not SCP directly.

In contrast to FTP, SFTP does add overhead to the transfer of files. As a file is transferred between client and server, it is broken up into smaller chunks called "packets." For example, suppose each packet is 32KB. The SFTP protocol does a checksum on each 32KB packet as it is sent, and includes that checksum along with that packet. The receiver gets that packet and decrypts the data, and then verifies the checksum. The checksum itself is "stronger" than the CRC32 checksum. (Because SFTP uses a 128-bit or higher checksum, such as MD5 or SHA, and because this is done on each and every packet, very granular integrity checking is accomplished as part of the transfer.) Thus, the protocol itself is slower (because of the additional overhead), but the successful completion of a transfer means, de facto, that it has been transferred integrally and there is no need for an additional check.


Solution 4:

On top of Pacey's md5sum suggestion, I'd use the following:

On the destination: nc -w5 -l -p 4567 | tar -xvf -

Then on the source: tar -cvf - /path/to/source/ | nc -w5 destinationserver 4567

It's still a tar/untar, and there's no encryption, but it goes directly to the other server. Start them both in tandem (-w5 gives you a 5-second grace period) and watch it go. If bandwidth is tight, add -z to the tar on both ends.
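
For instance, the compressed variant might look like this (same placeholder host and port as above):

On the destination: nc -w5 -l -p 4567 | tar -xzvf -

Then on the source: tar -czvf - /path/to/source/ | nc -w5 destinationserver 4567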


Solution 5:

One point - not all hosts have rsync, and hosts may well have different versions of tar. For this reason, a good first port of call is the oft-neglected cpio.

You can run cpio over ssh to do ad-hoc replication of file/directory structures between hosts. This gives you finer control over what gets sent, seeing as you need to "feed" cpio, nom-nom. It's also more portable in its arguments - cpio doesn't change much between versions - which is an important point if you are looking after multiple hosts in a heterogeneous environment.

Example copying /export/home and subdirs to remote host:

cd /export/ && find home -print | cpio -oaV | ssh 10.10.10.10 'cd /export/; cpio -imVd'

The above would copy the contents of /export/home and any subdirs to /export/home on the remote host.
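
As an illustration of that finer control, you could let find pick out just part of the tree (the *.jpg pattern here is only an example):

cd /export/ && find home -name '*.jpg' -print | cpio -oaV | ssh 10.10.10.10 'cd /export/; cpio -imVd'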

Hope this helps.