tar + rsync + untar. Any speed benefit over just rsync?

When you send the same set of files repeatedly, rsync is better suited because it only sends the differences. tar always sends everything, and this is a waste of resources when a lot of the data is already there. tar + rsync + untar loses this advantage in that case, as well as the advantage of keeping the directories in sync with rsync --delete.
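As a small runnable illustration of the in-sync behavior (local paths here, but the same flags work with a remote destination such as user@server:/dest/dir/):

```shell
# Build a tiny demo: dst holds a stale file that src no longer has.
mkdir -p /tmp/sync-demo/src /tmp/sync-demo/dst
echo keep  > /tmp/sync-demo/src/keep.txt
echo stale > /tmp/sync-demo/dst/stale.txt

# --delete removes stale.txt from dst, keeping the two trees identical.
# The trailing slash on src/ copies its contents, not the directory itself.
rsync -a --delete /tmp/sync-demo/src/ /tmp/sync-demo/dst/
ls /tmp/sync-demo/dst    # stale.txt is gone
```

A tar-based copy has no equivalent of --delete; files removed on the source would simply pile up on the destination.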

If you are copying the files for the first time, packing, then sending, then unpacking (AFAIK rsync doesn't take piped input) is cumbersome and always worse than just rsyncing, because rsync doesn't have to do any more work than tar anyway.

Tip: rsync version 3 or later does incremental recursion, meaning it starts copying almost immediately, before it has counted all the files.

Tip 2: If you use rsync over ssh, you may also use either tar+ssh

tar -C /src/dir -jcf - ./ | ssh user@server 'tar -C /dest/dir -jxf -'

or just scp

scp -Cr srcdir user@server:destdir

General rule: keep it simple.

UPDATE:

I've created 59M of demo data

mkdir tmp; cd tmp
for i in {1..5000}; do dd if=/dev/urandom of=file$i count=1 bs=10k; done

and tested the file transfer to a remote server (not on the same LAN) several times, using both methods

time rsync -r  tmp server:tmp2

real    0m11.520s
user    0m0.940s
sys     0m0.472s

time (tar cf demo.tar tmp; rsync demo.tar server: ; ssh server 'tar xf demo.tar; rm demo.tar'; rm demo.tar)

real    0m15.026s
user    0m0.944s
sys     0m0.700s

while keeping separate logs of the ssh traffic packets sent

wc -l rsync.log rsync+tar.log 
   36730 rsync.log
   37962 rsync+tar.log
   74692 total

In this case, I can't see any advantage in reduced network traffic from using rsync+tar, which is expected when the default MTU is 1500 and the files are 10k in size. rsync+tar generated more traffic, was 2-3 seconds slower, and left two garbage files that had to be cleaned up.

I did the same tests on two machines on the same LAN, and there rsync+tar achieved much better times and much, much less network traffic. I assume that's because of jumbo frames.

Maybe rsync+tar would beat plain rsync on a much larger data set. But frankly I don't think it's worth the trouble: you need double the space on each side for packing and unpacking, and there are a couple of other options, as I've already mentioned above.


rsync also does compression. Use the -z flag. If running over ssh, you can also use ssh's compression mode. My feeling is that repeated levels of compression are not useful; they just burn cycles without significant benefit. I'd recommend experimenting with rsync compression; it seems quite effective. And I'd suggest skipping tar or any other pre/post compression.
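To make the "pick one layer" point concrete, here is a sketch (the remote host and paths in the comments are placeholders; the local run just shows the flag is accepted):

```shell
# Let rsync compress the stream itself. Over a network this would be e.g.:
#   rsync -az /src/dir/ user@server:/dest/dir/
# To compress at the ssh layer instead, use its -C option:
#   rsync -a -e 'ssh -C' /src/dir/ user@server:/dest/dir/
# Enabling both just burns CPU compressing already-compressed data.

# Local demonstration that -z composes with a normal archive copy:
mkdir -p /tmp/zdemo/src
echo hello > /tmp/zdemo/src/a.txt
rsync -az /tmp/zdemo/src/ /tmp/zdemo/dst/
ls /tmp/zdemo/dst
```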

I usually use rsync as rsync -abvz --partial....


I had to back up my home directory to a NAS today and ran into this discussion, so I thought I'd add my results. Long story short, tarring over the network to the target file system is far faster in my environment than rsyncing to the same destination.

Environment: source machine, an i7 desktop with an SSD; destination machine, a Synology NAS DS413j on a gigabit LAN connection to the source machine.

The exact spec of the kit involved will impact performance, naturally, and I don't know the details of my exact setup with regard to quality of network hardware at each end.

The source files are my ~/.cache folder, which contains 1.2 GB of mostly very small files.

1a/ tar files from source machine over the network to a .tar file on remote machine

$ tar cf /mnt/backup/cache.tar ~/.cache

1b/ untar that tar file on the remote machine itself

$ ssh admin@nas_box
[admin@nas_box] $ tar xf cache.tar

2/ rsync files from source machine over the network to remote machine

$ mkdir /mnt/backup/cachetest
$ rsync -ah .cache /mnt/backup/cachetest

I kept 1a and 1b as completely separate steps just to illustrate the task. For practical use I'd recommend what Gilles posted above, piping tar output via ssh to an untarring process on the receiver.
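For reference, steps 1a and 1b collapse into one pipeline along those lines. The remote variant (host and NAS path are placeholders from my setup) is commented; the runnable part does the same pack/unpack locally through a pipe:

```shell
# Remote form, untested paths:
#   tar cf - -C "$HOME" .cache | ssh admin@nas_box 'tar xf - -C /path/on/nas'

# Same idea locally: tar writes the archive to stdout (-f -), a second
# tar reads it from stdin, so nothing ever touches the disk as a .tar file.
mkdir -p /tmp/pipedemo/home/.cache /tmp/pipedemo/backup
echo data > /tmp/pipedemo/home/.cache/f.txt
tar cf - -C /tmp/pipedemo/home .cache | tar xf - -C /tmp/pipedemo/backup
ls /tmp/pipedemo/backup/.cache
```

This avoids the intermediate tar file entirely, so it also sidesteps the double-disk-space cost mentioned earlier in the thread.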

Timings:

1a - 33 seconds

1b - 1 minute 48 seconds

2 - 22 minutes

It's very clear that rsync performed amazingly poorly compared to the tar operation, which can presumably be attributed to the network performance considerations mentioned above.

I'd recommend that anyone who wants to back up large quantities of mostly small files, such as a home directory, use the tar approach. rsync seems a very poor choice here. I'll come back to this post if it turns out I've been inaccurate in my procedure.

Nick

Tags:

Rsync

Tar