rsync --sparse does transfer whole data

Take a look at this discussion, specifically this answer.

It seems that the solution is to run rsync --sparse first, followed by rsync --inplace.

On the first call (--sparse), also use --ignore-existing to prevent already transferred sparse files from being overwritten, and -z to save network resources.

The second call, --inplace, should update only modified chunks. Here, compression is optional.
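The two calls described above could be scripted roughly as follows. This is only a sketch: the function name two_pass_commands and the example paths are made up for illustration, and the flags are just the ones mentioned in this answer, so add whatever other options (for example archive mode) your transfer normally needs.

```python
import shlex


def two_pass_commands(src, dest):
    """Build the argv lists for the two rsync passes described above."""
    # Pass 1: create files that do not exist yet on the destination as
    # sparse files, skipping existing ones, with compression on the wire.
    first = ["rsync", "--sparse", "--ignore-existing", "-z", src, dest]
    # Pass 2: update files that already exist by writing into them directly.
    # Compression is optional here, so it is left out.
    second = ["rsync", "--inplace", src, dest]
    return first, second


if __name__ == "__main__":
    # Hypothetical source and destination, for illustration only.
    for argv in two_pass_commands("/var/lib/images/disk.img", "backup:/srv/images/"):
        print(shlex.join(argv))
        # To actually run a pass: subprocess.run(argv, check=True)
```

The commands are only printed, not executed, so they can be reviewed before being run.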

Also see this post.

Update

I believe the suggestions above won't solve your problem. I also believe that rsync is not the right tool for the task. You should search for other tools which will give you a good balance between network and disk I/O efficiency.

Rsync was designed for efficient usage of a single resource, the network. It assumes reading and writing to the network is much more expensive than reading and writing the source and destination files.

"We assume that the two machines are connected by a low-bandwidth high-latency bi-directional communications link." (The rsync algorithm, abstract)

The algorithm can be summarized in four steps:

  1. The receiving side β sends checksums of blocks of size S of the destination file B.
  2. The sending side α identifies blocks that match in the source file A, at any offset.
  3. α sends β a list of instructions made of either verbatim, non-matching, data, or matching block references.
  4. β reconstructs the whole file from those instructions.
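The four steps can be illustrated with a toy model in Python. This is not how rsync is implemented: as far as I know the real tool pairs a cheap rolling checksum with a stronger one so that it does not have to hash every window, whereas this sketch simply computes MD5 at every offset. The function names and the ("ref", index) / ("data", bytes) instruction format are invented for the example.

```python
import hashlib


def block_checksums(b, size):
    """Step 1 (receiver β): checksum each full block of size S in B."""
    sums = {}
    for index in range(len(b) // size):
        block = b[index * size:(index + 1) * size]
        # Identical blocks (e.g. runs of zeros) share one entry: the first index.
        sums.setdefault(hashlib.md5(block).digest(), index)
    return sums


def make_instructions(a, sums, size):
    """Steps 2-3 (sender α): scan A at every offset and emit instructions."""
    out = []
    literal = bytearray()
    pos = 0
    while pos < len(a):
        window = a[pos:pos + size]
        index = None
        if len(window) == size:
            index = sums.get(hashlib.md5(window).digest())
        if index is not None:
            if literal:
                # Flush pending non-matching data before the block reference.
                out.append(("data", bytes(literal)))
                literal.clear()
            out.append(("ref", index))
            pos += size
        else:
            literal.append(a[pos])
            pos += 1
    if literal:
        out.append(("data", bytes(literal)))
    return out


def reconstruct(b, instructions, size):
    """Step 4 (receiver β): rebuild the whole file from B and the instructions."""
    out = bytearray()
    for kind, value in instructions:
        if kind == "ref":
            out += b[value * size:(value + 1) * size]
        else:
            out += value
    return bytes(out)


if __name__ == "__main__":
    old = b"aaaabbbbcccc"      # destination file B
    new = b"aaaaXbbbbcccc"     # source file A, one byte inserted
    ops = make_instructions(new, block_checksums(old, 4), 4)
    print(ops)
    print(reconstruct(old, ops, 4) == new)
```

Note that in this model a file made only of zero blocks produces nothing but references to the same block, yet the receiver still appends every one of them to the output.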

Notice that rsync normally reconstructs the file B as a temporary file T, then replaces B with T. In this case it must write the whole file.

The --inplace option does not relieve rsync from writing blocks matched by α, as one might imagine. Blocks can match at different offsets, and scanning B a second time to compute new checksums would be prohibitive in terms of performance. A block that matches at the same offset it was read from in step one could be skipped, but rsync does not do that. In the case of a sparse file, a null block of B would match every null block of A, and would have to be rewritten.

The --inplace option just causes rsync to write directly to B instead of T. It will still rewrite the whole file.


The latest version of rsync can handle --sparse and --inplace together. I found the following GitHub commit from 2016: https://github.com/tuna/rsync/commit/f3873b3d88b61167b106e7b9227a20147f8f6197