Behavior of rsync with file that's still being written?

If Apache is writing a file of some kind to one place and has not completed writing it and then rsync kicks in, rsync will copy whatever is sitting there.

Meaning, if Apache is dealing with a 5MB file and only 2MB has been written when rsync kicks in, the partial 2MB file will be copied. So that file would seem to be “corrupted” on the destination server.

Depending on the size of the files you are using, you can use the --inplace option in rsync to do the following:

This option changes how rsync transfers a file when the file's data needs to be updated: instead of the default method of creating a new copy of the file and moving it into place when it is complete, rsync instead writes the updated data directly to the destination file.

The benefit of this is if a 5MB file only has 2MB copied on the first run, the next run will pick up at 2MB and continue to copy the file until the full 5MB is in place.
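To make that concrete, here is a minimal local sketch of an --inplace run; the paths are hypothetical placeholders and no network is involved, but the flag works the same way against a remote destination:

```shell
#!/bin/sh
# Local demo of rsync --inplace; paths are hypothetical placeholders.
mkdir -p /tmp/inplace-demo/src /tmp/inplace-demo/dst
printf 'version 1 of the file\n' > /tmp/inplace-demo/src/data.txt

# --inplace writes updated data directly into the destination file
# instead of building a hidden temporary copy and renaming it into place.
rsync -a --inplace /tmp/inplace-demo/src/ /tmp/inplace-demo/dst/

cat /tmp/inplace-demo/dst/data.txt
```

Against a remote server the invocation would look like `rsync -a --inplace /local/dir/ host:/remote/dir/` (host and paths again placeholders).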

The negative is that it could create a situation where someone accessing the web server while a file is being copied would see a partial file. In my opinion, rsync works best with its default behavior of caching an “invisible” file and then moving it into place right away. But --inplace is good for scenarios where large files and bandwidth constraints might stand in the way of a large file ever being copied from square one.
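That default behavior can be imitated in plain shell, which may help clarify why readers never see a partial file: write to a hidden temporary file, then mv it into place, which is atomic within a single filesystem. A minimal sketch with hypothetical paths:

```shell
#!/bin/sh
# Sketch of rsync's default publish pattern: write a hidden temporary
# file first, then atomically rename it into place. Paths are hypothetical.
DEST_DIR=/tmp/publish-demo
mkdir -p "${DEST_DIR}"

# Write the new content to a hidden temp file first...
printf 'the complete file contents\n' > "${DEST_DIR}/.data.txt.tmp"

# ...then mv it into place. Readers see either the old file or the new
# one, never a half-written file, because mv within one filesystem is atomic.
mv "${DEST_DIR}/.data.txt.tmp" "${DEST_DIR}/data.txt"

cat "${DEST_DIR}/data.txt"
```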

That said, you do state this; emphasis is mine:

Every five minutes has cron run rsync…

So I assume you have some bash script in place to manage this cron job? Well, the thing is, rsync is smart enough to only copy the files that need to be copied. And if you have a script that runs every 5 minutes, it appears you are trying to avoid having rsync processes step on each other. Meaning, if you ran it every minute, there is a risk that one or more of the rsync processes would still be running due to file size or network speed, and the next process would just be in competition with it; a race condition.

One way to avoid this is to wrap your whole rsync command in a bash script that checks for a file lock; below is a boilerplate bash script framework I use for cases like this.

Note that some people will recommend using flock but since flock is not installed on some systems I use—and I jump between Ubuntu (which has it) and Mac OS X (which does not) a lot—I use this simple framework without any real issue:


LOCK_NAME="MY_PROCESS"
LOCK_DIR="/tmp/${LOCK_NAME}.lock"
PID_FILE="${LOCK_DIR}/${LOCK_NAME}.pid"

if mkdir ${LOCK_DIR} 2>/dev/null; then
  # If the ${LOCK_DIR} doesn't exist, then start working & store the ${PID_FILE}
  echo $$ > ${PID_FILE}

  echo "Hello world!"

  rm -rf ${LOCK_DIR}
else
  if [ -f ${PID_FILE} ] && kill -0 $(cat ${PID_FILE}) 2>/dev/null; then
    # Confirm that the PID file exists & a process
    # with that PID is truly running; if so, bail out.
    echo "Running [PID "$(cat ${PID_FILE})"]" >&2
    exit 1
  else
    # If the process is not running, yet there is a PID file--like in the case
    # of a crash or sudden reboot--then get rid of the ${LOCK_DIR}
    rm -rf ${LOCK_DIR}
  fi
fi

The idea is that the general core (where I have echo "Hello world!") is where the heart of your script goes. The rest of it is basically a locking mechanism/logic based on mkdir. A good explanation of the concept is in this answer:

mkdir creates a directory if it doesn't exist yet, and if it does, it sets an exit code. More importantly, it does all this in a single atomic action making it perfect for this scenario.
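You can see that atomicity from a shell session: the first mkdir of a path succeeds, and a second mkdir of the same path fails with a non-zero exit code in one indivisible step, so two competing processes can never both think they got the lock. A quick demonstration (lock path hypothetical):

```shell
#!/bin/sh
LOCK_DIR=/tmp/mkdir-atomic-demo.lock
rm -rf "${LOCK_DIR}"

# First attempt: the directory doesn't exist yet, so mkdir succeeds.
if mkdir "${LOCK_DIR}" 2>/dev/null; then
  echo "first attempt: lock acquired"
fi

# Second attempt: the directory now exists, so mkdir fails.
# The existence check and the creation happen as one atomic action.
if ! mkdir "${LOCK_DIR}" 2>/dev/null; then
  echo "second attempt: lock already held"
fi

rm -rf "${LOCK_DIR}"
```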

So in the case of your rsync process, I would recommend using this script by just changing the echo command to your rsync command. Also, change the LOCK_NAME to something like RSYNC_PROCESS and then you are good to go.
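As an aside, on systems that do ship flock, the same single-instance guard reduces to one line; the lock path and the guarded command below are placeholders:

```shell
#!/bin/sh
# flock -n is non-blocking: if another process already holds the lock,
# flock exits immediately with a failure instead of waiting.
# Lock file path and the command being guarded are placeholders.
flock -n /tmp/rsync-demo.lock -c 'echo "lock acquired, doing work"'
```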

Now with your rsync wrapped in this script, you can set the cron job to run every minute without any risk of a race condition where two or more rsync processes are fighting to do the same thing. This will allow you to increase the frequency of rsync updates. It will not eliminate the issue of partial files being transferred, but it will help speed up the overall process so the full file can be properly copied over at some point.
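The cron side then reduces to a single entry; the script path below is a hypothetical placeholder for wherever you install the wrapper:

```shell
# m h dom mon dow  command
# Run the lock-wrapped rsync script every minute; the wrapper's
# mkdir lock ensures only one instance ever does any work.
* * * * * /usr/local/bin/rsync-wrapper.sh
```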

Yes - and the file might be corrupted if rsync is reading the file at the same time the file is being written to.

One thing you can try is scripting a check with lsof:

lsof /path/to/file

An exit code of 0 means that the file is in use, and an exit code of 1 means there is no activity on that file.
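Wired into a sync script, that exit code becomes a simple guard; the file path here is a hypothetical placeholder, and the rsync call is elided to an echo:

```shell
#!/bin/sh
# Skip the sync while another process still has the file open.
# The path is a hypothetical placeholder.
FILE=/tmp/lsof-demo.txt
touch "${FILE}"

if lsof "${FILE}" >/dev/null 2>&1; then
  # Exit code 0: some process still has the file open for writing or reading.
  echo "file is busy; skipping this run"
else
  # Non-zero exit code: no activity on the file, so it is safe to copy.
  echo "file is idle; safe to rsync"
fi
```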