Compare directories but not content of files

By default, rsync compares only file metadata (timestamp, size, and attributes, among others), not the content of files.

rsync -n -a -i --delete source/ target/

explanation:

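  • -n (--dry-run): do not actually copy or delete anything. THIS IS IMPORTANT!
  • -a (archive mode, same as -rlptgoD): recurse into directories and take timestamps, permissions, ownership and similar attributes into account
  • -i (--itemize-changes): print one line of information per changed file
  • --delete: also report files that exist in target but not in source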

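Note: it is important to end the directory names with a slash (source/ and target/). This is an rsync thing: without the trailing slash on source, rsync treats source itself as an item to put inside target, so it would compare source against target/source instead of comparing the two directories' contents.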

If you also want to see lines printed for files that are identical, pass -i twice:

rsync -n -a -ii --delete source/ target/

example output:

*deleting   removedfile   (file in target but not in source)
.d..t...... ./            (directory with different timestamp)
>f.st...... modifiedfile  (file with different size and timestamp)
>f+++++++++ newfile       (file in source but not in target)
.f          samefile      (file with identical metadata, only shown with -ii)
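To make the categories in the example output concrete, here is a minimal Python sketch of a similar metadata-only comparison. It is not how rsync works internally; it assumes size and modification time are the only metadata that matter (rsync -a also looks at permissions, ownership and so on), and the names snapshot and compare_trees are made up for this illustration.

```python
import os


def snapshot(root):
    """Map each file's path (relative to root) to its (size, mtime) pair."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            result[os.path.relpath(full, root)] = (st.st_size, st.st_mtime)
    return result


def compare_trees(source, target):
    """Classify files roughly like the rsync -n -ii output above.

    Returns a dict of sorted lists: 'deleting' (only in target),
    'new' (only in source), 'modified' (size or mtime differ) and
    'same' (size and mtime equal). File contents are never read.
    """
    src, dst = snapshot(source), snapshot(target)
    report = {"deleting": [], "new": [], "modified": [], "same": []}
    for path in sorted(src.keys() | dst.keys()):
        if path not in src:
            report["deleting"].append(path)
        elif path not in dst:
            report["new"].append(path)
        elif src[path] != dst[path]:
            report["modified"].append(path)
        else:
            report["same"].append(path)
    return report
```

For example, compare_trees("source", "target")["same"] lists the files rsync would only print with -ii. Because contents are never read, two files with equal size and mtime but different data land in "same", which is exactly the caveat below.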

Remember that rsync only compares metadata. If a file's content changed but its metadata stayed the same, rsync will report the file as identical. This is an unlikely scenario, so either trust that identical metadata means identical data, or compare the file data byte by byte.
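If you do want the byte-by-byte check, a minimal Python sketch can use the standard library's filecmp.cmp with shallow=False, which always compares file contents instead of trusting matching metadata. The function name content_differences is hypothetical, it only looks at files present in both trees, and it reads every such file, so expect it to be much slower than the metadata comparison.

```python
import filecmp
import os


def content_differences(source, target):
    """Return sorted relative paths of files in both trees whose bytes differ.

    Files that exist on only one side are ignored here.
    """
    differing = []
    for dirpath, _dirnames, filenames in os.walk(source):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, source)
            dst = os.path.join(target, rel)
            # shallow=False: never treat equal size/mtime as proof of equality
            if os.path.isfile(dst) and not filecmp.cmp(src, dst, shallow=False):
                differing.append(rel)
    return sorted(differing)
```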

Bonus: for progress information, see the question "Estimate time or work left to finish for rsync?"


Use the -q (--brief) option with diff -r (diff -qr). From the info page for GNU diff:

1.6 Summarizing Which Files Differ

When you only want to find out whether files are different, and you don't care what the differences are, you can use the summary output format. In this format, instead of showing the differences between the files, `diff' simply reports whether files differ. The `--brief' (`-q') option selects this output format.

This format is especially useful when comparing the contents of two directories. It is also much faster than doing the normal line by line comparisons, because `diff' can stop analyzing the files as soon as it knows that there are any differences.

This will not compare line by line but rather check each file as a whole, stopping at the first difference, which greatly speeds up the process (which is what you're looking for).
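The speed-up comes from being able to stop early. Here is a minimal Python sketch of that idea; it is not GNU diff's actual implementation, and the function name files_differ and the 64 KiB chunk size are arbitrary choices for illustration.

```python
def files_differ(path_a, path_b, chunk_size=64 * 1024):
    """Return True as soon as the two files are found to differ.

    Like `diff -q`, this only answers "same or different": it stops reading
    at the first chunk that differs instead of producing a line-by-line diff.
    """
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                return True  # early exit: no need to read the rest
            if not a:  # both files reached EOF at the same point
                return False
```

On two large files that differ near the start, this reads only the first chunk of each, whereas a full diff would have to process both files completely.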


Here's a quick Python script that checks that the filenames, mtimes, and file sizes are all the same, exiting with status 0 if they match and 1 if they don't:

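import os
import sys

def getStats(path):
    """Yield (relative path, mtime, size) for every file below path."""
    for dirpath, dirnames, filenames in os.walk(path):
        for name in filenames:
            filename = os.path.join(dirpath, name)
            stat = os.stat(filename)
            # relpath makes the result independent of a trailing slash on path
            yield os.path.relpath(filename, path), stat.st_mtime, stat.st_size

# os.walk() order depends on the filesystem, so sort before comparing.
# Exit status is 0 if both trees match and 1 if they differ.
sys.exit(sorted(getStats(sys.argv[1])) != sorted(getStats(sys.argv[2])))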