How to get folder size ignoring hard links?

If you specifically want the size of the files that are present under hourly.2 but not under hourly.1, you can obtain it a little indirectly with du. If du processes the same file more than once (even under different names, i.e. hard links), it only counts the file the first time. So what du hourly.1 hourly.2 reports for hourly.2 is the size you're looking for. Thus:

du -ks hourly.1 hourly.2 | sed -n '2s/[^0-9].*//p'

(Works on any POSIX system and most other Unix variants. Assumes that the directory name hourly.1 doesn't contain any newline.)
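To see the effect on throwaway data, here is a minimal sketch. The helper name unique_kb is made up for illustration, it uses awk instead of sed to pick the number off the second line, and mktemp -d is assumed to be available (it is not required by POSIX). Exact figures depend on the filesystem, so none are shown.

```shell
#!/bin/sh
# unique_kb OLD NEW  (hypothetical helper name)
# Prints the KiB that du attributes to NEW after it has already walked OLD,
# i.e. the data in NEW that is not hard-linked from OLD.
# Assumes neither path contains a newline.
unique_kb() {
    du -ks -- "$1" "$2" | awk 'NR == 2 { print $1 }'
}

# Demo in a scratch directory: one file shared via a hard link, one new file.
demo=$(mktemp -d)
mkdir "$demo/hourly.1" "$demo/hourly.2"
dd if=/dev/urandom of="$demo/hourly.1/shared" bs=1024 count=512 2>/dev/null
ln "$demo/hourly.1/shared" "$demo/hourly.2/shared"   # same inode, second name
dd if=/dev/urandom of="$demo/hourly.2/new" bs=1024 count=64 2>/dev/null

echo "hourly.2 on its own:  $(du -ks "$demo/hourly.2" | awk '{ print $1 }') KiB"
echo "hourly.2 minus links: $(unique_kb "$demo/hourly.1" "$demo/hourly.2") KiB"
rm -rf "$demo"
```

The second figure should be noticeably smaller than the first, because the 512 KiB shared file has already been charged to hourly.1.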


As @Gilles says, du counts only the first hard link it encounters for each inode, so you can pass it several directories in one invocation:

$ du -hc --max-depth=0 /hourly.1 /hourly.2
29G /hourly.1
 1G /hourly.2
30G total

That is, any file in hourly.2 whose inode (the "real" file) is already referenced from hourly.1 is not counted again. (--max-depth=0 is a GNU option; -s is the portable equivalent.)
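Because the shared data is charged to whichever directory du sees first, the order of the arguments changes the per-directory figures but not the grand total. A rough sketch of that on scratch data follows; total_kb is a made-up helper name, and du -c is assumed to be available (GNU and BSD du have it, POSIX does not require it).

```shell
#!/bin/sh
# total_kb DIR...  (hypothetical helper name)
# Grand total in KiB across all arguments, each inode counted once.
# Relies on du -c, which GNU and BSD du provide but POSIX does not require.
total_kb() {
    du -skc -- "$@" | awk '{ t = $1 } END { print t }'
}

# Scratch data: a single 512 KiB file visible under both directories.
demo=$(mktemp -d)
mkdir "$demo/hourly.1" "$demo/hourly.2"
dd if=/dev/urandom of="$demo/hourly.1/shared" bs=1024 count=512 2>/dev/null
ln "$demo/hourly.1/shared" "$demo/hourly.2/shared"

(
    cd "$demo" || exit 1
    echo "hourly.1 first:"; du -sk hourly.1 hourly.2
    echo "hourly.2 first:"; du -sk hourly.2 hourly.1
    echo "total either way: $(total_kb hourly.1 hourly.2) KiB"
)
rm -rf "$demo"
```

So put the older snapshot first if you want the newer one's figure to mean "what this snapshot added".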


Total size in bytes of all files in hourly.2 that have only one link (-printf requires GNU find):

$ find ./hourly.2 -type f -links 1 -printf "%s\n" | awk '{s += $1} END {print s + 0}'

From the find man page:

   -links n
          File has n links.

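To get the sum in kilobytes instead of bytes, use -printf "%k\n". Note that %k reports the disk space used in 1 KiB blocks, which can differ from the file size that %s reports.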

To list files with different link counts, play around with find -links +1 (more than one link), find -links -5 (fewer than five links) and so on.
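If your find has no -printf, the same sum can be approximated with POSIX options only. This is a sketch, not a drop-in replacement: single_link_bytes is a made-up name, it takes the size from the fifth column of ls -ln, and it assumes no file name contains a newline. Keep in mind that -links 1 skips every file that has a second name anywhere, not just those shared with hourly.1.

```shell
#!/bin/sh
# single_link_bytes DIR  (hypothetical helper name)
# Sums the byte sizes of regular files under DIR whose link count is 1,
# using only POSIX find options and the size column (field 5) of ls -ln.
# Assumes no file name under DIR contains a newline.
single_link_bytes() {
    find "$1" -type f -links 1 -exec ls -ln {} + |
        awk '{ s += $5 } END { print s + 0 }'
}

# Scratch data: one file with two links, one file with a single link.
demo=$(mktemp -d)
mkdir "$demo/hourly.1" "$demo/hourly.2"
printf '%200s' '' > "$demo/hourly.1/shared"           # 200 bytes
ln "$demo/hourly.1/shared" "$demo/hourly.2/shared"    # link count becomes 2
printf '%100s' '' > "$demo/hourly.2/new"              # 100 bytes, one link

single_link_bytes "$demo/hourly.2"                    # prints 100
rm -rf "$demo"
```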