Finding sparse files?

On systems (and file systems) supporting the SEEK_HOLE lseek flag (like your Ubuntu 12.04 on ext4 would) and assuming the value for SEEK_HOLE is 4 as it is on Linux:

if perl -le 'seek STDIN,0,4;$p=tell STDIN;
   seek STDIN,0,2; exit 1 if $p == tell STDIN'< the-file; then
  echo the-file is sparse
else
  echo the-file is not sparse
fi

That shell syntax is POSIX. The non-portable parts are perl and SEEK_HOLE.

lseek(SEEK_HOLE) seeks to the start of the first hole in the file, or the end of the file if no hole is found. Above we know the file is not sparse when the lseek(SEEK_HOLE) takes us to the end of the file (to the same place as lseek(SEEK_END)).
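
To try that out, here is a small sketch that wraps the same perl test in a function and runs it on two throwaway files. The function name has_hole and the file names are made up for the example. It assumes Linux (SEEK_HOLE is 4), perl, mktemp and GNU truncate, and a file system that implements SEEK_HOLE:

```shell
# has_hole FILE (hypothetical helper): succeed if lseek(SEEK_HOLE) stops
# before the end of FILE. Assumes Linux, where SEEK_HOLE is 4 and SEEK_END is 2.
has_hole() {
    perl -e 'seek STDIN, 0, 4; $p = tell STDIN;
             seek STDIN, 0, 2; exit($p == tell(STDIN) ? 1 : 0)' < "$1"
}

dir=$(mktemp -d) || exit
printf 'some data\n' > "$dir/dense"   # fully written, no hole expected
truncate -s 1M "$dir/holey"           # GNU truncate: 1 MiB, nothing written

for f in "$dir/dense" "$dir/holey"; do
    if has_hole "$f"; then
        echo "${f##*/}: hole found"
    else
        echo "${f##*/}: no hole"
    fi
done
rm -rf -- "$dir"
```

Where holes are supported, I'd expect only the truncate-created file to be reported as having a hole, but that depends on the file system underneath the temporary directory.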

If you want to list the sparse files:

find . -type f ! -size 0 -exec perl -le 'for(@ARGV){open(A,"<",$_)or
  next;seek A,0,4;$p=tell A;seek A,0,2;print if$p!=tell A;close A}' {} +

GNU find (since version 4.3.3) has -printf %S to report the sparseness of a file. It takes the same approach as frostschutz' answer: it reports the ratio of disk usage to file size. It is therefore not guaranteed to report all sparse files (like when there's compression at the file system level, or where the space saved by the holes doesn't compensate for the file system infrastructure overhead or large extended attributes), but it works on systems that don't have SEEK_HOLE and on file systems where SEEK_HOLE is not implemented. Here with GNU tools:

find . -type f ! -size 0 -printf '%S:%p\0' |
  awk -v RS='\0' -F : '$1 < 1 {sub(/^[^:]*:/, ""); print}'

(Note that an earlier version of this answer didn't work properly when find expressed the sparseness as, for instance, 3.2e-05. Thanks to @flashydave's answer for bringing it to my attention.)
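
To check that the awk filter copes with exponent notation, one can feed it canned records. This is only a sketch: the sample ratios and paths are invented, the function name sparse_paths is hypothetical, and the records are newline-separated here for readability, whereas the real pipeline uses NUL:

```shell
# sparse_paths (hypothetical helper): read "ratio:path" records, one per
# line, and print the path of every record whose ratio is below 1.
# awk compares $1 numerically, so 3.2e-05 is handled like 0.000032.
sparse_paths() {
    awk -F : '$1 < 1 {sub(/^[^:]*:/, ""); print}'
}

# Invented sample data in the format produced by -printf '%S:%p'.
printf '%s\n' \
    '3.2e-05:./a:b.img' \
    '1:./full.bin' \
    '0.5:./half.bin' \
    '1.00391:./small.txt' |
    sparse_paths
```

Only ./a:b.img and ./half.bin should come out. The sub() call strips just the leading ratio, so a colon inside a file name survives.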


A file is usually sparse when the space taken by its allocated blocks (block count times block size) is smaller than the file size. This uses GNU stat as found on Ubuntu; beware that other systems may have incompatible implementations of stat.

if [ "$((`stat -c '%b*%B-%s' -- "$file"`))" -lt 0 ]
then
    echo "$file" is sparse
else
    echo "$file" is not sparse
fi

Variant with find (stolen from Stephane):

find . -type f ! -size 0 -exec bash -c '
    for f do
        [ "$((`stat -c "%b*%B-%s" -- "$f"`))" -lt 0 ] && printf "%s\n" "$f";
    done' {} +

You'd usually put this in a shell script instead, then have find execute that script:

find . -type f ! -size 0 -exec ./sparsetest.sh {} +
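
The content of sparsetest.sh isn't shown here. A minimal sketch of what it could look like, assuming GNU stat and using an illustrative helper name (allocated_less_than_size) that is not part of any tool:

```shell
#!/bin/sh
# Sketch of sparsetest.sh: print each argument whose allocated space is
# smaller than its size. Assumes GNU stat (-c with %b, %B and %s).

# allocated_less_than_size BLOCKS BLOCKSIZE SIZE  (hypothetical helper)
# Succeeds when BLOCKS * BLOCKSIZE < SIZE.
allocated_less_than_size() {
    [ "$(( $1 * $2 - $3 ))" -lt 0 ]
}

for f do
    # stat prints "blocks blocksize size"; skip files it cannot examine.
    info=$(stat -c '%b %B %s' -- "$f") || continue
    # $info is left unquoted on purpose so it splits into three numbers.
    if allocated_less_than_size $info; then
        printf '%s\n' "$f"
    fi
done
```

The script has to be executable (chmod +x sparsetest.sh) for find to run it this way.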

Stephane Chazelas' answer above doesn't take into account the fact that, for some sparse files, find's %S directive reports the ratio in floating-point exponent notation, like

9.31323e-09:./somedir/sparsefile.bin

These can be found as well with

find . -type f ! -size 0 -printf '%S:%p\0' |
   sed -zn '/^\(0[^:]*\|[0-9.]\+e-[0-9]\+\):/p' |
   tr '\0' '\n'