How to find total filesize grouped by extension

On a GNU system:

find . -name '?*.*' -type f -printf '%b.%f\0' |
  awk -F . -v RS='\0' '
    # $1 is the block count from %b, $NF is the text after the last dot
    {s[$NF] += $1; n[$NF]++}
    END {for (e in s) printf "%15d %4d %s\n", s[e]*512, n[e], e}' |
  sort -n

Or the same with perl, avoiding the -printf extension of GNU find (still using a GNU extension, -print0, but this one is more widely supported nowadays):

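find . -name '?*.*' -type f -print0 |
  perl -0lne '
    # -l strips the trailing NUL from each record, so it does not end up in the extension
    if (@s = stat$_){
      ($ext = $_) =~ s/.*\.//s;
      $s{$ext} += $s[12];        # st_blocks, in 512-byte units
      $n{$ext}++;
    }
    END {
      for (sort{$s{$a} <=> $s{$b}} keys %s) {
        printf "%15d %4d %s\n",  $s{$_}<<9, $n{$_}, $_;
      }
    }'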

This gives output like:

          12288    1 pnm
          16384    4 gif
         204800    2 ico
        1040384   17 jpg
        2752512   83 png

If you want KiB, MiB... suffixes, pipe to numfmt --to=iec-i --suffix=B.
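
As a rough illustration of that last step, the snippet below feeds one line shaped like the output above to numfmt. By default numfmt converts only the first whitespace-separated field and leaves the rest of the line alone. The wrapper name human_sizes is made up for this sketch, and GNU coreutils is assumed.

```shell
# Convert the leading byte count of each input line to IEC units (KiB, MiB, ...).
# human_sizes is a hypothetical wrapper name; numfmt comes from GNU coreutils.
human_sizes() {
  numfmt --to=iec-i --suffix=B
}

# Sample line copied from the png row of the output above.
printf '%15d %4d %s\n' 2752512 83 png | human_sizes
```

In practice you would append | human_sizes (or the numfmt command itself) to either pipeline above.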

%b*512 gives the disk usage rather than the apparent file size. Note that a file with several hard links is counted once per link, so the totals may differ from what du reports.


Here is another solution:

find . -type f | grep -Eo '\.[a-zA-Z0-9]+$' | sort -u | xargs -I '%' find . -type f -name "*%" -exec du -ch {} + -exec echo % \; | grep -E '^\.[a-zA-Z0-9]+$|total$' | uniq | paste - -

The part that collects the extensions is:

find . -type f | grep -Eo '\.[a-zA-Z0-9]+$' | sort -u

Next, for each extension, find the matching files, run du on them, and print the extension as well:

xargs -I '%' find . -type f -name "*%" -exec du -ch {} + -exec echo % \;

Then keep only the extension and total lines:

grep -E '^\.[a-zA-Z0-9]+$|total$' | uniq

and join each pair onto one line:

paste - -
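
To see what that final step does in isolation, here is a small sketch with invented sample lines standing in for the filtered output (an extension line followed by its total line). paste - - reads two lines at a time from standard input and joins them with a tab.

```shell
# Made-up sample input: each extension line is followed by its "total" line.
# paste - - joins every two consecutive lines with a tab character.
printf '%s\n' '.gif' '16K total' '.png' '2.7M total' | paste - -
```

This prints two lines, each holding an extension and its total separated by a tab.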

Not as nice as Stephane's solution, but you could try:

find . -type f -name "*.png" -print0 | xargs -0r du -ch | tail -n1

You have to run this once for each file type.
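
Repeating the command by hand could be avoided with a small loop. This is only a sketch: total_for_exts is a hypothetical helper name, the extensions are passed as arguments, and the -r option of xargs and -c option of du are assumed to be available as in the command above.

```shell
# Print one line per extension given as an argument: the extension, a tab,
# then the "total" line reported by du for the matching files.
# total_for_exts is a hypothetical helper name.
total_for_exts() {
  for ext in "$@"; do
    total=$(find . -type f -name "*.$ext" -print0 | xargs -0r du -ch | tail -n1)
    # With no matching files, xargs -r runs nothing and total stays empty.
    printf '%s\t%s\n' "$ext" "${total:-no files}"
  done
}
```

For example, total_for_exts png jpg gif would report the three types in one go. As with the one-liner, a very long file list may make xargs run du more than once, in which case only the last batch's total is shown.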