Calculating total file size by extension in shell

Solution 1:

For any given extension you can use

find /path -type f -name '*.frq' -exec ls -l {} \; | awk '{ Total += $5} END { print Total }'

to get the total file size for that type.
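
As a rough sketch, the same idea can be wrapped in a small function that skips the per-file ls call. This assumes GNU find (for -printf, which the later solutions also rely on), and total_size_by_ext is only a hypothetical helper name, not something from the original answer.

```shell
# Sketch: total size in bytes of regular files with a given extension.
# Assumes GNU find; -printf '%s\n' prints each file's size in bytes.
# total_size_by_ext is a hypothetical name chosen for this example.
total_size_by_ext() {
  # $1 = directory to search, $2 = extension without the leading dot
  find "$1" -type f -name "*.$2" -printf '%s\n' |
    awk '{ total += $1 } END { print total + 0 }'   # "+ 0" prints 0 when nothing matches
}

# Usage: total_size_by_ext /path frq
```

Letting find print the size directly avoids starting one ls process per file and does not depend on the column layout of ls -l.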

After some more thought, here is a version that covers every extension found under the current directory:


ftypes=$(find . -type f | grep -E ".*\.[a-zA-Z0-9]*$" | sed -e 's/.*\(\.[a-zA-Z0-9]*\)$/\1/' | sort | uniq)

for ft in $ftypes; do
    echo -n "$ft "
    find . -type f -name "*${ft}" -exec ls -l {} \; | awk '{total += $5} END {print total}'
done

This prints the total size in bytes for each file type found.

Solution 2:

With Bash version 4 or later (needed for associative arrays), find alone is enough; ls and awk are not necessary:

declare -A ary

while IFS=$'\t' read -r name size; do
  ext=${name##*.}   # keep only what follows the last dot
  ((ary[$ext] += size))
done < <(find . -type f -printf "%f\t%s\n")

for key in "${!ary[@]}"; do
  printf "%s\t%s\n" "$key" "${ary[$key]}"
done

Solution 3:

The second column (the file name) is split on "." and the last part (the extension) is used as the key of an array that accumulates the sizes.


find . -type f -printf "%s\t%f\n" | awk -F'\t' '
{
  n = split($2, ext, ".")   # split the file name on dots
  e = ext[n]                # the last piece is the extension
  size[e] += $1
}
END {
  for (i in size)
    print size[i], i
}' | sort -n

This gives the total size in bytes for every extension:

60055 gemspec
321991 txt
2075312 html
2745143 rb
13387264 gem
47196526 jar
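
One caveat with keying on the last dot-separated part: a file with no dot in its name (README, Makefile) ends up counted under its own full name. A possible variation, sketched below, groups such files under a single label instead. It assumes GNU find; the function name size_by_ext and the "(none)" label are arbitrary choices made for this example.

```shell
# Sketch: per-extension totals, grouping extension-less files under "(none)".
# Assumes GNU find (for -printf). size_by_ext and "(none)" are hypothetical
# names picked for illustration.
size_by_ext() {
  find "${1:-.}" -type f -printf '%s\t%f\n' | awk -F'\t' '
    {
      n = split($2, parts, ".")
      # No dot, a trailing dot, or only a leading dot (e.g. ".bashrc"):
      # treat the file as having no extension.
      if (n < 2 || parts[n] == "" || (n == 2 && parts[1] == ""))
        e = "(none)"
      else
        e = parts[n]
      size[e] += $1
    }
    END {
      for (e in size)
        print size[e], e
    }' | sort -n
}

# Usage: size_by_ext /path
```

Treating dotfiles such as .bashrc as extension-less is a judgment call; drop that condition if you would rather see them listed under their own name.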