Recursive statistics on file types in a directory?

You could use find and uniq for this, e.g.:

$ find . -type f | sed 's/.*\.//' | sort | uniq -c
   16 avi
   29 jpg
  136 mp3
    3 mp4

Command explanation

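  • find recursively prints the path of every regular file (-type f)
  • sed deletes everything up to and including the last dot in each path, leaving only the file extension
  • sort brings identical extensions together, because uniq only merges adjacent lines
  • uniq -c collapses each run of identical lines and prefixes it with its count (like a histogram)
  • a file without an extension is not skipped: the last dot in ./name is the leading one, so it shows up as /name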
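The reason sort has to come before uniq can be seen on a tiny, made-up list of extensions. This is only a sketch; the variable names are illustrative and not part of the answer above.

```shell
# Three extensions in the order find might emit them (not sorted).
exts='jpg
mp3
jpg'

# Without sort, uniq -c only merges adjacent duplicates,
# so jpg is reported on two separate lines.
unsorted=$(printf '%s\n' "$exts" | uniq -c)

# With sort, identical lines become adjacent and are counted together.
sorted=$(printf '%s\n' "$exts" | sort | uniq -c)

printf 'without sort:\n%s\n\nwith sort:\n%s\n' "$unsorted" "$sorted"
```

The first result has three lines and the second has two, with jpg counted twice.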

With zsh:

print -rl -- **/?*.*(D.:e) | sort | uniq -c | sort -n

The pattern **/?*.* matches all files that have an extension, in the current directory and its subdirectories recursively. The glob qualifier D lets zsh traverse hidden directories and consider hidden files; . selects only regular files. The history modifier :e retains only the file extension. print -rl prints one match per line. The glob result is sorted by path, not by extension, so the first sort is needed to bring identical extensions together before uniq -c counts consecutive identical items. The final call to sort orders the extensions by use count.


This one-liner seems to be a fairly robust method:

find . -type f -printf '%f\n' | sed -r -n 's/.+(\..*)$/\1/p' | sort | uniq -c

The find . -type f -printf '%f\n' part prints the basename of every regular file in the tree, without the directory part, so the sed regex never has to deal with directory names that contain dots. Note that -printf is a GNU find extension.
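If your find lacks -printf, one possible substitute is to strip the directory part with sed. The sketch below tries that on a scratch tree; the file names are made up, and it assumes file names contain no newlines.

```shell
# Build a small scratch tree; the names are hypothetical.
tmp=$(mktemp -d)
mkdir -p "$tmp/dir.d/sub"
touch "$tmp/a.txt" "$tmp/dir.d/noext" "$tmp/dir.d/sub/b.txt"

# Delete everything up to and including the last slash, leaving the basename.
# This stands in for -printf '%f\n'; the dot in dir.d no longer matters.
basenames=$(cd "$tmp" && find . -type f | sed 's|.*/||' | sort)

printf '%s\n' "$basenames"
rm -rf "$tmp"
```

The sort is only there to make the listing order predictable; the counting pipeline sorts again later anyway.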

The sed -r -n 's/.+(\..*)$/\1/p' part replaces each incoming filename with only its extension. Because of -n and the p flag, only names where the substitution succeeded are printed, so files without an extension are dropped. E.g., .somefile.ext becomes .ext. Note the initial .+ in the regex: a match needs at least one character before the extension's dot. This keeps dotfiles like .gitignore from being treated as an empty name with the extension .gitignore, which is probably what you want. If not, replace the .+ with .*.

The rest of the line is from the accepted answer.
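To check how the sed expression treats edge cases, it can be fed a few sample names directly. The names below are hypothetical, and -E is used as the more portable spelling of -r, which is an assumption about your sed.

```shell
# Hypothetical sample names; none of these files need to exist.
names='.somefile.ext
.gitignore
archive.tar.gz
README
photo.JPG'

# -n plus the p flag prints only lines where the substitution matched.
# .gitignore has nothing before its dot and README has no dot at all,
# so neither of them produces any output.
exts=$(printf '%s\n' "$names" | sed -E -n 's/.+(\..*)$/\1/p')

printf '%s\n' "$exts"
```

Only the last dot-separated part counts as the extension here, so archive.tar.gz is tallied under .gz, and case is preserved, so .JPG and .jpg would be counted separately.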

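Edit: If you want the histogram sorted by count, just add another sort to the end. This lists the rarest extensions first; use sort -rn instead for a Pareto-style listing with the most common extensions on top: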

find . -type f -printf '%f\n' | sed -r -n 's/.+(\..*)$/\1/p' | sort | uniq -c | sort -bn

Sample output from a built Linux source tree:

    1 .1992-1997
    1 .1994-2004
    1 .1995-2002
    1 .1996-2002
    1 .ac
    1 .act2000
    1 .AddingFirmware
    1 .AdvancedTopics
    [...]
 1445 .S
 2826 .o
 2919 .cmd
 3531 .txt
19290 .h
23480 .c
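For repeated use, the pieces above could be wrapped in a small function. This is a sketch, not a definitive implementation: ext_histogram is a made-up name, the directory part is stripped with sed rather than -printf, and file names are assumed to contain no newlines.

```shell
# ext_histogram is a hypothetical helper name, not part of any tool.
# Usage: ext_histogram [DIR]   prints "count .ext" lines, most common first.
ext_histogram() {
    find "${1:-.}" -type f |
        sed 's|.*/||' |                   # keep only the basename
        sed -E -n 's/.+(\..*)$/\1/p' |    # keep only the extension
        sort | uniq -c |                  # count each extension
        sort -rn                          # largest count first
}

# Small scratch tree to try it on; the names are made up.
tmp=$(mktemp -d)
touch "$tmp/a.c" "$tmp/b.c" "$tmp/c.c" "$tmp/d.h" "$tmp/.gitignore"

all=$(ext_histogram "$tmp")
top=$(printf '%s\n' "$all" | head -n 1)
printf '%s\n' "$all"
rm -rf "$tmp"
```

Piping the function's output through head -n 10 gives a quick top-ten listing for a large tree.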