Recursively list all directories that contain one or more jpg image files

Assuming JPEG image files have the suffix .jpg:

find "$HOME" -type f -name '*.jpg' \
    -exec sh -c 'for d; do dirname "$d"; done' sh {} + | sort -u -o jpeg_dirs.txt

This relies on you not having directory names that contain newlines.

With GNU find:

find "$HOME" -type f -name '*.jpg' -printf '%h\n' | sort -u -o jpeg_dirs.txt

These find commands will find all JPEG images under your home directory and print the names of the directories where they were found. The sort -u will take this list of directory names, sort it, and remove duplicates. The result will be written to the file jpeg_dirs.txt in the current directory.
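If newlines in directory names are a concern, the list can be kept NUL-delimited instead of newline-delimited. The following is only a sketch, assuming GNU find (for -printf) and GNU sort (for -z); the function name jpeg_dirs0 is made up for illustration.

```shell
# Hypothetical helper: print every directory under "$1" (default: $HOME)
# that contains at least one *.jpg regular file, one NUL-terminated
# pathname per directory.
# Assumes GNU find (-printf) and GNU sort (-z); neither is POSIX.
jpeg_dirs0() {
    find "${1:-$HOME}" -type f -name '*.jpg' -printf '%h\0' | sort -zu
}
```

For display, the NULs can be turned back into newlines with jpeg_dirs0 "$HOME" | tr '\0' '\n', or the output can be passed straight to xargs -0 for further processing.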


Looking back at this in early 2021 (3.3 years later) I cringe a bit because my solution above, albeit not wrong per se, is a bit backwards. It also makes the obvious assumption about "nice filenames" (no newlines).

When you're using find to search for directories, don't search for regular files as I did above; actually search for directories. Once we have the directories, we can look in each of them and see if there is a file matching *.jpg or *.JPG (further filename suffixes are easy to add):

find "$HOME" -type d -exec bash -O nullglob -O dotglob -O extglob -c '
    for dirpath do
        set -- "$dirpath"/*.@(jpg|JPG)
        [[ "$#" -gt 0 ]] && printf "%s\n" "$dirpath"
    done' bash {} +

This peeks into each directory from your home directory down and tries to expand the globbing pattern *.@(jpg|JPG) in each. This pattern, which could also have been written as two separate patterns, *.jpg and *.JPG, matches all the files that we're looking for. If at least one name matches, we assume that this is a directory whose name we want to output. This will give false positives for directories that contain no matching files but do contain subdirectories whose names have these suffixes.

The shell options that we run our internal bash script with allow us to match hidden names (dotglob), allow the globbing pattern to disappear completely if it doesn't match anything rather than remain unexpanded (nullglob), and allow us to use the ksh-inspired extended globbing pattern @(...|...) (extglob).
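If the false positives matter, each match can be tested before the directory is reported. This is a minimal sketch that assumes bash is installed; the function name jpeg_dirs_strict is hypothetical. Note that the -f test also accepts symbolic links that point to regular files.

```shell
# Hypothetical helper: like the command above, but only report a
# directory if at least one of the matching names is a regular file
# (or a symbolic link to one), so subdirectories named *.jpg are ignored.
jpeg_dirs_strict() {
    find "${1:-$HOME}" -type d -exec bash -O nullglob -O dotglob -O extglob -c '
        for dirpath do
            for name in "$dirpath"/*.@(jpg|JPG); do
                if [[ -f $name ]]; then
                    # First regular file found: report and skip the rest.
                    printf "%s\n" "$dirpath"
                    break
                fi
            done
        done' bash {} +
}
```

Calling jpeg_dirs_strict "$HOME" prints one directory per line, in the order in which find visits them.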

Using the zsh shell:

typeset -U list=(~/**/*.(jpg|JPG)(.DN:h))
print -rC1 $list

This creates an array variable, list, that has the property that it only stores unique elements. It is initialized to the result of expanding a filename globbing pattern. The pattern matches all JPEG image files in or below the home directory, and the :h at the end removes the actual filename from the generated pathnames. The . makes the pattern only match regular files, and D and N act like dotglob and nullglob in bash.


A simple way is to list all the .jpg files, then strip off the base names of the files (the part after the final slash), and remove duplicates. You can use sed to strip the part of each line after the final slash. There's a command to remove duplicates, which is called uniq, but it assumes sorted input; if you need to sort anyway, you can let sort do the uniquification.

find ~Mike -iname '*.jpg' | sed 's!/[^/]*$!!' | sort -u >directories_with_jpeg_files.txt

This assumes that none of the directories or files involved have a newline in their name. File names with newlines do not appear in normal circumstances, but do beware if the file names may have been chosen by a hostile person (e.g. if you're processing files that have been uploaded to a server and the uploader can choose the file name).

If there are directories containing a lot of JPEG files and not many directories containing no JPEG file, this method spends a lot of time reporting redundant files. There is no way to tell find to shortcut a directory once it's found something in it. But you can restrict find to directories and tell it to search for a JPEG file in each directory. This increases the cost for directories that don't contain JPEG files, however, so it can have poor performance if there are many JPEGless directories.

find ~Mike -type d -exec sh -c '
    for d do
      set -- "$d"/*.[Jj][Pp][Gg]   # keep the wildcard outside the quotes so that it expands
      if [ -e "$1" ]; then printf %s\\n "$d"; fi
    done
' sh {} + | sort -u >directories_with_jpeg_files.txt
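A variation on the per-directory check is to run a second, non-recursive find in each directory and have it stop at the first hit. This is only a sketch: -maxdepth, -iname and -quit are extensions found in GNU find and some BSD finds, not in every find, and the function name jpeg_dirs_quit is invented for illustration. It starts one extra find process per directory, so it is not necessarily faster.

```shell
# Hypothetical helper: report each directory under "$1" (default: $HOME)
# that directly contains a regular file matching *.jpg in any letter case.
# Assumes a find with -maxdepth, -iname and -quit (e.g. GNU find).
jpeg_dirs_quit() {
    find "${1:-$HOME}" -type d -exec sh -c '
        for d do
            # The inner find prints the first match and exits immediately.
            if [ -n "$(find "$d" -maxdepth 1 -type f -iname "*.jpg" -print -quit)" ]; then
                printf "%s\n" "$d"
            fi
        done
    ' sh {} +
}
```

For example, jpeg_dirs_quit ~Mike >directories_with_jpeg_files.txt writes the list to a file, with the same caveat about newlines in names as before.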

Alternatively, in zsh, you can use the ** wildcard to traverse directories recursively, and (#i) to match the following path component case-insensitively, so that the pattern **/(#i)*.jpg matches *.jpg and *.JPG (and *.Jpg and so on) in a whole directory tree. Add the history modifier h in a glob qualifier to extract the directory part. Store the result in an array variable dirs=(…) and extract the unique elements of this array with the u parameter expansion flag.

set -o extendedglob # for (#i); best in ~/.zshrc
dirs=(~Mike/**/(#i)*.jpg(:h))
print -lr -- ${(u)dirs} >directories_with_jpeg_files.txt

The equivalent of the check-per-directory method above is to use the e glob qualifier.

print -lr ~Mike/**/*(/e\''set -- $REPLY/*.(#i)jpg(N[1]); (($# != 0))'\') >directories_with_jpeg_files.txt

Tags: Shell, Find, Files