Finding all files with a given extension whose base name is the name of the parent directory

With GNU find:

find . -regextype egrep -regex '.*/([^/]+)/\1\.pdf'
  • -regextype egrep selects egrep-style (extended) regular expressions.
  • .*/ matches the grandparent directories.
  • ([^/]+)/ captures the parent directory name in a group.
  • \1\.pdf uses a backreference so that the file name must equal the parent directory name.

Update

One (myself included) might think that, because .* is greedy, it is unnecessary to exclude / from the parent match:

find . -regextype egrep -regex '.*/(.+)/\1\.pdf'

The command above does not work correctly, because it also matches ./a/b/a/b.pdf:

  • .*/ matches ./
  • (.+)/ matches a/b/
  • \1\.pdf matches a/b.pdf
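
The difference between the two patterns can be checked in a scratch directory. The following is a minimal sketch, assuming GNU find (for -regextype) and mktemp are available; the directory layout and the variable names demo_dir, loose and strict are made up for illustration.

```shell
# Scratch tree (hypothetical names): one false positive, one genuine match.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/a/b/a" "$demo_dir/c"
touch "$demo_dir/a/b/a/b.pdf" "$demo_dir/c/c.pdf"

# Loose pattern: (.+) may span slashes, so a/b/a/b.pdf is reported as well.
loose=$(cd "$demo_dir" && find . -regextype egrep -regex '.*/(.+)/\1\.pdf' | sort)

# Strict pattern: ([^/]+) is confined to a single path component.
strict=$(cd "$demo_dir" && find . -regextype egrep -regex '.*/([^/]+)/\1\.pdf' | sort)

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
rm -rf "$demo_dir"
```

Only the strict pattern should leave out ./a/b/a/b.pdf, whose parent directory is a rather than b.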

The traditional variant runs a loop inside find .. -exec sh -c '' and uses shell parameter expansions to compare the base name of each file with the name of its immediate parent directory:

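find foo/ -name '*.pdf' -exec sh -c '
    for file; do
        base="${file##*/}"    # file name without its directory part
        path="${file%/*}"     # directory part without the file name
        # print the path when the parent directory name equals the stem
        if [ "${path##*/}" = "${base%.*}" ]; then
            printf "%s\n" "$file"
        fi
    done' sh {} +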

To break down the individual parameter expansions:

  • file contains the full path of the .pdf file returned from the find command
  • "${file##*/}" contains only the part after the last / i.e. only the basename of the file
  • "${file%/*}" contains the path up to the final / i.e. except the basename portion of the result
  • "${path##*/}" contains the part after the last / from the path variable, i.e. the immediate folder path above the basename of the file
  • "${base%.*}" contains the part of the basename with the .pdf extension removed

So if the basename without extension matches with the name of the immediate folder above, we print the path.
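
The same expansions can be tried on a literal path without running find at all. This is only a sketch: matches_parent is a hypothetical helper name, and the sample paths are invented.

```shell
# Hypothetical helper: succeeds when the base name of the file, minus its
# last extension, equals the name of the immediate parent directory.
matches_parent() {
    file=$1
    base="${file##*/}"    # foo/a/b/b.pdf -> b.pdf
    path="${file%/*}"     # foo/a/b/b.pdf -> foo/a/b
    [ "${path##*/}" = "${base%.*}" ]    # compares b with b
}

matches_parent foo/a/b/b.pdf && echo "foo/a/b/b.pdf: match"
matches_parent foo/a/b/c.pdf || echo "foo/a/b/c.pdf: no match"
```

Because ${base%.*} strips only the last extension, a file such as b/b.tar.pdf would be compared as b.tar and would not match.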


The reverse of Inian's answer: look for directories, and then check whether each one holds a file with a particular name.

The following prints the pathnames of the found files relative to the directory foo:

find foo -type d -exec sh -c '
    for dirpath do
        pathname="$dirpath/${dirpath##*/}.pdf"
        if [ -f "$pathname" ]; then
            printf "%s\n" "$pathname"
        fi
    done' sh {} +

${dirpath##*/} will be replaced by the filename portion of the directory path, and could be replaced by $(basename "$dirpath").
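
The two forms agree for the pathnames find normally produces, but they differ when a path ends in a slash, which happens for the starting point itself if it is written as foo/. A small sketch with made-up paths (the variable names are illustrative only):

```shell
# Ordinary directory path: both forms give the last component.
dirpath=foo/bar
exp_plain="${dirpath##*/}"
base_plain=$(basename "$dirpath")
printf '[%s] [%s]\n' "$exp_plain" "$base_plain"

# Trailing slash: the expansion strips everything, basename still finds foo.
dirpath=foo/
exp_slash="${dirpath##*/}"
base_slash=$(basename "$dirpath")
printf '[%s] [%s]\n' "$exp_slash" "$base_slash"
```

The parameter expansion avoids starting an external process for every directory, which is presumably why the commands here use it; writing the starting point as foo rather than foo/ sidesteps the trailing-slash case.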

For people who like the short-circuit syntax:

find foo -type d -exec sh -c '
    for dirpath do
        pathname="$dirpath/${dirpath##*/}.pdf"
        [ -f "$pathname" ] && printf "%s\n" "$pathname"
    done' sh {} +

The benefit of doing it this way is that you may have more PDF files than directories. The number of tests is reduced if one restricts the query by the smaller number (the number of directories).

For example, if a single directory contains 100 PDF files, this would only try to detect one of them rather than testing the names of all 100 files against that of the directory.
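
To make the comparison concrete, the sketch below builds a scratch tree with one directory holding 100 PDF files and counts the candidates each approach would examine. The tree layout and the variable names are assumptions for illustration, and mktemp is assumed to be available.

```shell
# Scratch tree (hypothetical): one directory, 100 PDF files, one of which
# is named after the directory.
tree=$(mktemp -d)
mkdir -p "$tree/docs"
i=1
while [ "$i" -le 99 ]; do
    touch "$tree/docs/file$i.pdf"
    i=$((i + 1))
done
touch "$tree/docs/docs.pdf"

# File-first approach: every PDF name is compared with its parent directory.
pdf_candidates=$(find "$tree" -name '*.pdf' | wc -l)

# Directory-first approach: one existence test per directory
# (the count includes the scratch root itself).
dir_candidates=$(find "$tree" -type d | wc -l)

echo "file-first:      $((pdf_candidates)) name comparisons"
echo "directory-first: $((dir_candidates)) existence tests"
rm -rf "$tree"
```

The counts only reflect how many candidates are examined; find still has to walk the whole tree in both cases.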
