How do I find which files are missing from a list?

You could use stat to determine if a file exists on the file system.
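Here is a minimal sketch of the stat idea. It assumes a stat command is available: stat is not part of POSIX, but GNU coreutils and the BSDs both ship one. The function name missing_by_stat is made up for illustration. stat succeeds for any existing path, directories included, so this reports names that do not exist at all. It does not report names that exist but are not regular files.

```bash
# missing_by_stat is a hypothetical helper name.
# Reads one path per line from the file named in $1 and prints every path
# that stat cannot examine (typically because it does not exist).
missing_by_stat() {
  while IFS= read -r f; do
    # Discard stat's normal output and its error message; only the exit
    # status matters here.
    stat -- "$f" >/dev/null 2>&1 || printf '%s\n' "$f"
  done < "$1"
}

# Usage: missing_by_stat file_list
```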

You should use the shell's built-in test command to check whether files exist.

while IFS= read -r f; do
   test -f "$f" || printf '%s\n' "$f"
done < file_list

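test -f "$f" is equivalent to [ -f "$f" ], so you can use either form. I spelled out "test" for readability. Names in file_list are checked relative to the current directory unless they are absolute paths.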

Edit: If you really have no option but to work from a list of file names without paths, I suggest you build a list of files once with find, then search that list with grep for each name to see which files are there.

TMPFILE=$(mktemp)
find /dst -type f > "$TMPFILE"
while IFS= read -r f; do
    grep -q -- "/$f\$" "$TMPFILE" || printf '%s\n' "$f"
done < file_list
rm -f -- "$TMPFILE"

Note that:

  • the file list only includes regular files, not directories (-type f),
  • the slash in the grep pattern makes it compare whole file names rather than partial names,
  • the final '$' in the pattern anchors the match at the end of the line, so you don't get directory matches, only full file name matches,
  • and each name is used as a regular expression, so a character such as '.' in a name matches any character.
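If your find is GNU find, you can avoid both the per-name loop and the regular-expression pitfall. The sketch below prints only the last path component of each file with -printf '%f\n', then lets a single grep -vxF compare whole lines as fixed strings. It assumes bash (for the <(...) process substitution), GNU find, and file names without newlines. missing_basenames is a hypothetical name.

```bash
# missing_basenames LIST DIR  (hypothetical helper)
# Prints the lines of LIST that are not the name of any regular file under DIR.
# Assumes bash and GNU find (-printf); names must not contain newlines.
missing_basenames() {
  # find prints bare file names (%f drops the leading directories);
  # grep reads them as patterns (-f), treats them as fixed strings (-F),
  # requires whole-line matches (-x) and prints the list lines that do
  # NOT match any of them (-v).
  grep -vxF -f <(find "$2" -type f -printf '%f\n') -- "$1"
}

# Usage: missing_basenames file_list /dst
```

If nothing is found under DIR, the pattern list is empty and grep matches nothing. Every name in the list is then reported, which is the desired result.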

find considers finding nothing a special case of success (no error occurred). A general way to test whether files match some find criteria is to test whether the output of find is empty. For better efficiency when there are matching files, use -quit on GNU find to make it quit at the first match, or head (head -c 1 if available, otherwise head -n 1 which is standard) on other systems to make it die of a broken pipe rather than produce long output.

while IFS= read -r name; do
  [ -n "$(find . -name "$name" -print | head -n 1)" ] || printf '%s\n' "$name"
done <file_list
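With a find that supports -quit, such as GNU find, the head can be dropped, because -quit stops the search right after the first match is printed. Below is a sketch as a function. missing_quit is a made-up name. As in the loop above, each name is used as a -name pattern.

```bash
# missing_quit LIST  (hypothetical helper; needs a find that supports -quit)
# Prints each name from LIST for which find locates no file under ".".
missing_quit() {
  while IFS= read -r name; do
    # -print -quit: print the first match, then stop searching.
    [ -n "$(find . -name "$name" -print -quit)" ] || printf '%s\n' "$name"
  done < "$1"
}

# Usage: missing_quit file_list
```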

In bash ≥4 (with the globstar option turned on) or zsh, you don't need the external find command for a simple name match: you can use **/$name. Bash version:

shopt -s globstar nullglob
while IFS= read -r name; do
  set -- **/"$name"
  [ $# -ge 1 ] || printf '%s\n' "$name"
done <file_list

Zsh version on a similar principle:

while IFS= read -r name; do
  set -- **/"$name"(N)
  [ $# -ge 1 ] || print -r -- "$name"
done <file_list

Or here's a shorter but more cryptic way of testing whether any file matches a pattern. The glob qualifier N makes the expansion empty if there is no match, [1] keeps only the first match, and e:REPLY=true: makes each match expand to true instead of the matched file name. So **/"$name"(Ne:REPLY=true:[1]) false expands to "true false" (a command that succeeds) if there is a match, or to just "false" if there is no match.

while IFS= read -r name; do
  **/"$name"(Ne:REPLY=true:[1]) false || print -r -- "$name"
done <file_list
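In bash, the globstar test can also be wrapped in a function. The function stores the matches in a local array, so the positional parameters are not overwritten with set --. This is a sketch assuming bash ≥4. has_match is a hypothetical name.

```bash
# Both options must be on when the glob is expanded:
# globstar makes ** recurse into subdirectories, and nullglob makes a
# pattern with no matches expand to nothing instead of to itself.
shopt -s globstar nullglob

# has_match NAME  (hypothetical helper)
# Succeeds if a file or directory called NAME exists in the current
# directory or below it.
has_match() {
  local matches
  matches=(**/"$1")
  (( ${#matches[@]} > 0 ))
}

# Usage:
# while IFS= read -r name; do
#   has_match "$name" || printf '%s\n' "$name"
# done <file_list
```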

It would be more efficient to combine all your names into one search. If the number of names is not too large for your system's limit on command-line length, you can join them with -o into a single find expression, make one find call, and post-process the output. If none of the names contain wildcard characters (*, ?, [ or \), each name is also a find pattern that matches only itself. In that case, here's a way to post-process with awk (untested):

set -o noglob; IFS='
'
# Build the arguments: -name A -o -name B -o -name C ...
set -- $(<file_list sed -e 's/^/-name\
/' -e '2,$s/^/-o\
/')
set +o noglob; unset IFS
find . \( "$@" \) -print | awk -F/ '
    BEGIN {while ((getline line < "file_list") > 0) found[line] = 0}
    ($NF in found) {found[$NF] = 1}
    END {for (f in found) if (found[f] == 0) print f}
'
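The find expression needs -name in front of every name and -o between consecutive names. Below is a sketch of building those arguments with sed, one per output line. Splitting the output on newlines with globbing turned off then yields -name a -o -name b and so on. names_to_find_expr is a hypothetical name, and the names must not contain newlines.

```bash
# names_to_find_expr  (hypothetical helper)
# Reads names on stdin and prints one find argument per line:
#   -name NAME1, then -o -name NAME2, and so on.
names_to_find_expr() {
  # First script: put "-name" on its own line before every name.
  # Second script: from input line 2 on, also put "-o" before that.
  sed -e 's/^/-name\
/' -e '2,$s/^/-o\
/'
}

# Example:
# printf '%s\n' a.txt b.txt | names_to_find_expr
# prints: -name, a.txt, -o, -name, b.txt (one per line)
```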

Another approach would be to use Perl and File::Find, which makes it easy to run Perl code for every file under a directory tree.

perl -MFile::Find -l -e '
    %missing = map {chomp; $_, 1} <STDIN>;
    find(sub {delete $missing{$_}}, ".");
    print foreach sort keys %missing' <file_list

An alternative approach is to generate a list of file names on each side and compare the two lists as text. Zsh version:

comm -23 <(<file_list sort) <(print -rl -- **/*(:t) | sort)
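Without zsh, here is a rough bash equivalent. It assumes bash for the <(...) process substitution. sed takes the last path component of every regular file, and both lists are sorted with duplicates removed because comm requires sorted input. Unlike **/* in zsh, this considers only regular files, and it also looks inside hidden directories and at hidden files. missing_comm is a hypothetical name.

```bash
# missing_comm LIST [DIR]  (hypothetical helper)
# Prints the names in LIST that are not the last component of any regular file
# under DIR (default "."). Names must not contain newlines.
missing_comm() {
  # comm -23 keeps only the lines that appear in the first input.
  # sed strips everything up to the last slash, leaving the file name.
  comm -23 <(sort -u -- "$1") <(find "${2:-.}" -type f | sed 's|.*/||' | sort -u)
}

# Usage: missing_comm file_list
```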
