How to find all files containing various strings from a long list of string combinations?

I'd use perl, something like:

perl -MFile::Find -MClone=clone -lne '
  # parse the strings.txt input, here looking for the sequences of
  # 0 or more characters (.*?) in between two " characters
  for (/"(.*?)"/g) {
    # @needle is an array of associative arrays whose keys
    # are the "strings" for each line.
    $needle[$n]{$_} = undef;
  }
  $n++;

  END{
    sub wanted {
      return unless -f; # only regular files
      my $needle_clone = clone(\@needle);
      if (open FILE, "<", $_) {
        LINE: while (<FILE>) {
          # read the file line by line
          for (my $i = 0; $i < $n; $i++) {
            for my $s (keys %{$needle_clone->[$i]}) {
              if (index($_, $s)>=0) {
                # if the string is found, we delete it from the associative
                # array.
                delete $needle_clone->[$i]{$s};
                unless (%{$needle_clone->[$i]}) {
                  # if the associative array is empty, that means we have
                  # found all the strings for that $i, that means we can
                  # stop processing, and the file matches
                  print $File::Find::name;
                  last LINE;
                }
              }
            }
          }
        }
        close FILE;
      }
    }
    find(\&wanted, ".")
  }' /path/to/strings.txt

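Each string is dropped from its set as soon as it has been found, and a file is abandoned as soon as one set is complete, so we minimize the number of string searches.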

Here, we're processing the files line by line. If the files are reasonably small, you could read each one as a whole instead, which would simplify the code a bit and might improve performance.
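A minimal sketch of that whole-file variant, under a few assumptions: the function name find_all_strings is made up for illustration, every file is assumed to fit comfortably in memory, and the list file uses the same quoted-string format as the line-by-line version. Since nothing is deleted from the sets here, no clone is needed and only the core File::Find module is used.

```shell
# find_all_strings LIST: hypothetical helper name, not part of the original.
# Searches the current directory and below, like the line-by-line version.
find_all_strings() {
  perl -MFile::Find -lne '
    # one array of strings per line of the list file
    push @sets, [ /"(.*?)"/g ];

    END {
      find(sub {
        return unless -f;                # only regular files
        my $path = $File::Find::name;
        open my $fh, "<", $_ or return;  # skip files we cannot read
        local $/;                        # slurp mode: read the file in one go
        my $content = <$fh>;
        close $fh;
        return unless defined $content;
        SET: for my $set (@sets) {
          next unless @$set;             # ignore list lines without strings
          for my $s (@$set) {
            # give up on this set as soon as one string is missing
            next SET if index($content, $s) < 0;
          }
          print $path;                   # every string of this set was found
          last SET;                      # report each file only once
        }
      }, ".");
    }' "$1"
}
```

It would be called as find_all_strings /path/to/strings.txt from the directory to search. The trade-off is memory: each file is held in full while its sets are checked, whereas the line-by-line version can stop reading a file early.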

Note that it expects the list file to be in the:

 "surveillance data" "surveillance technology" "cctv camera"
 "social media" "surveillance techniques" "enforcement agencies"
 "social control" "surveillance camera" "social security"
 "surveillance data" "security guards" "social networking"
 "surveillance mechanisms" "cctv surveillance" "contemporary surveillance"

format, with any number (it doesn't have to be 3) of double-quoted strings on each line. The quoted strings cannot contain double quote characters themselves, and the double quotes are not part of the text being searched for. That is, if the list file contained:

"A" "B"
"1" "2" "3"

that would report the paths of all the regular files in the current directory and below that contain either

  • both A and B
  • or (not an exclusive or) all of 1, 2 and 3

anywhere in them.