Finding if a string contains any string in a collection

It is possible to speed it up significantly with Aho-Corasick algorithm.

You can build an Aho-Corasick automaton for a collection using O(total length of all strings in a collections) time and space. Then it will be possible to check if one of the strings in a collection is a substring of a given string S in O(S.length) time by traversing this automaton.


// Make a regex pattern (once only):
StringBuilder pattern = new StringBuilder();
for (String sought : searchCollection) {
    if (!Util.isNullOrEmpty(sought)) {
        if (pattern.length() != 0) {
            pattern.append('|');
        }
        pattern.append(Pattern.quote(sought));
    }
}
final Pattern PATTERN = Pattern.compile("(" + pattern + ")");

This creates a pattern of alternatives like "(abc|def|ghi)". You might consider a case-insensitive search.

And in the function containsAny:

Matcher m = PATTERN.matcher(searchString);
return m.find();

Regex compilation is relatively smart. It would be comparable to using a search tree of your collection of sought words: "agent" and "agitator" to ("ag", ("ent", "itator"))