Interview Question: Query - which sentences contain all of the words of a phrase

Maintain a HashMap that will map Strings to Set<Int>. The idea is to keep track of what sentences a given word appears in. We use a set instead of an array in order to support computing the intersection of two sets efficiently.

For each input sentence:

  • Tokenize it into words, and add the index of the current sentence to the Set corresponding to the current token.

For each query phrase:

  • Tokenize it into words.
  • Query for the Set of indices corresponding to each word
  • Take the intersection of all of these sets.

Time Complexity: Given that there are 10 words in each sentence, the cost of building the HashMap is O(10N log N). The cost of each query is O(10 * log(N)).