Fast search algorithm with std::vector<std::string>

Sort the vector once, then use a binary search:

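  1. std::sort(serverList.begin(), serverList.end()) to sort the vector
  2. std::lower_bound(serverList.begin(), serverList.end(), valuetoFind) to find the first matching element (check that the result is not end() and actually equals valuetoFind)
  3. std::equal_range(serverList.begin(), serverList.end(), valuetoFind) if you want to find all matching elements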

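Because lower_bound and equal_range perform a binary search, each lookup is O(log N), compared with O(N) for your linear search. The sort itself costs O(N log N) once, so this pays off when you search the same vector many times.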
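A minimal sketch of the steps above, assuming a std::vector<std::string> named serverList as in the text. The helper names prepareServerList, containsServer and countServer are hypothetical, chosen only for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Sort once; every later lookup can then use a binary search.
void prepareServerList(std::vector<std::string>& serverList) {
    std::sort(serverList.begin(), serverList.end());
}

// True if `value` is present. O(log N), valid only on a sorted vector.
bool containsServer(const std::vector<std::string>& serverList,
                    const std::string& value) {
    // lower_bound returns the first element that is not less than `value`,
    // so it still has to be compared for equality.
    auto it = std::lower_bound(serverList.begin(), serverList.end(), value);
    return it != serverList.end() && *it == value;
}

// Number of elements equal to `value`; duplicates are adjacent after sorting,
// and equal_range returns the [first, last) run that holds them.
std::size_t countServer(const std::vector<std::string>& serverList,
                        const std::string& value) {
    auto range = std::equal_range(serverList.begin(), serverList.end(), value);
    return static_cast<std::size_t>(std::distance(range.first, range.second));
}
```

If you only need a yes/no answer, std::binary_search(serverList.begin(), serverList.end(), value) does the same job as containsServer.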


Basically, you're asking if it's possible to check all elements for a match, without checking all elements. If there is some sort of external meta-information (e.g. the data is sorted), it might be possible (e.g. using binary search). Otherwise, by its very nature, to check all elements, you have to check all elements.

If you're going to do many such searches on the list and the list doesn't change, you might consider calculating a second table holding a good hash code for each entry. Depending on the type of data being looked up, it could then be more efficient to calculate the hash code of the index (the search key), compare hash codes first, and only compare the strings when the hash codes are equal. Whether this is an improvement largely depends on the size of the table and the type of data in it.

You might also be able to leverage knowledge about the data in the strings. If they are all URLs, for example, mostly starting with "http://www.", starting the comparison after that 11-character prefix, and only coming back to compare the prefix if all of the rest is equal, could end up being a big win.
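A sketch of the "compare hash codes first" idea, assuming the list is built once and never modified afterwards. HashedTable and findAll are hypothetical names, and std::hash<std::string> stands in for whatever "good hash code" suits your data.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: the strings plus a parallel table of their hashes,
// computed once because the list is assumed not to change.
struct HashedTable {
    std::vector<std::string> entries;
    std::vector<std::size_t> hashes;

    explicit HashedTable(std::vector<std::string> e) : entries(std::move(e)) {
        hashes.reserve(entries.size());
        for (const auto& s : entries) {
            hashes.push_back(std::hash<std::string>{}(s));
        }
    }

    // Exact-match search: the cheap hash comparison rejects most entries;
    // the full string comparison runs only when the hashes agree, because
    // different strings can still share a hash value.
    std::vector<std::size_t> findAll(const std::string& key) const {
        const std::size_t h = std::hash<std::string>{}(key);
        std::vector<std::size_t> result;
        for (std::size_t i = 0; i < entries.size(); ++i) {
            if (hashes[i] == h && entries[i] == key) {
                result.push_back(i);
            }
        }
        return result;
    }
};
```

This is still a linear scan; it only makes each rejected comparison cheaper, which matters most when the strings are long and share common prefixes.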

With regard to finding substrings, you can use std::search on each element:

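// Needs <algorithm> for std::search. index is the substring to look for;
// indexResult collects the positions of the matching elements.
for ( auto iter = serverList.begin();
        iter != serverList.end();
        ++ iter ) {
    // std::search returns iter->end() when index does not occur in *iter.
    if ( std::search( iter->begin(), iter->end(),
                      index.begin(), index.end() ) != iter->end() ) {
        indexResult.push_back( iter - serverList.begin() );
    }
}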

Depending on the number of elements being searched and the lengths of the strings involved, it might be more efficient to use something like Boyer-Moore search, precompiling the search string into the necessary tables once, before entering the loop.
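One way to do this in standard C++ is C++17's std::boyer_moore_searcher, whose constructor builds the tables once so that each std::search call reuses them. A minimal sketch, assuming a C++17 compiler; findSubstringMatches is a hypothetical name, while serverList and index follow the loop above.

```cpp
#include <algorithm>   // std::search (searcher overload, C++17)
#include <cstddef>
#include <functional>  // std::boyer_moore_searcher (C++17)
#include <string>
#include <vector>

// Returns the positions of every element of serverList that contains
// `index` as a substring.
std::vector<std::size_t> findSubstringMatches(
        const std::vector<std::string>& serverList, const std::string& index) {
    // The Boyer-Moore tables are built here, once, outside the loop.
    const std::boyer_moore_searcher<std::string::const_iterator> searcher(
        index.begin(), index.end());

    std::vector<std::size_t> indexResult;
    for (std::size_t i = 0; i < serverList.size(); ++i) {
        const std::string& s = serverList[i];
        // Like the plain std::search call, this returns s.end() on no match.
        if (std::search(s.begin(), s.end(), searcher) != s.end()) {
            indexResult.push_back(i);
        }
    }
    return indexResult;
}
```

For short search strings the table setup and per-call overhead can outweigh the benefit, so it is worth timing both versions on your real data.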


If you make the container a std::map instead of a std::vector, the underlying data structure (an ordered tree, typically a red-black tree) is optimized for exact-key lookups like this: find() runs in O(log N) without a separate sort step.
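A small sketch of a keyed lookup with std::map. The mapped int (a port number) and the names ServerMap and findPort are hypothetical, used only to show that the map can carry data alongside the search key.

```cpp
#include <map>
#include <string>

// Hypothetical mapping from server name to a port number.
using ServerMap = std::map<std::string, int>;

// Returns a pointer to the mapped value, or nullptr if `name` is absent.
// map::find is logarithmic in the number of elements.
const int* findPort(const ServerMap& servers, const std::string& name) {
    auto it = servers.find(name);
    return it == servers.end() ? nullptr : &it->second;
}
```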

If you instead use a std::multimap, the member function equal_range() will return a pair of iterators covering every match in the map. That sounds to me like what you want.
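A sketch of multimap::equal_range collecting every entry that shares a key. ServerIndex and findAllPositions are hypothetical names, and the int values stand in for whatever data you attach to each name.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical: several entries may share the same server name.
using ServerIndex = std::multimap<std::string, int>;

// Collects the values of every entry whose key equals `name`.
std::vector<int> findAllPositions(const ServerIndex& servers,
                                  const std::string& name) {
    std::vector<int> result;
    // equal_range returns [first, second), which spans all matching entries.
    auto range = servers.equal_range(name);
    for (auto it = range.first; it != range.second; ++it) {
        result.push_back(it->second);
    }
    return result;
}
```

Since C++11, entries with equal keys keep their insertion order, so the values come back in the order they were added.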

A commenter points out that if you don't actually store any information other than the name (the search key), you should probably use a std::multiset instead.
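A sketch of the multiset variant, where only the names are stored. ServerNames and occurrences are hypothetical names; multiset::count(name) would give the same number directly.

```cpp
#include <cstddef>
#include <iterator>
#include <set>
#include <string>

// The names are kept sorted and duplicates are allowed.
using ServerNames = std::multiset<std::string>;

// Number of stored copies of `name`, found via the equal_range of matches.
std::size_t occurrences(const ServerNames& names, const std::string& name) {
    auto range = names.equal_range(name);
    return static_cast<std::size_t>(std::distance(range.first, range.second));
}
```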
