Fastest way to search string in javascript

indexOf() is most probably the fastest just keep in mind you need to search for two possible cases:

var existingEmails = "email1, email2, ...";
var newEmail = "[email protected]";
var exists = (existingEmails.indexOf(newEmail + " ") >= 0) || (existingEmails.indexOf(" " + newEmail ) > 0);

Instead of looking for the fastest solution, you first need to make sure that you’re actually having a correct solution. Because there are four cases an e-mail address can appear and a naive search can fail:

  1. Alone: [email protected]
  2. At the begin: [email protected] ...
  3. At the end: ... [email protected]
  4. In between: ... [email protected] ...

Now let’s analyze each variant:

  1. To allow arbitrary input, you will need to escape the input properly. You can use the following method to do so:

    RegExp.quote = function(str) {
        return str.toString().replace(/(?=[.?*+^$[\]\\(){}-])/g, "\\");
    };
    

    To match all four cases, you can use the following pattern:

    /(?:^|\ )user@example\.com(?![^\ ])/
    

    Thus:

    var inList = new RegExp("(?:^| )" + RegExp.quote(needle) + "(?![^ ])").test(haystack);
    
  2. Using indexOf is a little more complex as you need to check the boundaries manually:

    var pos = haystack.indexOf(needle);
    if (pos != -1 && (pos != 0 && haystack.charAt(pos-1) !== " " || haystack.length < (pos+needle.length) && haystack.charAt(pos+needle.length) !== " ")) {
        pos = -1;
    }
    var inList = pos != -1;
    
  3. This one is rather quite simple:

    var dict = {};
    haystack.match(/[^\ ]+/g).map(function(match) { dict[match] = true; });
    var inList = dict.hasOwnProperty(haystack);
    

Now to test what variant is the fastest, you can do that at jsPerf.


The answer is: It depends.

  • It depends on what you actually want to measure.
  • It depends on the relationship between how many you're searching for vs. how many you're searching.
  • It depends on the JavaScript implementation. Different implementations usually have radically different performance characteristics. This is one of the many reasons why the rule "Don't optimize prematurely" applies especially to cross-implementation JavaScript.

...but provided you're looking for a lot fewer than you have in total, it's probably String#indexOf unless you can create the dictionary once and reuse it (not just this one loop of looking for X entries, but every loop looking for X entries, which I tend to doubt is your use-case), in which case that's hands-down faster to build the 500-key dictionary and use that.

I put together a test case on jsperf comparing the results of looking for five strings buried in a string containing 500 space-delimited, unique entries. Note that that jsperf page compares some apples and oranges (cases where we can ignore setup and what kind of setup we're ignoring), but jsperf was being a pain about splitting it and I decided to leave that as an exercise for the reader.

In my tests of what I actually think you're doing, Chrome, Firefox, IE6, IE7 and IE9 did String#indexOf fastest. Opera did RegExp alternation fastest. (Note that IE6 and IE7 don't have Array#indexOf; the others do.) If you can ignore dictionary setup time, then using a dictionary is the hands-down winner.

Here's the prep code:

// ==== Main Setup
var toFind = ["aaaaa100@zzzzz", "aaaaa200@zzzzz", "aaaaa300@zzzzz", "aaaaa400@zzzzz", "aaaaa500@zzzzz"];
var theString = (function() {
 var m, n;

 m = [];
 for (n = 1; n <= 500; ++n) {
  m.push("aaaaa" + n + "@zzzzz");
 }
 return m.join(" ");
})();

// ==== String#indexOf (and RegExp) setup for when we can ignore setup
var preppedString = " " + theString + " ";

// ==== RegExp setup for test case ignoring RegExp setup time
var theRegExp = new RegExp(" (?:" + toFind.join("|") + ") ", "g");

// ==== Dictionary setup for test case ignoring Dictionary setup time
var theDictionary = (function() {
 var dict = {};
 var index;
 var values = theString.split(" ");
 for (index = 0; index < values.length; ++index) {
  dict[values[index]] = true;
 }
 return dict;
})();

// ==== Array setup time for test cases where we ignore array setup time
var theArray = theString.split(" ");

The String#indexOf test:

var index;
for (index = 0; index < toFind.length; ++index) {
 if (theString.indexOf(toFind[index]) < 0) {
  throw "Error";
 }
}

The String#indexOf (ignore setup) test, in which we ignore the (small) overhead of putting spaces at either end of the big string:

var index;
for (index = 0; index < toFind.length; ++index) {
 if (preppedString.indexOf(toFind[index]) < 0) {
  throw "Error";
 }
}

The RegExp alternation test:

// Note: In real life, you'd have to escape the values from toFind
// to make sure they didn't have special regexp chars in them
var regexp = new RegExp(" (?:" + toFind.join("|") + ") ", "g");
var match, counter = 0;
var str = " " + theString + " ";
for (match = regexp.exec(str); match; match = regexp.exec(str)) {
 ++counter;
}
if (counter != 5) {
 throw "Error";
}

The RegExp alternation (ignore setup) test, where we ignore the time it takes to set up the RegExp object and putting spaces at either end of the big string (I don't think this applies to your situation, the addresses you're looking for would be static):

var match, counter = 0;
for (match = theRegExp.exec(preppedString); match; match = theRegExp.exec(preppedString)) {
 ++counter;
}
if (counter != 5) {
 throw "Error";
}

The Dictionary test:

var dict = {};
var index;
var values = theString.split(" ");
for (index = 0; index < values.length; ++index) {
 dict[values[index]] = true;
}
for (index = 0; index < toFind.length; ++index) {
 if (!(toFind[index] in dict)) {
  throw "Error";
 }
}

The Dictionary (ignore setup) test, where we don't worry about the setup time for the dictionary; note that this is different than the RegExp alternation (ignore setup) test because it assumes the overall list is invariant:

var index;
for (index = 0; index < toFind.length; ++index) {
 if (!(toFind[index] in theDictionary)) {
  throw "Error";
 }
}

The Array#indexOf test (note that some very old implementations of JavaScript may not have Array#indexOf):

var values = theString.split(" ");
var index;
for (index = 0; index < toFind.length; ++index) {
 if (values.indexOf(toFind[index]) < 0) {
  throw "Error";
 }
}

The Array#indexOf (ignore setup) test, which like Dictionary (ignore setup) assumes the overall list is invariant:

var index;
for (index = 0; index < toFind.length; ++index) {
 if (theArray.indexOf(toFind[index]) < 0) {
  throw "Error";
 }
}

Tags:

Javascript