What's fastest way of defining 10^5 down values?

Summary: undocumented HashTable is a bit faster (at least in version 9) both in storage and in retrieval than DownValues.

DownValues

list = RandomInteger[{-10^9, 10^9}, 10^6];
ret  = RandomInteger[{-10^9, 10^9}, 10^6];

isGood[___] = False;
Scan[(isGood[#] = True) &, list]; // AbsoluteTiming

(* ==> {3.240005, Null} *)

ClearAll[isGood];
isGood[___] = False;
Do[isGood@i = True, {i, list}]; // AbsoluteTiming

(* ==> {2.350003, Null} *)

On my computer, this takes less then 3 seconds for a million integers if Do is used instead of Scan. Isn't this fast enough?

Retrieval of the results is also quite quick, and is almost independent whether Table or Map is used:

isGood /@ ret; // AbsoluteTiming

(* ==> {1.410002, Null} *)

Table[isGood@i, {i, ret}]; // AbsoluteTiming

(* ==> {1.450002, Null} *)

HashTable

Out of curiosity, I compared this to the undocumented HashTable (mentioned here) and got even better results. Note, that the hash table must be checked for existing value (as list might contain duplicates) otherwise HashTableAdd returns with error. Or it is even better to prefilter list by removing duplicates, but that is omitted here not to bias the comparison.

hash = System`Utilities`HashTable[];
Do[If[
    Not@System`Utilities`HashTableContainsQ[hash, i], 
    System`Utilities`HashTableAdd[hash, i, True] (* last argument can be omitted *)
    ], {i, list}]; // AbsoluteTiming

(* ==> {2.010003, Null} *)

System`Utilities`HashTableContainsQ[hash, #] & /@ ret; // AbsoluteTiming

(* ==> {1.340002, Null} *)

Table[System`Utilities`HashTableContainsQ[hash, i], {i, ret}]; // AbsoluteTiming

(* ==> {1.050001, Null} *)

We see that both storage and retrieval are a bit faster.


Are you sure you want to use UpValues? You can use Dispatch which is pretty fast when generating the lookup table and is equally fast when accessing values:

n = 6;
list = RandomInteger[{0, 10^(n + 1)}, {10^n}];

AbsoluteTiming[disp = Dispatch@Thread[list -> True];]
{1.6220927, Null}
Remove[isGood];
AbsoluteTiming[isGood[___] = False; Scan[(isGood[#] = True) &, list]]
{3.5982058, Null}

Query values:

test = RandomInteger[{0, 10^(n + 1)}, {10^n}];

AbsoluteTiming[Count[test /. disp, True]]
{1.9151096, 94844}
AbsoluteTiming[Count[isGood /@ test, True]]
{1.7601007, 94844}

This seems to be faster to define downvalues

list = RandomInteger[{-100000000, 100000000}, 1000000];
DownValues[isGood] = 
   HoldPattern[isGood[#]] :> True & /@ list; // AbsoluteTiming
isGood[___] = False;