Achieve performance that is competitive with numpy

Vectorization is one of the most effective ways to increase performance. The numpy code you showed is fast because it uses vectorization.

Vectorization means working with entire arrays instead of element by element: using array arithmetic, array comparisons, and so on. Array operations can be implemented very efficiently with SIMD instructions, and they are straightforward to parallelize. In fact, Mathematica can use multiple CPU cores for vector arithmetic. This also matters for correct benchmarking: Timing measures CPU time, adding up the time spent by each core, while AbsoluteTiming measures the wall-clock time that actually elapsed. For this reason I use only AbsoluteTiming below. Timing would report longer, misleading times (try it!).
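
Here is a minimal sketch for trying this yourself. It contrasts an element-by-element Table with whole-array arithmetic, then runs Timing and AbsoluteTiming on the same operation. It assumes a multicore machine; whether the two timings differ depends on whether that particular operation runs multithreaded in your version, so no numbers are given. The variable names are illustrative.

```mathematica
(* a packed array of 10^7 machine reals *)
arr = RandomReal[1, 10^7];

(* element by element: Table evaluates the expression once per index *)
{tElementwise, elementwise} = AbsoluteTiming[Table[2 arr[[i]] + 1, {i, Length[arr]}]];

(* vectorized: a single arithmetic operation on the whole array *)
{tVectorized, vectorized} = AbsoluteTiming[2 arr + 1];

(* both approaches compute the same values *)
elementwise == vectorized

(* CPU time summed over threads vs. elapsed wall-clock time for the same operation *)
{First[Timing[2 arr + 1;]], First[AbsoluteTiming[2 arr + 1;]]}
```

If the operation is multithreaded, the first number in the last result will typically be larger than the second.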


The first thing we can do is replace Table[Random[], n] with RandomReal[1, n]. In the spirit of vectorization, this generates the whole array in one go instead of element by element (Random is also obsolete and has been superseded by RandomReal):

RandomReal[1, 10000000]; // AbsoluteTiming
(* {0.083058, Null} *)

Table[Random[], 10000000]; // AbsoluteTiming
(* {0.312061, Null} *)

Then instead of Select, use the techniques I described here:

  • Does Mathematica have advanced indexing?

Mathematica does not have built-in vectorized comparisons like samples < 0.5 in numpy: Less and the other relational operators do not thread over lists. But it is always possible to express these operations in terms of simple arithmetic, as described in the thread linked above. Unfortunately, this often results in expressions that are hard to decipher. To make these techniques easier to use, I wrote a small package called BoolEval that translates expressions written with relational operators (<, >, ==, etc.) into vector arithmetic.
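
As a rough sketch of the arithmetic behind these techniques: UnitStep[x] is 1 for x >= 0 and 0 for x < 0, and it threads over lists. So each comparison against a constant becomes a 0/1 mask, and And of two conditions becomes a product of masks. The sample list x and the names lessMask, geqMask and rangeMask are illustrative only.

```mathematica
x = {0.1, 0.5, 0.7, 0.3};

(* x < 0.5  <=>  1 - UnitStep[x - 0.5]  (1 where true, 0 where false) *)
lessMask = 1 - UnitStep[x - 0.5]
(* {1, 0, 0, 1} *)

(* x >= 0.5  <=>  UnitStep[x - 0.5] *)
geqMask = UnitStep[x - 0.5]
(* {0, 1, 1, 0} *)

(* 0.2 < x < 0.6: x > 0.2 is 1 - UnitStep[0.2 - x]; multiply the two masks for And *)
rangeMask = (1 - UnitStep[0.2 - x]) (1 - UnitStep[x - 0.6])
(* {0, 1, 0, 1} *)

(* a mask can then be used to extract or count elements *)
Pick[x, rangeMask, 1]
(* {0.5, 0.3} *)
Total[rangeMask]
(* 2 *)
```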

This is how we can apply the package to your example:

Select[Table[Random[], {10000000}], # < 0.5 &]; // AbsoluteTiming
(* {5.02988, Null} *)

<< BoolEval`

(* this timing also includes generating arr with RandomReal *)
AbsoluteTiming[
 arr = RandomReal[1, 10000000];
 BoolPick[arr, arr < 0.5];
]
(* {0.310938, Null} *)

Behind the scenes, BoolEval translates this into:

Pick[arr, 1 - UnitStep[arr - 0.5], 1]; // AbsoluteTiming
(* {0.237126, Null} *)
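
A quick way to convince yourself that the Pick translation selects exactly the same elements, in the same order, as the original Select is to compare the two directly. This minimal sketch does not need BoolEval; the variable names are illustrative.

```mathematica
arr = RandomReal[1, 10^6];

(* selection via an explicit 0/1 mask *)
viaPick = Pick[arr, 1 - UnitStep[arr - 0.5], 1];

(* reference result from the element-by-element Select *)
viaSelect = Select[arr, # < 0.5 &];

viaPick == viaSelect
(* True *)

(* wall-clock times of the two approaches on the same data *)
{First[AbsoluteTiming[Pick[arr, 1 - UnitStep[arr - 0.5], 1];]],
 First[AbsoluteTiming[Select[arr, # < 0.5 &];]]}
```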

We can see this by using BoolEval with a symbolic expression:

BoolEval[a < 0.5]
(* 1 - UnitStep[-0.5 + a] *)

The package also has other useful functions, such as BoolCount, which counts how many elements satisfy a condition:

BoolCount[arr < 0.5] // AbsoluteTiming
(* {0.07805, 4998908} *)

This translates to Total[1 - UnitStep[arr - 0.5]] and is equivalent to the numpy code you showed. For comparison, your numpy code runs in ~0.14 seconds on my machine, nearly twice as slow. I was using an MKL-enabled numpy (from Anaconda), so the comparison with Mathematica should be fair.
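
The counting translation can be checked the same way. The sketch below compares the total of the mask with two element-by-element alternatives, Count with a condition pattern and Length of Select. All three should give the same number, while the mask version avoids evaluating a test function per element. It uses 10^6 elements to keep the slower alternatives quick; the variable names are illustrative.

```mathematica
arr = RandomReal[1, 10^6];

(* vectorized count: sum the 0/1 mask *)
{tMask, nMask} = AbsoluteTiming[Total[1 - UnitStep[arr - 0.5]]];

(* element-by-element alternatives *)
{tCount, nCount} = AbsoluteTiming[Count[arr, _?(# < 0.5 &)]];
{tSelect, nSelect} = AbsoluteTiming[Length[Select[arr, # < 0.5 &]]];

(* all three counts agree *)
nMask == nCount == nSelect

(* wall-clock times for comparison *)
{tMask, tCount, tSelect}
```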


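As noted by Szabolcs, generating the list of reals one element at a time is inefficient. Even with the list generated by RandomReal, Select is still relatively slow (this example uses 10^6 elements rather than 10^7 and repeats the timing ten times):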

v = RandomReal[{0, 1}, 1000000];
Table[Timing[Length[Select[v, # < 0.5 &]]][[1]], {10}]
(* {0.384, 0.38, 0.376, 0.38, 0.38, 0.38, 0.504, 0.48, 0.384, 0.38} *)

However, a larger saving comes from compiling the selection function:

sel = Compile[{{x, _Real, 1}}, Select[x, # < 0.5 &]];
Table[Timing[Length[sel[v]]][[1]], {10}]
(* {0.072, 0.068, 0.068, 0.068, 0.072, 0.068, 0.064, 0.068, 0.068, 0.068} *)
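
To confirm that Select really runs inside the compiled code rather than falling back to the main evaluator, the compiled function can be inspected with CompilePrint from the CompiledFunctionTools` package; a call to MainEvaluate in its output would indicate a part that was not compiled. This sketch also cross-checks the compiled result against the uncompiled Select and, following the first answer's advice about wall-clock time, times it with AbsoluteTiming.

```mathematica
Needs["CompiledFunctionTools`"]

v = RandomReal[{0, 1}, 1000000];
sel = Compile[{{x, _Real, 1}}, Select[x, # < 0.5 &]];

(* look for MainEvaluate in the printed pseudocode; none means fully compiled *)
CompilePrint[sel]

(* the compiled and uncompiled versions should select the same elements *)
sel[v] == Select[v, # < 0.5 &]

(* elapsed wall-clock time of the compiled selection *)
First[AbsoluteTiming[sel[v];]]
```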