# Achieve performance that is competitive with numpy

Vectorization is one of the most effective ways to increase performance. The `numpy` code you show is fast because it uses vectorization.

Vectorization means working with entire arrays instead of element by element: using array arithmetic, array comparisons, etc. Array operations can be implemented very efficiently using SIMD processing, and they are also straightforward to parallelize. In fact, Mathematica uses multiple CPU cores for vector arithmetic. This also matters for correct benchmarking: `Timing` adds up the time spent by the individual CPU cores, while `AbsoluteTiming` measures how much time has actually elapsed. Thus below I use only `AbsoluteTiming`; `Timing` would report *longer*, misleading times (try it!).
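To see the difference yourself, here is a minimal sketch that times the same vectorized operation with both functions. The choice of `Sin` on a ten-million-element array is arbitrary, and the variable names are only for illustration; whether `Timing` actually reports more than `AbsoluteTiming` depends on whether that particular operation runs multithreaded on your machine.

```
arr = RandomReal[1, 10000000];

(* Wall-clock time: how long we actually waited *)
tAbs = First@AbsoluteTiming[Sin[arr];];

(* CPU time: summed over all threads, so it can exceed the
   wall-clock time when the operation runs on several cores *)
tCpu = First@Timing[Sin[arr];];

{tAbs, tCpu}
```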

The first thing we can do is replace `Table[Random[], n]` by `RandomReal[1, n]`. In the spirit of vectorization, generate the whole array in one go instead of element by element:

```
RandomReal[1, 10000000]; // AbsoluteTiming
(* {0.083058, Null} *)
Table[Random[], 10000000]; // AbsoluteTiming
(* {0.312061, Null} *)
```

Then, instead of `Select`, use the techniques I described here:

- Does Mathematica have advanced indexing?

Mathematica does not have built-in vectorized comparisons like `samples < 0.5` in `numpy`. But it is always possible to express these operations in terms of simple arithmetic, as described in the thread linked above. Unfortunately, this often results in expressions that are hard to decipher. To make these techniques easier to use, I wrote a small package called BoolEval that translates expressions written with relational operators (like `<`, `>`, `==`, etc.) into vector arithmetic.
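For reference, the basic translation rules are short enough to write by hand with built-in functions only. Since `UnitStep[x]` is 1 for `x >= 0` and 0 otherwise, each comparison against a constant becomes a 0/1 mask, and `&&` becomes multiplication of masks. A minimal sketch on a small integer list (the list `data` and the thresholds are arbitrary examples):

```
data = {1, 5, 3, 7, 2};

(* data < 4   <->  1 - UnitStep[data - 4] *)
Pick[data, 1 - UnitStep[data - 4], 1]
(* {1, 3, 2} *)

(* data >= 3  <->  UnitStep[data - 3] *)
Pick[data, UnitStep[data - 3], 1]
(* {5, 3, 7} *)

(* 2 < data < 6: multiply the masks for data > 2 and data < 6 *)
Pick[data, (1 - UnitStep[2 - data]) (1 - UnitStep[data - 6]), 1]
(* {5, 3} *)
```

Every step here operates on whole arrays, which is what makes the approach fast on large packed arrays.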

Here is your original code for reference, followed by the same operation using the package:

```
Select[Table[Random[], {10000000}], # < 0.5 &]; // AbsoluteTiming
(* {5.02988, Null} *)
```

```
<< BoolEval`
AbsoluteTiming[
arr = RandomReal[1, 10000000];
BoolPick[arr, arr < 0.5];
]
(* {0.310938, Null} *)
```

Behind the scenes, `BoolEval` translates this into:

```
Pick[arr, 1 - UnitStep[arr - 0.5], 1]; // AbsoluteTiming
(* {0.237126, Null} *)
```
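A quick sanity check using only built-in functions: the `UnitStep` mask selects exactly the same elements, in the same order, as the original `Select`. An element exactly equal to 0.5 gives `UnitStep[0] == 1`, hence a mask value of 0, so both versions exclude it. The names `fast` and `slow` are just for illustration.

```
arr = RandomReal[1, 1000000];

(* Vectorized selection via a 0/1 mask *)
fast = Pick[arr, 1 - UnitStep[arr - 0.5], 1];

(* Element-by-element selection with a pure function *)
slow = Select[arr, # < 0.5 &];

fast === slow
(* True *)
```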

We can see this by calling `BoolEval` on a symbolic expression:

```
BoolEval[a < 0.5]
(* 1 - UnitStep[-0.5 + a] *)
```

The package also has other useful functions, such as `BoolCount` for counting how many elements satisfy the condition:

```
BoolCount[arr < 0.5] // AbsoluteTiming
(* {0.07805, 4998908} *)
```

This translates to `Total[1 - UnitStep[arr - 0.5]]` and is precisely equivalent to the `numpy` code you showed. For comparison, your `numpy` code runs in ~0.14 seconds on my machine, which is nearly twice as long. I was using an MKL-enabled numpy (from Anaconda), so the comparison with Mathematica is not unfair.
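Written out without the package, the count is a single `Total` over the mask. The sketch below checks it against a `Select`-based count and times both; it uses only built-in functions, and the variable names are illustrative. Timings will of course vary by machine.

```
arr = RandomReal[1, 10000000];

(* Vectorized: sum the 0/1 mask *)
{tVec, nVec} = AbsoluteTiming[Total[1 - UnitStep[arr - 0.5]]];

(* Element by element: build the selected list, then take its length *)
{tSel, nSel} = AbsoluteTiming[Length[Select[arr, # < 0.5 &]]];

{nVec == nSel, tVec, tSel}
```

Summing the mask avoids building the intermediate list of selected elements, which is one reason the count is cheaper than the selection itself.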

As noted by Szabolcs, there is an inefficiency in generating the table of real elements one at a time.

```
v = RandomReal[{0, 1}, 1000000];
Table[Timing[Length[Select[v, # < 0.5 &]]][[1]], {10}]
(* {0.384, 0.38, 0.376, 0.38, 0.38, 0.38, 0.504, 0.48, 0.384, 0.38} *)
```

However, a larger time saving comes from compiling the selection function:

```
sel = Compile[{{x, _Real, 1}}, Select[x, # < 0.5 &]];
Table[Timing[Length[sel[v]]][[1]], {10}]
(* {0.072, 0.068, 0.068, 0.068, 0.072, 0.068, 0.064, 0.068, 0.068, 0.068} *)
```
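Along the same lines, if only the number of matches is needed (as in the `BoolCount` example), a compiled loop can count without building the intermediate list at all. This is a sketch that was not benchmarked in either answer; `cnt` is a hypothetical name. Adding the option `CompilationTarget -> "C"` to `Compile` may speed it up further, but it requires a C compiler to be installed.

```
(* Count elements below 0.5 with an explicit compiled loop
   (hypothetical helper, not part of BoolEval) *)
cnt = Compile[{{x, _Real, 1}},
   Module[{c = 0},
    Do[If[x[[i]] < 0.5, c++], {i, Length[x]}];
    c]];

v = RandomReal[{0, 1}, 1000000];

(* Should agree with the uncompiled Select-based count *)
cnt[v] == Length[Select[v, # < 0.5 &]]
```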