How to select the fastest approach for large numerical data computations?

The main issue is that there are often many ways to perform the same operation, and I usually don't know which one is the most efficient.

Mathematica's performance is hard to predict, even more so than that of other high-level languages. There is no simple guideline you can follow. There will always be surprises and the behaviour will change from one version to the next.


Some insight into why Transpose is faster here:

On my machine (macOS / M12.1) Timing reports the lowest numbers for Part, not for Transpose. However, RepeatedTiming (which is based on AbsoluteTiming) reports a lower number for Transpose.

In[16]:= test1[[All, 1]]; // Timing
Out[16]= {1.32521, Null}

In[17]:= test1[[All, 1]]; // RepeatedTiming
Out[17]= {1.41, Null}

In[18]:= First[Transpose[test1]]; // Timing
Out[18]= {2.08334, Null}

In[19]:= First[Transpose[test1]]; // RepeatedTiming
Out[19]= {0.80, Null}

Typically, this is an indication that some of the operations run in parallel. Timing measures the total CPU time summed over all cores, while AbsoluteTiming measures elapsed wall-clock time.

A quick look at the CPU monitor confirms that indeed, Part is single threaded (I see 100%) while Transpose is multi-threaded (I see ~250%).

This explains the difference.
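
The same CPU-time vs. wall-time gap can be reproduced with any operation that is known to run multi-threaded. A minimal sketch, using dense matrix multiplication on a packed real array (which is typically handed to a multi-threaded BLAS); the matrix size and the exact numbers are only illustrative:

m = RandomReal[1, {3000, 3000}];

(* Timing sums CPU time over all cores, so it comes out larger here *)
Timing[m . m;]

(* AbsoluteTiming measures elapsed wall-clock time *)
AbsoluteTiming[m . m;]

(* RepeatedTiming runs the expression several times and reports a
   stabilized AbsoluteTiming-based estimate *)
RepeatedTiming[m . m;]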


Another observation: in Mathematica, combining two functions is sometimes faster than using a single one.

In "10 Tips for Writing Fast Mathematica Code", Jon McLoone suggests that using fewer functions will speed code up. But I don't think that holds in every case.

Here is a simple test. First, use a function inside Table to generate a list:

In[11]:= a1 = Table[Power[i, 2], {i, 10^7}]; // AbsoluteTiming

Out[11]= {0.238681, Null}

Now use Range first, and then apply the function to the whole list:

In[12]:= a2 = Power[Range[10^7], 2]; // AbsoluteTiming

Out[12]= {0.0703124, Null}

Both results are packed arrays:

In[16]:= Developer`PackedArrayQ /@ {a1, a2}

Out[16]= {True, True}

Maybe Part and Table are the "big" functions, so they have to check various argument forms before getting to the computational code, while Range and Transpose are faster because each does just one simple thing with less overhead?
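
One rough way to probe the overhead idea is to drop the arithmetic entirely and compare bare list generation: Table still evaluates its body once per element, while Range produces the whole packed array in one internal call. This is only a sketch and the timings are machine-dependent:

(* generate the same 10^7 packed integers both ways; only the
   per-element overhead of Table differs *)
AbsoluteTiming[Table[i, {i, 10^7}];]
AbsoluteTiming[Range[10^7];]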

Conclusions

  • Don't use Table[f,{i,iMax}]
  • But use f[Range[iMax]]
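
One caveat I would add to this conclusion: f[Range[iMax]] only matches the Table version when f threads over lists, as Power does because it is Listable. For a scalar-only definition you have to fall back to Map, and the per-element overhead returns. A minimal sketch, where g is a hypothetical example function:

Attributes[Power]   (* Power is Listable, so it threads over packed arrays *)

g[x_?NumberQ] := x^2 + 1   (* scalar-only definition, does not thread *)

AbsoluteTiming[Map[g, Range[10^6]];]    (* element by element, like Table *)
AbsoluteTiming[Range[10^6]^2 + 1;]      (* vectorized equivalent *)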

Here is the timing comparison:

testTable[n_] := AbsoluteTiming[Table[Power[i, 2], {i, 10^n}];]
testRange[n_] := AbsoluteTiming[Power[Range[10^n], 2];]

nList = {4, 5, 6, 7, 8};

t1 = First@testTable[#] & /@ nList;
t2 = First@testRange[#] & /@ nList;

ListLinePlot[{Transpose[{nList, t1}], Transpose[{nList, t2}]}, 
 PlotLegends -> {"Table", "Range"}, Mesh -> All]

[Plot: Table vs. Range timings from the code above]