Dramatic speed difference of code on MATLAB and Mathematica

Modify the calculation order a little to avoid ragged array and then make use of Listable and Compile:

computeDistance[pos_] := DistanceMatrix[pos, DistanceFunction -> EuclideanDistance]
liuQuartic = {r, h} \[Function] 
   15/(7 Pi*h^2) (2/3 - (9 r^2)/(8 h^2) + (19 r^3)/(24 h^3) - (5 r^4)/(32 h^4));
initializeDensity = 
  With[{l = liuQuartic, m = uniMass}, 
   Compile[{{d, _Real, 2}, {h, _Real}}, m Total@Transpose[l[d, h] UnitStep[2 h - d]]]];
new = initializeDensity[computeDistance[N@totalPos], h]; // AbsoluteTiming

Tested with your new added sample data, my code ran for 0.390000 s while the original code ran for 4.851600 s and ybeltukov's code ran for 0.813200 s on my machine.

If you have a C compiler installed, the following code

computeDistance[pos_] := DistanceMatrix[pos, DistanceFunction -> EuclideanDistance]
liuQuartic = {r, h} \[Function] 
   15/(7 Pi*h^2) (2/3 - (9 r^2)/(8 h^2) + (19 r^3)/(24 h^3) - (5 r^4)/(32 h^4));
initializeDensity = 
  With[{l = liuQuartic, m = uniMass, g = Compile`GetElement}, 
   Compile[{{d, _Real, 2}, {h, _Real}}, 
    Module[{b1, b2}, {b1, b2} = Dimensions@d; 
     m Table[Sum[If[2 h > g[d, i, j], l[g[d, i, j], h], 0.], {j, b2}], {i, b1}]], 
      CompilationTarget -> "C", RuntimeOptions -> "Speed"]];

will give you a 2X speedup once again. Notice the C compiler is necessary, see this post for some more details.


You miss that many Mathematica functions are Listable. It allows you to write a fast and clear code

init2[distance_] := uniMass Total[liuQuartic[distance, h] UnitStep[2 h - distance], {2}]

h = 0.1;
uniMass = 1.0;
liuQuartic[d_, h_] := d^2 - h^2;
totalPos = RandomReal[1, {1119, 2}];
res1 = initializeDensity@computeDistance[totalPos]; // AbsoluteTiming
res2 = init2@computeDistance[totalPos]; // AbsoluteTiming
res1 == res2
(* {3.088372, Null} *)
(* {0.130059, Null} *)
(* True *)

It seems that this code is faster than MATLAB.


Here a slightly improved version of xzczd's code (I call the function cinitializeDensity in the following) that does not require computing the DistanceMatrix beforehand. Moreover, I tried to suppress some type casts within the CompiledFunction and to exploit parallelization.

Block[{r, h},
  cinitializeDensity2 = 
   With[{code = N[liuQuartic[Sqrt[r], h]], m = N[uniMass], g = Compile`GetElement}, 
    Compile[{{x, _Real, 1}, {y, _Real, 2}, {h, _Real}},
     Block[{r, sum = 0., x1, x2},
      x1 = g[x, 1];
      x2 = g[x, 2];
      Do[
       r = (x1 - g[y, j, 1])^2 + (x2 - g[y, j, 2])^2;
       sum += If[r < 4. h^2, code, 0.],
       {j, 1, Length[y]}];
      m sum
      ],
     CompilationTarget -> "C",
     Parallelization -> True,
     RuntimeOptions -> "Speed",
     RuntimeAttributes -> {Listable}
     ]]
  ];

Along with packing, this leads to further speedup:

ptotalPos = Developer`ToPackedArray[N[totalPos]];
a = initializeDensity[computeDistance[ptotalPos]]; // AbsoluteTiming // First
b = cinitializeDensity[computeDistance[ptotalPos], h]; // AbsoluteTiming // First
c = cinitializeDensity2[ptotalPos, ptotalPos, h]; // AbsoluteTiming // First
a == b == c

1.28708

0.006039

0.000844

True

For even longer list, it might be worthwhile to delegate the distance checks to Nearest:

Block[{r, h},
  cinitializeDensity3 = 
   With[{code = N[liuQuartic[Sqrt[r], h]], m = N[uniMass], 
     g = Compile`GetElement}, 
    Compile[{{x, _Real, 1}, {y, _Real, 2}, {h, _Real}},
     Block[{r, sum = 0., x1, x2},
      x1 = g[x, 1];
      x2 = g[x, 2];
      Do[
       r = (x1 - g[y, j, 1])^2 + (x2 - g[y, j, 2])^2;
       sum += code,
       {j, 1, Length[y]}];
      m sum
      ],
     CompilationTarget -> "C",
     Parallelization -> True,
     RuntimeOptions -> "Speed",
     RuntimeAttributes -> {Listable}
     ]]
  ];

d = cinitializeDensity3[ptotalPos, Nearest[ptotalPos, ptotalPos, {∞, 2 h}], h]; // AbsoluteTiming // First
a == d

0.00126

True