Faster "Closest Pair of Points Problem" implementation?

Here's a refinement of @Henrik's approach. The key difference is that using Nearest[pts->"Distance"] is over an order of magnitude faster than using Nearest[pts->{"Index", "Distance"}]:

SeedRandom[1];
pts = RandomReal[1, {10000,2}];

AbsoluteTiming[
    data=Nearest[pts->"Distance"][pts,2][[All,2]];
    d=Min[data];
    i=First @ Ordering[data,1];
    j=Nearest[pts->"Index"][pts[[i]],2][[2]];
    {d, pts[[i]],pts[[j]]}
]

{0.008782, {0.000108351, {0.644732, 0.0760254}, {0.64463, 0.0759882}}}

Compared to @Henrik's answer:

AbsoluteTiming[
    data=Nearest[pts->{"Index","Distance"}][pts,2][[All,2]];
    i=OrderingBy[data,Last,1][[1]];
    j=data[[i,1]];
    dist=data[[i,2]];
    result={dist,pts[[i]],pts[[j]]}
]

{0.149502, {0.000108351, {0.644732, 0.0760254}, {0.64463, 0.0759882}}}


Maybe this works for you:

AbsoluteTiming[
 data = Nearest[pts -> {"Index", "Distance"}][pts, 2][[All, 2]];
 i = OrderingBy[data, Last, 1][[1]];
 j = data[[i, 1]];
 dist = data[[i, 2]];
 result = {dist, pts[[i]], pts[[j]]}
 ]

i and j are the indices of the pair of points that are closest to each other; dist is their distance.

Nearest[pts -> {"Index", "Distance"}][pts, k] finds the for every point its k nearest neighbors. The first "neighbor" will always be the point itself. That's why I take Nearest[pts -> {"Index", "Distance"}][pts, 1][[All, 2]] to get the nearest "true" neighbor.

I am pretty sure that this does quite the same like C.E.'s NearestNeighborGraph implementation, but it is about 3 times faster on my machine and alsmost twice as fast as b3m2a1's compiled implementation (on my machine and for 10000 points). This is only faster than NearestNeighborGraph because it does not rely on Mathematica's Graph implementation and all its overhead. Yet another example where avoiding Graph boosts performance. But I have to say that NearestNeighborGraph catches up a bit when one increases the number of points.

The performance degradations start with Graph using a fancy interface that enforces conversion of packed lists of edge indices to unpacked lists of UndirectedEdges and DirectedEdges. And it goes on with all the fancy vertex and edge attributes. Once I had to dig through the half-hidden implementation details of some Graph feature for debugging. I got cold shivers seeing that one had to pass a dozen of parser layers before one arrives at the actual algorithms...


Straight-forward application of Compile to your code. No effort has been taken towards optimization. You'll need to evaluate this twice for Compile to pick up on the recursion. I flattened the return structure and made it explicitly 2D, but you can easily make it higher-dimensional if you want to by turning the 2;;5 into some calculated parameter.

simpleMin =
  With[
   {
    EuclideanDistance = (Sqrt@Total[(# - #2)^2] &)
    },
   Compile[
    {
     {n, _Integer}, 
     {xP, _Real, 2}
     },
    Module[{d1, d2, d3},
     Which[
      n == 2,
      Join[{EuclideanDistance[xP[[1]], xP[[2]]]}, xP[[1]], xP[[2]]],
      n == 3,
      d1 = EuclideanDistance[xP[[1]], xP[[2]]];
      d2 = EuclideanDistance[xP[[1]], xP[[3]]];
      d3 = EuclideanDistance[xP[[3]], xP[[2]]];
      If[d1 <= d2 && d1 <= d3,
       Join[{EuclideanDistance[xP[[1]], xP[[2]]]}, xP[[1]], xP[[2]]],
       If[d2 <= d1 && d2 <= d3,
        Join[{EuclideanDistance[xP[[1]], xP[[3]]]}, xP[[1]], xP[[3]]],
        Join[{EuclideanDistance[xP[[2]], xP[[3]]]}, xP[[2]], xP[[3]]]
        ]
       ],
      True,
      {-1., -1., -1., -1., -1.}
      ]
     ]
    ]
   ];
closestPairRC =
  With[
   {
    EuclideanDistance = (Sqrt@Total[(# - #2)^2] &),
    simpleMin = simpleMin
    },
   Compile[
    {
     {xP, _Real, 2},
     {yP, _Real, 2}
     },
    Module[{
      n, mid,
      xL, xR, xm, yL, yR,
      dL, pairL, dmin, pairMin,
      yS, nS, closest, closestP,
      k, cDist
      },
     (*where xP is P(1).. P(n) sorted by x coordinate,
     and yP is P(1).. P(n) sorted by y coordinate (ascending order)*)
     
     n = Length[xP];
     dL = -1.;
     pairL = {-1., -1., -1., -1., -1.};
     dmin = -1.;
     pairMin = {-1., -1., -1., -1., -1.};
     If[n <= 3,
      simpleMin[n, xP],
      (* hard to make the recursion work well inside `Compile`. 
      Might be a case for `FunctionCompile` *)
      mid = n;
      mid = Ceiling[n/2];
      xL = xP[[1 ;; mid]];
      xR = xP[[mid + 1 ;; n]];
      xm = xP[[mid]];
      yL = Select[yP, #[[1]] <= xm[[1]] &];
      yR = Select[yP, #[[1]] > xm[[1]] &];
      pairL = closestPairRC[xL, yL];
      dL = pairL[[1]];
      pairL = pairL[[2 ;; 5]];
      pairMin = closestPairRC[xR, yR];
      dmin = pairMin[[1]];
      pairMin = pairMin[[2 ;; 5]];
      If[dL < 0 || dL < dmin, dmin = dL; pairMin = pairL;];
      yS = Select[yP, Abs[#[[1]] - xm[[1]]] <= dmin &];
      nS = Length[yS];
      closest = dmin;
      closestP = pairMin;
      Do[
       k = i + 1;
       While[
        (k <= nS) && (yS[[k, 2]] - yS[[i, 2]] < dmin), 
        cDist = EuclideanDistance[yS[[k]], yS[[i]]];
        If[cDist < closest,
         closest = cDist; closestP = Join[yS[[k]], yS[[i]]]
         ];
        k = k + 1
        ],
       {i, 1, nS - 1}
       ];
      Join[{closest}, closestP]
      ](*end if*)
     ]
    ]
   ];

Corresponding uncompiled version

closestPairR =
  With[
   {
    EuclideanDistance = (Sqrt@Total[(# - #2)^2] &),
    simpleMin = simpleMin
    },
   Function[
    {xP, yP},
    Module[{
      n, mid,
      xL, xR, xm, yL, yR,
      dL, pairL, dmin, pairMin,
      yS, nS, closest, closestP,
      k, cDist
      },
     (*where xP is P(1).. P(n) sorted by x coordinate,
     and yP is P(1).. P(n) sorted by y coordinate (ascending order)*)
     
     n = Length[xP];
     dL = -1.;
     pairL = {-1., -1., -1., -1., -1.};
     dmin = -1.;
     pairMin = {-1., -1., -1., -1., -1.};
     If[n <= 3,
      simpleMin[n, xP],
      (* hard to make the recursion work well inside `Compile`. 
      Might be a case for `FunctionCompile` *)
      mid = n;
      mid = Ceiling[n/2];
      xL = xP[[1 ;; mid]];
      xR = xP[[mid + 1 ;; n]];
      xm = xP[[mid]];
      yL = Select[yP, #[[1]] <= xm[[1]] &];
      yR = Select[yP, #[[1]] > xm[[1]] &];
      pairL = closestPairR[xL, yL];
      dL = pairL[[1]];
      pairL = pairL[[2 ;; 5]];
      pairMin = closestPairR[xR, yR];
      dmin = pairMin[[1]];
      pairMin = pairMin[[2 ;; 5]];
      If[dL < 0 || dL < dmin, dmin = dL; pairMin = pairL;];
      yS = Select[yP, Abs[#[[1]] - xm[[1]]] <= dmin &];
      nS = Length[yS];
      closest = dmin;
      closestP = pairMin;
      Do[
       k = i + 1;
       While[
        (k <= nS) && (yS[[k, 2]] - yS[[i, 2]] < dmin), 
        cDist = EuclideanDistance[yS[[k]], yS[[i]]];
        If[cDist < closest,
         closest = cDist; closestP = Join[yS[[k]], yS[[i]]]
         ];
        k = k + 1
        ],
       {i, 1, nS - 1}
       ];
      Join[{closest}, closestP]
      ](*end if*)
     ]
    ]
   ];

I'm dropping the wrapper so we can directly compare the two

pts = BlockRandom[N@RandomReal[1, {10000, 2}]];
sortPts = {Sort[pts], SortBy[pts, Last]};
RepeatedTiming[closestPairRC @@ sortPts]
RepeatedTiming[closestPairR @@ sortPts]

{0.0898, {0.0000934546, 0.41575, 0.168734, 0.415831, 0.168781}}

{0.830, {0.0000934546, 0.41575, 0.168734, 0.415831, 0.168781}}

So it's like a factor of 10 faster. You might be able to get better performance with FunctionCompile since I know it's able to deal with recursion cleanly. You can also play with compilation options or any number of other things.