Find minimum distance between points of two lists in Python

The easiest way is probably to use scipy.spatial.distance.cdist:

import numpy as np
from scipy.spatial import distance

s1 = np.array([(0,0), (0,1), (1,0), (1,1)])
s2 = np.array([(3,2), (1,9)])
print(distance.cdist(s1,s2).min(axis=1))
# array([3.60555128, 3.16227766, 2.82842712, 2.23606798])

Some more speed might be gained by directly outputting 0 for any point from s1 that is also in s2.


To calculate the N distances, there's not a better method than brute forcing all of the possibilities. If you wanted something higher level, like perhaps the greatest or smallest distance, you could reduce the number of calculations based on some external knowledge, but the given your setup, the best you're going to get is O(n^2) performance.

EDIT: Given your comment, there are methods, that involve the general "divide and conquer" approach. Wikipedia has a good overview, and I'll copy a perhaps relevant bit here:

The problem can be solved in O(n log n) time using the recursive divide and conquer approach, e.g., as follows:

  1. Sort points according to their x-coordinates.
  2. Split the set of points into two equal-sized subsets by a vertical line x = xmid.
  3. Solve the problem recursively in the left and right subsets. This yields the left-side and right-side minimum distances dLmin and dRmin, respectively.
  4. Find the minimal distance dLRmin among the set of pairs of points in which one point lies on the left of the dividing vertical and the other point lies to the right.
  5. The final answer is the minimum among dLmin, dRmin, and dLRmin.

Brute force is really the main way. You might be able to get squeeze some performance out using a KDTree since your data is low dimensional. scipy.spatial.KDTree

kdtree = scipy.spatial.KDTree(s2)
neighbours = kdtree.query(s1)

Have you tried using cdist:

import numpy as np
from scipy.spatial.distance import cdist

np.min(cdist(s1,s2))

returns

array([ 3.60555128,  3.16227766,  2.82842712,  2.23606798])

You might also get a performance boost by replacing s1 and s2 by np.arrays, although scipy might be doing that internally, I'm not sure.

If this isn't optimised enough, I think you can do this in O(ns2*log(ns2) + ns1) by finding the Voronoi diagram of the points in s2 and then looping through s1 to see which region the point falls in which will match with the closest point in s2.