How can I speed up nearest neighbor search with python?

You can switch to approximate nearest neighbors (ANN) algorithms which usually take advantage of sophisticated hashing or proximity graph techniques to index your data quickly and perform faster queries. One example is Spotify's Annoy. Annoy's README includes a plot which shows precision-performance tradeoff comparison of various ANN algorithms published in recent years. The top-performing algorithm (at the time this comment was posted), hnsw, has a Python implementation under Non-Metric Space Library (NMSLIB).

It would be interesting to try sklearn.neighbors.NearestNeighbors, which offers n_jobs parameter:

The number of parallel jobs to run for neighbors search.

This package also provides the Ball Tree algorithm, which you can test versus the kd-tree one, however my hunch is that the kd-tree will be better (but that again does depend on your data, so research that!).

You might also want to use dimensionality reduction, which is easy. The idea is that you reduce your dimensions, thus your data contain less info, so that tackling the Nearest Neighbour Problem can be done much faster. Of course, there is a trade off here, accuracy!

You might/will get less accuracy with dimensionality reduction, but it might worth the try. However, this usually applies in a high dimensional space, and you are just in 3D. So I don't know if for your specific case it would make sense to use sklearn.decomposition.PCA.

A remark:

If you really want high performance though, you won't get it with python, you could switch to c++, and use CGAL for example.

How can I speed up nearest neighbor search with python?

Tags:

Python

Performance

Parallel Processing

Kdtree

Nearest Neighbor

Related

Recent Posts