geopandas spatial join extremely slow

Adding the argument op='within' to the sjoin call speeds up the point-in-polygon operation dramatically.

The default is op='intersects', which I guess would also give the correct result, but it is 100 to 1000 times slower.
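
For reference, here is a minimal sketch of the two calls on toy data. The frames `points` and `polygons` and the helper `sjoin_compat` are made up for illustration; the helper is only there because newer geopandas releases spell the keyword `predicate` instead of `op`. On data like this, where no point sits on a polygon boundary, both predicates should return the same pairs, so any difference would be in running time only.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Toy data, made up for illustration: four points, two rectangles.
points = gpd.GeoDataFrame(
    {"pt_id": [0, 1, 2, 3]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(2.5, 2.5), Point(9, 9)],
)
polygons = gpd.GeoDataFrame(
    {"poly_id": ["a", "b"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 3, 3)],
)


def sjoin_compat(left, right, relation):
    """Inner spatial join that works with either keyword spelling."""
    try:
        # Newer geopandas releases call the keyword `predicate`.
        return gpd.sjoin(left, right, how="inner", predicate=relation)
    except TypeError:
        # Older releases only know `op`.
        return gpd.sjoin(left, right, how="inner", op=relation)


joined_within = sjoin_compat(points, polygons, "within")
joined_intersects = sjoin_compat(points, polygons, "intersects")

# Print the (point, polygon) pairs found by each predicate.
print(sorted(zip(joined_within["pt_id"], joined_within["poly_id"])))
print(sorted(zip(joined_intersects["pt_id"], joined_intersects["poly_id"])))
```

To compare speed on real data, wrap each call in a timer; the toy frames here are far too small to show a difference.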


What's likely going on here is that only the dataframe on the right is fed into the rtree index: https://github.com/geopandas/geopandas/blob/master/geopandas/tools/sjoin.py#L48-L55 For an op="intersects" run, that means the polygons were fed into the index, so for every point, the corresponding polygon is found through the rtree index.

But for op="within", the geodataframes are flipped since the operation is actually the inverse of contains: https://github.com/geopandas/geopandas/blob/master/geopandas/tools/sjoin.py#L41-L43

So when you switched from op="intersects" to op="within", the points became the indexed side: for every polygon, the corresponding points are found through the rtree index, which in your case sped up the query.
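
The "within is the inverse of contains" relationship can be checked directly. This is only a sketch on made-up toy data (`points`, `polygons` and the `sjoin_compat` helper are hypothetical names, and the helper just papers over the `op`/`predicate` keyword difference between geopandas versions); it does not reproduce the internals of sjoin, it only shows that joining points to polygons with "within" finds the same pairs as joining polygons to points with "contains".

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Toy data, made up for illustration.
points = gpd.GeoDataFrame(
    {"pt_id": [0, 1, 2, 3]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(2.5, 2.5), Point(9, 9)],
)
polygons = gpd.GeoDataFrame(
    {"poly_id": ["a", "b"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 3, 3)],
)


def sjoin_compat(left, right, relation):
    """Inner spatial join that works with either keyword spelling."""
    try:
        return gpd.sjoin(left, right, how="inner", predicate=relation)
    except TypeError:
        return gpd.sjoin(left, right, how="inner", op=relation)


# Points on the left, polygons on the right, "within".
within = sjoin_compat(points, polygons, "within")
# Frames swapped, inverse predicate.
contains = sjoin_compat(polygons, points, "contains")

pairs_within = set(zip(within["pt_id"], within["poly_id"]))
pairs_contains = set(zip(contains["pt_id"], contains["poly_id"]))
print(pairs_within == pairs_contains)
```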


The question asks how to take advantage of r-tree in geopandas spatial joins, and another responder correctly points out that you should use 'within' instead of 'intersects'. However, you can also take advantage of an r-tree spatial index in geopandas while using intersects/intersection, as demonstrated in this geopandas r-tree tutorial:

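# gdf is a GeoDataFrame of points; polygon is a single shapely polygon
spatial_index = gdf.sindex
# coarse pass: rows whose bounding box intersects the polygon's bounding box
possible_matches_index = list(spatial_index.intersection(polygon.bounds))
possible_matches = gdf.iloc[possible_matches_index]
# exact pass: keep only the candidates that really intersect the polygon
precise_matches = possible_matches[possible_matches.intersects(polygon)]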
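
The snippet above assumes that `gdf` and `polygon` already exist. Here is a self-contained sketch of the same two-pass filter on made-up data (a 5x5 grid of points and one triangle, both hypothetical), with a brute-force check to confirm that the index-assisted result matches a plain `intersects` over every row.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Made-up data: a 5x5 grid of points, pt_id = 5 * x + y.
coords = [(x, y) for x in range(5) for y in range(5)]
gdf = gpd.GeoDataFrame(
    {"pt_id": list(range(len(coords)))},
    geometry=[Point(x, y) for x, y in coords],
)
# A triangle chosen so that no grid point lies exactly on its boundary.
polygon = Polygon([(0.5, 0.5), (3.7, 0.5), (0.5, 3.7)])

# Coarse pass: the spatial index compares bounding boxes only.
spatial_index = gdf.sindex
possible_matches_index = list(spatial_index.intersection(polygon.bounds))
possible_matches = gdf.iloc[possible_matches_index]

# Exact pass: run the real geometric test on the candidates only.
precise_matches = possible_matches[possible_matches.intersects(polygon)]

# Brute force over every row, for comparison.
brute_force = gdf[gdf.intersects(polygon)]

print(len(gdf), len(possible_matches), len(precise_matches))
print(sorted(precise_matches["pt_id"]) == sorted(brute_force["pt_id"]))
```

The exact test runs only on the candidates that survive the bounding-box pass, which is where the saving comes from when the polygon covers a small part of the data.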