Understanding use of spatial indexes with RTree?

That's the gist of it. The R-tree allows you to make a very fast first pass and gives you a set of results that will have "false positives" (bounding boxes may intersect when the geometries precisely do not). Then you go over the set of candidates (fetching them from the shapefile by their index) and do a mathematically precise intersection test using, e.g., Shapely. This is the very same strategy that's employed in spatial databases like PostGIS.


You've almost got it, but you've made a small error. You need to use the intersection method on the spatial index, rather than passing the index to the intersection method on the buffered point. Once you've found a list of features where the bounding boxes overlap, then you need to check if your buffered point actually intersects the geometries.

import fiona
from shapely.geometry import mapping
import rtree
import math

areaM2 = areaKM2 * 1000000
r = (math.sqrt(areaM2/math.pi))

# open both layers
with fiona.open('single_pt_speed_test.shp', 'r') as layer_pnt:
    with fiona.open('class3_aa.shp', 'r') as layer_land:

        # create an empty spatial index object
        index = rtree.index.Index()

        # populate the spatial index
        for fid, feature in layer_land.items():
            geometry = shape(feature['geometry'])
            idx.insert(fid, geometry.bounds)

        for feature in layer_pnt:
            # buffer the point
            geometry = shape(feature['geometry'])
            geometry_buffered = geometry.buffer(r)

            # get list of fids where bounding boxes intersect
            fids = [int(i) for i in index.intersection(geometry_buffered.bounds)]

            # access the features that those fids reference
            for fid in fids:
                feature_land = layer_land[fid]
                geometry_land = shape(feature_land['geometry'])

                # check the geometries intersect, not just their bboxs
                if geometry.intersects(geometry_land):
                    print('Found an intersection!')  # do something useful here

If you're interested in finding points that are within a minimum distance to your land class, you could use the distance method instead (swap out the appropriate section from previous).

for feature in layer_pnt:
    geometry = shape(feature['geometry'])

    # expand bounds by r in all directions
    bounds = [a+b*r for a,b in zip(geometry.bounds, [-1, -1, 1, 1])]

    # get list of fids where bounding boxes intersect
    fids = [int(i) for i in index.intersection(geometry_buffered.bounds)]

    for fid in fids:
        feature_land = layer_land[fid]
        geometry_land = shape(feature_land['geometry'])

        # check the geometries are within r metres
        if geometry.distance(geometry_land) <= r:
            print('Found a match!')

If it's taking a long time to build your spatial index and you're going to do this more than a few times, you should look into serialising the index to a file. The documentation describes how to do this: http://toblerity.org/rtree/tutorial.html#serializing-your-index-to-a-file

You could also look at bulk-loading the bounding boxes into the rtree using a generator, like this:

def gen(collection):
    for fid, feature in collection.items():
        geometry = shape(feature['geometry'])
        yield((fid, geometry.bounds, None))
index = rtree.index.Index(gen(layer_land))

Yes that is the idea. Here is an excerpt from this tutorial on using an r-tree spatial index in Python, using shapely, Fiona, and geopandas:

An r-tree represents individual objects and their bounding boxes (the “r” is for “rectangle”) as the lowest level of the spatial index. It then aggregates nearby objects and represents them with their aggregate bounding box in the next higher level of the index. At yet higher levels, the r-tree aggregates bounding boxes and represents them by their bounding box, iteratively, until everything is nested into one top-level bounding box. To search, the r-tree takes a query box and, starting at the top level, sees which (if any) bounding boxes intersect it. It then expands each intersecting bounding box and sees which of the child bounding boxes inside it intersect the query box. This proceeds recursively until all intersecting boxes are searched down to the lowest level, and returns the matching objects from the lowest level.