How to find the 20 closest points efficiently

With PostGIS 2.0 on PostgreSQL 9.1, you can use the indexed KNN nearest-neighbour operator (<->), e.g.:

SELECT *, geom <-> ST_MakePoint(-90, 40) AS distance
FROM table
ORDER BY geom <-> ST_MakePoint(-90, 40)
LIMIT 20 OFFSET 0;

The above query should return within a few milliseconds.

For the next batches of 20 results, change to OFFSET 20, OFFSET 40, and so on.
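
Note that the <-> operator only gives indexed (fast) results if the geometry column has a GiST index on it. A minimal sketch of setting that up, using a hypothetical table named places with a geometry column geom (the names are placeholders, not from the query above):

-- KNN ordering (<->) is served by an ordinary GiST index on the geometry column
CREATE INDEX places_geom_idx ON places USING GIST (geom);

-- then page through the nearest points 20 at a time
SELECT *, geom <-> ST_MakePoint(-90, 40) AS distance
FROM places
ORDER BY geom <-> ST_MakePoint(-90, 40)
LIMIT 20 OFFSET 20;  -- second page of results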


If all you are looking for are proximity searches on points (nearest-neighbour queries), then you don't want to use the old ST_DWithin or ST_Distance + ORDER BY approach for that.

Not anymore.

Now that PostGIS 2.0 has shipped, you should be using the KNN GiST index support (a native PostgreSQL feature). It will be orders of magnitude faster.

Here is an excerpt from this blog entry, which describes how to use KNN GiST without PostGIS:

$ create table test ( position point );
CREATE TABLE

Table created. Now let's insert some random points:

$ insert into test (position) select point( random() * 1000, random() * 1000) from generate_series(1,1000000);
INSERT 0 1000000

1 million points should be enough for my example. All of them have both X and Y in range <0, 1000). Now we just need the index:

$ create index q on test using gist ( position );
CREATE INDEX

And we can find some rows close to center of the points cloud:

$ select *, position <-> point(500,500) from test order by position <-> point(500,500) limit 10;
              position               |     ?column?
-------------------------------------+-------------------
 (499.965638387948,499.452529009432) | 0.548548271254899
 (500.473062973469,500.450353138149) |  0.65315122744144
 (500.277776736766,500.743471086025) | 0.793668174518778
 (499.986605718732,500.844359863549) | 0.844466095200968
 (500.858531333506,500.130807515234) | 0.868439207229501
 (500.96702715382,499.853323679417)  | 0.978087654172406
 (500.975443981588,500.170825514942) | 0.990289007195055
 (499.201623722911,499.368405900896) |  1.01799596553335
 (498.899147845805,500.683960970491) |  1.29602394829404
 (498.38217580691,499.178630765527)  |  1.81438764851559
(10 rows)

And how about speed?

$ explain analyze select *, position <-> point(500,500) from test order by position <-> point(500,500) limit 10;
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.77 rows=10 width=16) (actual time=0.164..0.475 rows=10 loops=1)
   ->  Index Scan using q on test  (cost=0.00..76512.60 rows=1000000 width=16) (actual time=0.163..0.473 rows=10 loops=1)
         Order By: ("position" <-> '(500,500)'::point)
 Total runtime: 0.505 ms
(4 rows)

Interestingly enough, the index traversal returns the features in order of proximity, so no sort step is needed for the results (as the EXPLAIN output above shows, the ORDER BY is satisfied directly by the index scan)!

However, if you want to use it with PostGIS, it is now really easy. Just follow these instructions.

The relevant part is this:

SELECT name, gid
FROM geonames
ORDER BY geom <-> st_setsrid(st_makepoint(-90,40),4326)
LIMIT 10;

But don't take my word for it. Time it yourself :)
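
For example, a quick way to time it is EXPLAIN ANALYZE; a sketch assuming the same geonames table and geom column as above, with a GiST index already built on geom. The plan should show the ORDER BY being satisfied directly by the index scan, with no separate sort step:

EXPLAIN ANALYZE
SELECT name, gid
FROM geonames
ORDER BY geom <-> st_setsrid(st_makepoint(-90,40),4326)
LIMIT 10;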


Spatial queries are definitely the thing to use.

With PostGIS I would first try something simple like this and tweak the search radius as needed:

SELECT *
FROM table AS a
WHERE ST_DWithin(mylocation, a.LatLong, 10000) -- 10 km, assuming the spatial reference uses metres
ORDER BY ST_Distance(mylocation, a.LatLong)
LIMIT 20;

This would compare points (actually their bounding boxes) using the spatial index, so it should be fast. Another approach that comes to mind is buffering your location and then intersecting that buffer with the original data, which may be even more efficient.
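
A minimal sketch of that buffer-and-intersect variant, keeping the same placeholders as above (mylocation is your search point, LatLong the geometry column, places a stand-in table name, and a metre-based spatial reference so that 10000 really means 10 km):

SELECT *
FROM places AS a
WHERE ST_Intersects(ST_Buffer(mylocation, 10000), a.LatLong)  -- 10 km buffer around the search point
ORDER BY ST_Distance(mylocation, a.LatLong)
LIMIT 20;

Whichever variant you use, time it against the ST_DWithin version above; ST_DWithin also uses the spatial index and avoids materialising the buffer, so it is often at least as fast.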