Identifying clusters in vector point data using QGIS?

I've combined bits from several suggestions, added a bit of my own, and found a solution that works well for me - and all from within QGIS!

I first ran a PostGIS SELECT to find the points which have the right common attributes and lie within x km of each other:

SELECT DISTINCT s1.postcode, s1.the_geom, s1.gid
FROM broadband_data AS s1
JOIN broadband_data AS s2
  ON ST_DWithin(s1.the_geom, s2.the_geom, 1000)
WHERE s1.postcode != s2.postcode
  AND s1.fastest_broadband <= 2000;

(Pretty much straight from Manning's very good PostGIS in Action book, only adding a self-join.)
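To make the logic of that self-join easier to follow, here is a rough pure-Python equivalent. It is only a sketch: the function name, the `x`/`y` keys and the sample rows are made up for illustration, and it assumes projected coordinates in the same units as the distance threshold. It keeps each slow point that has at least one point with a different postcode within the given distance.

```python
from math import hypot

def slow_points_with_neighbours(rows, max_dist=1000.0, max_speed=2000):
    """Return gids of slow points that have a differently-postcoded
    neighbour within max_dist (brute force, mirrors the SQL self-join)."""
    selected = []
    for s1 in rows:
        # WHERE s1.fastest_broadband <= 2000
        if s1["fastest_broadband"] > max_speed:
            continue
        for s2 in rows:
            # WHERE s1.postcode != s2.postcode
            if s1["postcode"] == s2["postcode"]:
                continue
            # ON ST_DWithin(s1.the_geom, s2.the_geom, 1000)
            if hypot(s1["x"] - s2["x"], s1["y"] - s2["y"]) <= max_dist:
                selected.append(s1["gid"])
                break  # one match is enough, like DISTINCT
    return selected
```

Note that, as in the query, only the first point of each pair is filtered on speed; the neighbour can be of any speed. On 45k points you would want the database's spatial index rather than this double loop.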

I then loaded Carson Farmer's ManageR plugin and imported the layer. From there I followed the PAM clustering process suggested here and exported the result to a shapefile, on which convex hulls were calculated in seconds using fTools (Carson does get around!).


Although it's not a QGIS solution, I'd personally opt for some exploratory analysis using SaTScan. It's fast, well documented and widely applied, so you shouldn't have trouble getting started. 45k points might require some RAM though.

I'm not sure if it can read directly from Postgres, but it easily imports from dbf and text files.

The output of the analysis can then easily be read back into Postgres or QGIS. You can choose to search for circular or elliptical clusters (ellipses might be useful if there is a particular type of settlement in your data, for example elongated cities or villages in valleys). You can then generate polygons or ellipses, or display just the locations that are members of clusters.

For a quick preview of the results in Google Earth you could also use NAACCR's SaTScan to Google Earth Conversion Tool.

Importantly, if you decide to run Monte Carlo simulations (99 minimum, I think), you will also be able to say something about the statistical significance of your clusters. Interpreting and justifying these clusters is another issue, as it has been debated in the spatial sciences for the last two decades at least (I think ;).
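To show why the number of simulations matters, here is a minimal sketch of a generic rank-based Monte Carlo p-value. It is not SaTScan's own code, and the function name is made up for illustration: the observed statistic is ranked among the simulated ones, so with 99 simulations the smallest p-value you can get is 1/100.

```python
def monte_carlo_p_value(observed, simulated):
    """Rank-based p-value: share of all replicas (simulated plus the
    observed one) whose statistic is at least as large as the observed."""
    # Count simulated statistics at least as extreme as the observed one.
    at_least_as_large = sum(1 for s in simulated if s >= observed)
    # The +1 terms count the observed data set itself as one replica.
    return (at_least_as_large + 1) / (len(simulated) + 1)
```

For example, if the observed statistic beats all 99 simulated values, the result is 0.01; with 999 simulations it could go down to 0.001.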

You could try running a purely spatial analysis looking for clusters of high, low, or high and low values. If you have some temporal attributes in your data (daily or weekly aggregations), then I think it would be really interesting to run some space-time models.


SciPy has a clustering package for Python. You can use it in the Python console, write a simple plugin around it, or use PL/Python inside PostGIS.

http://docs.scipy.org/doc/scipy/reference/cluster.html

After the analysis, just use fTools to create the convex hulls.
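As a rough illustration of that workflow, here is a sketch that clusters points with `scipy.cluster.hierarchy` and, purely to keep the example self-contained, builds the hulls with `scipy.spatial.ConvexHull` instead of fTools. The function name and the choice of single linkage cut at a distance threshold are my assumptions, not a recommendation over other methods; coordinates are assumed to be projected, in the same units as `max_dist`.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial import ConvexHull

def cluster_hulls(points, max_dist):
    """Cluster 2D points by single linkage cut at max_dist and return
    (labels, {label: hull vertex coordinates}) for clusters of 3+ points."""
    points = np.asarray(points, dtype=float)
    # Single linkage chains together points lying within max_dist of a neighbour.
    tree = linkage(points, method="single")
    labels = fcluster(tree, t=max_dist, criterion="distance")
    hulls = {}
    for label in np.unique(labels):
        members = points[labels == label]
        if len(members) < 3:
            continue  # a hull needs at least three non-collinear points
        hull = ConvexHull(members)
        hulls[int(label)] = members[hull.vertices]
    return labels, hulls
```

The returned hull vertices could then be written back to a layer or a PostGIS table. Hierarchical linkage builds the full pairwise distance matrix, so for 45k points memory may become an issue and k-means from `scipy.cluster.vq` might be a lighter option.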