Snap points to the nearest location on a polygon

I'm posting this answer as a workaround that produces the expected result for your case, in the hope that someone can clarify the behavior of ST_ClosestPoint.

Run the following script, written as a chain of CTEs:

CREATE TABLE polys_pts AS
WITH  
polys(poly_id, geom) AS (VALUES  (1, 'POLYGON((1 1, 1 5, 4 5, 4 4, 2 4, 2 2, 4 2, 4 1, 1 1))'::GEOMETRY),
                (2, 'POLYGON((6 6, 6 10, 8 10, 9 7, 8 6, 6 6))'::GEOMETRY)),
pnt_clusters AS (SELECT  polys.poly_id,
      CASE
          WHEN ST_Area(polys.geom)>9 THEN ST_ClusterKMeans(pts.geom, 8) OVER(PARTITION BY polys.poly_id) 
          ELSE ST_ClusterKMeans(pts.geom, 2) OVER(PARTITION BY polys.poly_id) 
      END AS cluster_id, pts.geom FROM polys,
          LATERAL ST_Dump(ST_GeneratePoints(polys.geom, 1000, 1)) AS pts),
centroids AS (SELECT cluster_id, ST_PointOnSurface(ST_collect(geom)) AS geom FROM pnt_clusters GROUP BY poly_id, cluster_id),
neg_buffer AS (SELECT poly_id, ST_ExteriorRing(ST_Buffer(geom, -0.4, 'endcap=flat join=round')) geom FROM polys GROUP BY poly_id, polys.geom),
pnt_clusters_new AS (SELECT DISTINCT ST_ClosestPoint(a.geom, b.geom) AS geom FROM neg_buffer a, centroids b),
node_pts AS (SELECT ST_StartPoint(geom) geom FROM neg_buffer),
snap_pts AS (SELECT b.cluster_id, a.geom FROM pnt_clusters_new a JOIN centroids b ON ST_DWithin(a.geom, b.geom, 0.4))
SELECT  a.cluster_id, (a.geom) geom FROM snap_pts a WHERE NOT EXISTS (SELECT 1 FROM node_pts b WHERE ST_Intersects(a.geom, b.geom));

And check the result.
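
One possible way to check the result is to list each snapped point as text together with its distance to the nearest polygon boundary. This is only a sketch: it assumes the polys_pts table created by the script above, and it repeats the two sample polygons inline because polys exists there only as a CTE. The column aliases wkt and dist_to_boundary are illustrative names.

```sql
-- Sketch: inspect the snapped points and how far they lie from the boundary.
-- Assumes the polys_pts table from the script above already exists.
WITH polys(poly_id, geom) AS (
    VALUES (1, 'POLYGON((1 1, 1 5, 4 5, 4 4, 2 4, 2 2, 4 2, 4 1, 1 1))'::geometry),
           (2, 'POLYGON((6 6, 6 10, 8 10, 9 7, 8 6, 6 6))'::geometry)
)
SELECT  p.cluster_id,
        ST_AsText(p.geom) AS wkt,
        -- distance to the closest boundary among the sample polygons
        min(ST_Distance(p.geom, ST_Boundary(polys.geom))) AS dist_to_boundary
FROM    polys_pts p
CROSS JOIN polys
GROUP BY p.cluster_id, p.geom
ORDER BY p.cluster_id;
```

If the snapping worked as intended, dist_to_boundary should not noticeably exceed the buffer distance of 0.4.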


I hope I have understood your qualifying condition correctly. If so, run the following script:

DROP TABLE IF EXISTS polys_pts;  -- drop the table if an earlier run already created it
CREATE TABLE polys_pts AS
WITH  
polys(poly_id, geom) AS (VALUES  (1, 'POLYGON((1 1, 1 5, 4 5, 4 4, 2 4, 2 2, 4 2, 4 1, 1 1))'::GEOMETRY),
                (2, 'POLYGON((6 6, 6 10, 8 10, 9 7, 8 6, 6 6))'::GEOMETRY)),
pnt_clusters AS (SELECT  polys.poly_id,
      CASE
          WHEN ST_Area(polys.geom)>9 THEN ST_ClusterKMeans(pts.geom, 8) OVER(PARTITION BY polys.poly_id) 
          ELSE ST_ClusterKMeans(pts.geom, 2) OVER(PARTITION BY polys.poly_id) 
      END AS cluster_id, pts.geom FROM polys,
          LATERAL ST_Dump(ST_GeneratePoints(polys.geom, 1000, 1)) AS pts),
centroids AS (SELECT cluster_id, ST_PointOnSurface(ST_collect(geom)) AS geom FROM pnt_clusters GROUP BY poly_id, cluster_id),
neg_buffer AS (SELECT poly_id, (ST_Buffer(geom, -0.4, 'endcap=flat join=round')) geom FROM polys GROUP BY poly_id, polys.geom),
neg_buffer_pts_out AS (SELECT a.cluster_id, (a.geom) geom FROM centroids a WHERE NOT EXISTS (SELECT 1 FROM neg_buffer b WHERE ST_Intersects(a.geom, b.geom))),
neg_buffer_pts_in AS (SELECT a.cluster_id, (a.geom) geom FROM centroids a WHERE EXISTS (SELECT 1 FROM neg_buffer b WHERE ST_Intersects(a.geom, b.geom))),
snap_pts_clusters_in AS (SELECT DISTINCT ST_ClosestPoint(ST_ExteriorRing(a.geom), b.geom) AS geom FROM neg_buffer a, neg_buffer_pts_in b),
node_pts AS (SELECT ST_StartPoint(ST_ExteriorRing(geom)) geom FROM neg_buffer),
snap_pts AS (SELECT b.cluster_id, a.geom FROM snap_pts_clusters_in a JOIN centroids b ON ST_DWithin(a.geom, b.geom, 0.4))
SELECT  a.cluster_id, (a.geom) geom FROM snap_pts a WHERE NOT EXISTS (SELECT 1 FROM node_pts b WHERE ST_Intersects(a.geom, b.geom))
UNION SELECT c.cluster_id, (c.geom) geom FROM neg_buffer_pts_out c ORDER BY cluster_id;

Check the result and refine the script's behavior if necessary.

As the script shows, I added a step that splits the centroids by whether they hit or miss the negative buffer zone: centroids inside the zone are snapped to its exterior ring, and centroids outside it are kept as they are. The rest of the logic is unchanged.

However, this raised a new question for me about the strange behavior of the ST_ClosestPoint() function.
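
The hit-and-miss split can be sketched in isolation on a single square and two hand-placed points. This is a simplified illustration under my own assumptions, not the script above: the names poly, inner_zone and pts and the sample coordinates are hypothetical.

```sql
-- Sketch: points that hit the negative buffer are snapped to its exterior ring;
-- points that miss it (already within 0.4 of the boundary) are left unchanged.
WITH poly AS (
    SELECT 'POLYGON((0 0, 0 4, 4 4, 4 0, 0 0))'::geometry AS geom
),
inner_zone AS (
    SELECT ST_Buffer(geom, -0.4) AS geom FROM poly
),
pts(id, geom) AS (
    VALUES (1, 'POINT(1 2)'::geometry),    -- deep inside: hits the inner zone
           (2, 'POINT(0.1 2)'::geometry)   -- near the edge: misses the inner zone
)
SELECT  p.id,
        ST_Intersects(p.geom, z.geom) AS hits_inner_zone,
        ST_AsText(
            CASE WHEN ST_Intersects(p.geom, z.geom)
                 THEN ST_ClosestPoint(ST_ExteriorRing(z.geom), p.geom)
                 ELSE p.geom
            END
        ) AS result_geom
FROM    pts p
CROSS JOIN inner_zone z
ORDER BY p.id;
```

Point 1 should be moved onto the ring of the inner zone, while point 2 should come back unchanged.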


There will be bias, no matter what you do.

ST_PointOnSurface uses a deterministic algorithm based on the areal shape. Combined with your cluster-based centroids, this means that when ST_Area(ST_Buffer(geom, -0.4)) is less than 50% of ST_Area(geom), you may get a point on surface (POS) outside of that threshold. However, since the general extent and shape of the areas that make up each cluster stay similar when the initial polygons are scaled, the general distribution of the centroids also stays similar, even if you scale the input area for ST_GeneratePoints. For centroids that would fall outside the actual area (as with the C shape), the POS is placed on the boundary if the boundary is more than halfway from the centroid.

In other words, you should get equally uniformly distributed centroids, always within the threshold distance, if you generate the random points in the negatively buffered polygon only when the buffered area is less than half the size of the actual area, and in the actual area otherwise:

SELECT  poly_id,
        cluster_id,
        ST_PointOnSurface(ST_Collect(geom)) AS geom
FROM    (
SELECT  polys.poly_id,
        CASE WHEN ST_Area(polys.geom) > 9
            THEN ST_ClusterKMeans(pts.geom, 8) OVER(PARTITION BY polys.poly_id) 
            ELSE ST_ClusterKMeans(pts.geom, 2) OVER(PARTITION BY polys.poly_id) 
        END AS cluster_id,
        pts.geom
FROM    polys,
        LATERAL ST_Dump(ST_GeneratePoints(CASE WHEN ST_Area(ST_Buffer(polys.geom, -0.4)) / ST_Area(polys.geom) > 0.5 THEN polys.geom ELSE ST_Buffer(polys.geom, -0.4) END, 1000, 1)) AS pts
) q
GROUP BY
        1, 2
;
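
To see which branch of the CASE a given polygon takes, the area ratio can be inspected directly. The sketch below assumes the two sample polygons used in the other answer's script; the column aliases are illustrative names only.

```sql
-- Sketch: compare each polygon's area with the area of its -0.4 buffer.
WITH polys(poly_id, geom) AS (
    VALUES (1, 'POLYGON((1 1, 1 5, 4 5, 4 4, 2 4, 2 2, 4 2, 4 1, 1 1))'::geometry),
           (2, 'POLYGON((6 6, 6 10, 8 10, 9 7, 8 6, 6 6))'::geometry)
)
SELECT  poly_id,
        ST_Area(geom)                                        AS area,
        ST_Area(ST_Buffer(geom, -0.4))                       AS buffered_area,
        ST_Area(ST_Buffer(geom, -0.4)) / ST_Area(geom)       AS ratio,
        -- TRUE: points are generated in the full polygon; FALSE: in the buffered one
        ST_Area(ST_Buffer(geom, -0.4)) / ST_Area(geom) > 0.5 AS uses_full_polygon
FROM    polys;
```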

[Figure legend: lighter gray: initial shapes; darker gray: negative buffer (-0.4); red: no buffer; yellow: ratio-dependent buffer; orange: points fall on the same spot]