Fixing orphaned holes in R

I've analysed the geometry issues in the attached data, and it does not only have orphaned holes but also other geometry validity issues. An orphaned hole is itself a kind of geometry validity issue, but rgeos handles it differently: an orphaned hole raises an error rather than a simple warning. As you indicate, there are hints for checking polygon holes, but applying them does not always succeed in fixing orphaned holes.

So, let's:

  1. clean your data (which is required if you wish to do geoprocessing like union)

  2. use the cleaned data with your union process

1. Cleaning geometry

Fixing geometries in R can sometimes be challenging, so I've tried to build an experimental R package, cleangeo, that aims to facilitate cleaning of sp objects (for now limited to polygonal shapes). Install and load the package before running the code below.
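A minimal sketch of the installation step, assuming the package is cleangeo (named in the conclusion below) and that it can be installed from CRAN in your environment; the repository URL is an assumption, so adjust it to your usual mirror:

```r
# Assumption: cleangeo is available from CRAN for your R version
if (!requireNamespace("cleangeo", quietly = TRUE)) {
  install.packages("cleangeo", repos = "https://cloud.r-project.org")
}

# Load the package so that the clgeo_* functions used below are available
library(cleangeo)
```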


To start, it helps to see which geometry issues your source data has. For this, you can run the following (your data is large, so it can take some time):

#get a report of geometry validity & issues for a sp spatial object
report <- clgeo_CollectionReport(sp)
summary <- clgeo_SummaryReport(report)
issues <- report[report$valid == FALSE,]

With this, you will see that your data has 2 kinds of issues: orphaned holes and geometry validity issues. Both (and not only the orphaned holes) are likely to make the union process fail, so the data should be cleaned beforehand, in an automated way when possible. For a fast reproduction, the first sample code below only takes the subset of features that are tagged as suspicious (except the last one, with index = 9002 in the original data; see my note below on this).

#get suspicious features (indexes)
nv <- clgeo_SuspiciousFeatures(report)
mysp <- sp[nv[-14],]
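Note that nv[-14] drops the 14th entry of the index vector (the last suspicious feature, 9002), not feature number 14. A minimal base-R sketch of the difference, using made-up index values that are not taken from your data:

```r
# Hypothetical indices of suspicious features, standing in for `nv`
nv_toy <- c(12L, 40L, 57L, 9002L)

# Negative index on the vector: drops the 4th ENTRY (here the value 9002)
kept <- nv_toy[-4]

# Position-independent alternative: drop by value instead of by position
kept_by_value <- setdiff(nv_toy, 9002L)

print(kept)
print(identical(kept, kept_by_value))
```

Dropping by value can be a little safer if the number of suspicious features changes between runs.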

#try to clean data
mysp.clean <- clgeo_Clean(mysp, print.log = TRUE)

#check whether there are still errors
report.clean <- clgeo_CollectionReport(mysp.clean)
summary.clean <- clgeo_SummaryReport(report.clean)

If clgeo_Clean does its job well, all geometries should now be valid. You can apply this to the complete dataset (except feature index = 9002):

#try to clean data
mysp <- sp[-9002,]
mysp.clean <- clgeo_Clean(mysp, print.log = TRUE)

#check whether there are still errors
report.clean <- clgeo_CollectionReport(mysp.clean)
summary.clean <- clgeo_SummaryReport(report.clean)

2. Union process

Now, let's see if the union works on this dataset:

#Attempting a unionSpatialPolygons (from maptools) based on the COUNTY field
mysp.df <- as(mysp, "data.frame")
countycol <- mysp.df$COUNTY
mysp.diss <- unionSpatialPolygons(mysp.clean, countycol)

Note: as said before, I've removed one feature (index = 9002). By plotting it with plot(sp[9002,]), you will see that this feature is very (very) complex. I've excluded it from the sample only because checking holes was taking too much time. Let's see now if the same problem occurs using readShapePoly (from maptools) for reading the data...

3. Switch to readShapePoly vs. readOGR for reading data (UPDATE)

readOGR is not the only function available to read shapefiles. You can also use readShapePoly from the maptools package, which is generally faster than readOGR:

mysp <- readShapePoly("ReproducibleExample.shp")

Apart from running faster:

  • if you rerun the above code based on clgeo_CollectionReport, there are no more orphaned holes, but there are still geometry validity problems.

  • Cleaning the geometry with clgeo_Clean also runs well, and it no longer gets stuck on feature index 9002.

  • And... the union process works.

See below the plot result:

#plot the result
plot(mysp, border= "lightgray")
plot(mysp.diss, border="red", add = TRUE)

[Figure: union result]

Conclusion: prefer maptools to read your shapefile data, and consider using cleangeo to clean your data before any geoprocessing.

A convenient solution that keeps working for me in R is to apply a zero-width buffer:

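#loading required packages
library(rgdal)    #readOGR, spTransform
library(rgeos)    #gBuffer
library(maptools) #unionSpatialPolygons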

#load example data, set "dsn=" to your working directory or specify the path
example <- readOGR(dsn=".",layer="ReproducibleExample")

#project your data (I'm using California Albers here) and apply a zero-width buffer
example <- spTransform(example, CRS("+init=epsg:3310"))
example <- gBuffer(example, byid = TRUE, width = 0)

#Attempting a unionSpatialPolygons based on the COUNTY field
example.df <- as(example, "data.frame")
countycol <- example.df$COUNTY
example.diss <- unionSpatialPolygons(example, countycol)

unionSpatialPolygons takes a while with this data set, but seems to work just fine.
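To see what the zero-width buffer does in isolation, here is a minimal sketch on a toy self-intersecting "bow-tie" polygon. The polygon is hypothetical and not taken from the example data, and the sketch assumes sp and rgeos are installed in your environment:

```r
library(sp)     # Spatial* classes
library(rgeos)  # readWKT(), gIsValid(), gBuffer()

# A toy "bow-tie" polygon whose ring crosses itself (hypothetical data)
bowtie <- readWKT("POLYGON((0 0, 2 2, 2 0, 0 2, 0 0))")

# Validity before the fix (rgeos may emit a warning naming the problem)
valid_before <- suppressWarnings(gIsValid(bowtie))

# Zero-width buffer: asks GEOS to rebuild the geometry
fixed <- gBuffer(bowtie, byid = TRUE, width = 0)
valid_after <- gIsValid(fixed)

print(c(before = valid_before, after = valid_after))
```

A zero-width buffer may alter the shape of a badly self-intersecting polygon, so it is worth plotting the result against the input before relying on it.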