Fixing orphaned holes in R
I've analysed the geometry issues in the attached data, and it seems it does not ONLY have orphaned holes but also geometry validity issues. It's true that an orphaned hole is in a sense a geometry validity issue, but rgeos does not handle it the same way: for orphaned holes it raises an error rather than a simple warning. As you indicate, there are hints for checking polygon holes, but applying them does not always succeed in fixing orphaned holes.
So I would suggest to:
- clean your data (which is required if you wish to do geoprocessing like union)
- use the cleaned data with your union process
1. Cleaning geometry
Fixing geometries in R can sometimes be challenging, so I've tried to build an experimental R package (see https://github.com/eblondel/cleangeo) that is intended to facilitate cleaning of sp objects (for now limited to polygonal shapes). You can install the package with:
    require(devtools)
    install_github("eblondel/cleangeo")
    require(cleangeo)
To start, it's good to see what the geometry issues in your source data are. For this, you can run the following (your data is large, so it can take some time):
    # get a report of geometry validity & issues for a sp spatial object
    report <- clgeo_CollectionReport(sp)
    summary <- clgeo_SummaryReport(report)
    issues <- report[report$valid == FALSE,]
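To make the report easier to read, you can tabulate the invalid rows by kind of issue. The sketch below uses a toy data.frame in place of the real report so that it runs on its own. Only the `valid` column comes from the code above; the `issue_type` column and its values are assumptions made for illustration, so check `names(report)` on your own output first.

```r
# Toy stand-in for the data.frame returned by clgeo_CollectionReport().
# `valid` is used in the code above; `issue_type` and its values are
# assumptions for this illustration.
report <- data.frame(
  valid = c(TRUE, FALSE, FALSE, TRUE, FALSE),
  issue_type = c(NA, "ORPHANED_HOLE", "GEOM_VALIDITY", NA, "GEOM_VALIDITY"),
  stringsAsFactors = FALSE
)

# Row indexes of the features flagged as invalid
invalid_idx <- which(!report$valid)

# Number of invalid features per kind of issue
issue_counts <- table(report$issue_type[invalid_idx])

print(invalid_idx)
print(issue_counts)
```

With the real report, the same two lines (`which(!report$valid)` and `table(...)`) should tell you quickly how many features fall into each category.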
With this, you will see that your data has 2 kinds of issues: orphaned holes and geometry validity issues. Both (and not only the orphaned holes) are likely to make the union process fail, so the data should be cleaned beforehand, in an automated way when possible. For a fast reproduction, the first sample code below only takes the subset of features that are tagged as suspicious (except the last one, with index = 9002 in the original data; see my note below on this):
    # get suspicious features (indexes)
    nv <- clgeo_SuspiciousFeatures(report)
    mysp <- sp[nv[-14],]

    # try to clean data
    mysp.clean <- clgeo_Clean(mysp, print.log = TRUE)

    # check if there are still errors
    report.clean <- clgeo_CollectionReport(mysp.clean)
    summary.clean <- clgeo_SummaryReport(report.clean)
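Note that `nv[-14]` in the code above drops a feature by its position in `nv`, which only works if feature 9002 happens to be the 14th suspicious one. A slightly more robust variant, sketched here with a made-up `nv` vector (the values are hypothetical, not taken from the real data), is to drop it by value:

```r
# Hypothetical vector of suspicious feature indexes; the real one comes
# from clgeo_SuspiciousFeatures(report)
nv <- c(12L, 340L, 4711L, 9002L)

# Index of the feature to leave out (9002 in the original data)
drop_id <- 9002L

# Drop by value rather than by position in nv
nv_kept <- setdiff(nv, drop_id)

print(nv_kept)
```

You would then subset with `sp[nv_kept,]` instead of `sp[nv[-14],]`.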
clgeo_Clean does the job well: all geometries should now be valid. You can apply this to the complete dataset (except feature index = 9002):
    # try to clean data
    mysp <- sp[-9002,]
    mysp.clean <- clgeo_Clean(mysp, print.log = TRUE)

    # check if there are still errors
    report.clean <- clgeo_CollectionReport(mysp.clean)
    summary.clean <- clgeo_SummaryReport(report.clean)
2. Union process
Now, let's see if the union works on this dataset:
    # Attempting a unionSpatialPolygons based on the COUNTY field
    mysp.df <- as(mysp, "data.frame")
    countycol <- mysp.df$COUNTY
    mysp.diss <- unionSpatialPolygons(mysp.clean, countycol)
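unionSpatialPolygons expects one ID per input feature, in the same order, and returns one feature per distinct ID. Before running a long union, it can be worth checking that the ID vector lines up with the geometries. Here is a minimal sketch on three toy squares built with sp; the `unit_square` helper and the county values are hypothetical and only serve the illustration:

```r
library(sp)

# Hypothetical helper: a unit square with its lower-left corner at (x0, 0)
unit_square <- function(x0, id) {
  coords <- cbind(c(x0, x0 + 1, x0 + 1, x0, x0), c(0, 0, 1, 1, 0))
  Polygons(list(Polygon(coords)), ID = id)
}

# Three toy features and the grouping vector that would be passed as IDs
polys <- SpatialPolygons(list(
  unit_square(0, "a"), unit_square(1, "b"), unit_square(3, "c")
))
county <- c("X", "X", "Y")

# One ID per feature, and no missing IDs
n_features <- length(polys@polygons)
stopifnot(length(county) == n_features)
n_missing <- sum(is.na(county))

# Number of features to expect after the union
n_groups <- length(unique(county))

print(c(features = n_features, missing = n_missing, groups = n_groups))
```

On your data, the same checks would be run on `mysp.clean` and `countycol`.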
Note: as said before, I removed one feature (index = 9002). If you plot it with plot(sp[9002,]), you will see that this feature is very (very) complex. I excluded it from the sample only because checking its holes was taking too much time. Let's see now if the same problem occurs using readShapePoly (from maptools) for reading the data...
3. Switch to readShapePoly vs. readOGR for reading data (UPDATE)
readOGR is not the only function available to read shapefiles. You can also use readShapePoly from the maptools package, which is generally faster than readOGR:
    require(maptools)
    mysp <- readShapePoly("ReproducibleExample.shp")
Apart from running faster:
- if you run the above code based on clgeo_CollectionReport, there is no longer any problem of orphaned holes, but there are still geometry validity problems.
- cleaning the geometry with clgeo_Clean also runs well, and it no longer gets stuck on the feature with index 9002.
- and... the union process works.
See below the plot result:
    # plot the result
    plot(mysp, border = "lightgray")
    plot(mysp.diss, border = "red", add = TRUE)
Conclusion: prefer maptools to read your shapefile data, and consider using cleangeo to clean your data before any geoprocessing.
A convenient solution that keeps working for me in R is to apply a zero-width buffer:
    # loading required packages
    require(sp)
    require(rgdal)
    require(maptools)
    require(rgeos)

    # load example data, set "dsn=" to your working directory or specify the path
    example <- readOGR(dsn = ".", layer = "ReproducibleExample")

    # project your data (I'm using California Albers here) and apply a zero-width buffer
    example <- spTransform(example, CRS("+init=epsg:3310"))
    example <- gBuffer(example, byid = TRUE, width = 0)

    # Attempting a unionSpatialPolygons based on the COUNTY field
    example.df <- as(example, "data.frame")
    countycol <- example.df$COUNTY
    example.diss <- unionSpatialPolygons(example, countycol)
unionSpatialPolygons takes a while with this data set, but seems to work just fine.
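To see what the zero-width buffer does, here is a small sketch on a deliberately invalid "bowtie" polygon. It is not the code from the answer above: it assumes the sf package is installed and uses sf::st_buffer instead of rgeos::gBuffer (both wrap GEOS), and the polygon is made up for illustration.

```r
library(sf)

# A self-crossing "bowtie" ring: a deliberately invalid polygon
ring <- rbind(c(0, 0), c(0, 10), c(10, 0), c(10, 10), c(0, 0))
bowtie <- st_sfc(st_polygon(list(ring)))

valid_before <- st_is_valid(bowtie)

# Zero-width buffer: the buffer operation rebuilds the polygon boundary
fixed <- st_buffer(bowtie, dist = 0)
valid_after <- st_is_valid(fixed)

# Compare areas to spot features whose shape changed during the repair
area_before <- st_area(bowtie)
area_after <- st_area(fixed)

print(c(before = valid_before, after = valid_after))
print(c(before = area_before, after = area_after))
```

The repaired geometry is valid, but a zero-width buffer can change the shape of a self-crossing polygon, so it may be worth comparing areas before and after on your own features.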