How to speed up raster to polygon conversion in R?

Gdal Installation

Install Gdal command line tools and check to see if its binaries are added to path environment variable. e.g. in windows: open Run and type:

rundll32.exe sysdm.cpl,EditEnvironmentVariables 

Then follow the screenshot

enter image description here

Download and install gdal python bindings from here according to your python and OS.

install it using:

pip.exe install GDAL-2.0.2-cp27-none-win32.whl

You may encounter issues while installing gdal. Please see Installing GDAL with Python on windows?

In that thread users have suggested that Gdal binaries from gisinternals installs both the command line tools and the python bindings. Try to install it from there. Thus none of the above steps would be relevant.

To Make sure Gdal is installed open a command prompt and type:

ogrinfo

and check it works and you don't get:

'ogrinfo' is not recognized as an internal or external command, operable program or batch file.

To check gdal python bindings is installed open command prompt and type:

python
import gdal

if you get the following error then it is not installed:

Traceback (most recent call last): File "", line 1, in ImportError: No module named gdal

R side

Source the following r function from John Baumgartner blog:

gdal_polygonizeR <- function(x, outshape=NULL, gdalformat = 'ESRI Shapefile',
                             pypath=NULL, readpoly=TRUE, quiet=TRUE) {
  if (isTRUE(readpoly)) require(rgdal)
  if (is.null(pypath)) {
    pypath <- Sys.which('gdal_polygonize.py')
  }
  if (!file.exists(pypath)) stop("Can't find gdal_polygonize.py on your system.")
  owd <- getwd()
  on.exit(setwd(owd))
  setwd(dirname(pypath))
  if (!is.null(outshape)) {
    outshape <- sub('\\.shp$', '', outshape)
    f.exists <- file.exists(paste(outshape, c('shp', 'shx', 'dbf'), sep='.'))
    if (any(f.exists))
      stop(sprintf('File already exists: %s',
                   toString(paste(outshape, c('shp', 'shx', 'dbf'),
                                  sep='.')[f.exists])), call.=FALSE)
  } else outshape <- tempfile()
  if (is(x, 'Raster')) {
    require(raster)
    writeRaster(x, {f <- tempfile(fileext='.tif')})
    rastpath <- normalizePath(f)
  } else if (is.character(x)) {
    rastpath <- normalizePath(x)
  } else stop('x must be a file path (character string), or a Raster object.')
  system2('python', args=(sprintf('"%1$s" "%2$s" -f "%3$s" "%4$s.shp"',
                                  pypath, rastpath, gdalformat, outshape)))
  if (isTRUE(readpoly)) {
    shp <- readOGR(dirname(outshape), layer = basename(outshape), verbose=!quiet)
    return(shp)
  }
  return(NULL)
}

use this command to do the job:

r.to.poly<-gdal_polygonizeR(r1,pypath = "C:\\Program Files\\GDAL\\gdal_polygonize.py")#, dissolve = T)

Change pypath parameter according to your system. Be cautious that gdal_plygonize creates huge shapefiles. My 1 MB tif converted to a 128 MB shapefile. R needs a lot of memory to open this shapefile. Although the conversion was very fast. Thanks to python and gdal!

Another option would be Esri r-bridge to do the computation in Arcgis and return the output to R. However r-bridge doesn't support raster layers yet. (Thanks to @JeffreyEvans)

However the Gdal method is commercial free!


There is a "new" method from the stars package, which revolutionized the workflow for me (I was using the gdal_polygonizeR function previously). It has been faster than the John Baumgartner solution for all complicated rasters I have tried it on. Further, it does not require the gdal_polygonize.py script, which can be difficult to install on some machines. Try:

r.to.poly <- sf::as_Spatial(sf::st_as_sf(stars::st_as_stars(r1), 
                            as_points = FALSE, merge = TRUE)
                            ) # requires the sf, sp, raster and stars packages

See also this thread.

PS. If your polygons are complex, you probably need to validate them, before you can use them to calculate area:

rgeos::gIsValid(r.to.poly) # FALSE here means that you'll need to run the buffer routine:
r.to.poly <- rgeos::gBuffer(r.to.poly, byid = TRUE, width = 0)

If the polygons are very complex, you may need to add buffer width to get rid of all crossing edges:

r.to.poly <- rgeos::gBuffer(r.to.poly, byid = TRUE, width = 1000)
r.to.poly <- rgeos::gBuffer(r.to.poly, byid = TRUE, width = -1000)

Further, raster::area is quicker also for polygons than rgeos::gArea, I think (have not tested this properly, though).