Increasing speed of crop, mask, & extract raster by many polygons in R?

I have finally gotten around to improving this function. I found that for my purposes, it was fastest to rasterize() the polygon first and use getValues() instead of extract(). The rasterizing isn't much faster than the original code for tabulating raster values in small polygons, but it shines when it came to large polygon areas that had large rasters to be cropped and the values extracted. I also found getValues() was much faster than the extract() function.

I also figured out the multi-core processing using foreach().

I hope this is useful for other people who want an R solution for extracting raster values from a polygon overlay. This is similar to the "Tabulate Intersection" of ArcGIS, which did not work well for me, returning empty outputs after hours of processing, like this user.

#initiate multicore cluster and load packages
library(foreach)
library(doParallel)
library(tcltk)
library(sp)
library(raster)

cores<- 7
cl <- makeCluster(cores, output="") #output should make it spit errors
registerDoParallel(cl)

Here's the function:

multicore.tabulate.intersect<- function(cores, polygonlist, rasterlayer){ 
  foreach(i=1:cores, .packages= c("raster","tcltk","foreach"), .combine = rbind) %dopar% {

    mypb <- tkProgressBar(title = "R progress bar", label = "", min = 0, max = length(polygonlist[[i]]), initial = 0, width = 300) 

    foreach(j = 1:length(polygonlist[[i]]), .combine = rbind) %do% {
      final<-data.frame()
      tryCatch({ #not sure if this is necessary now that I'm using foreach, but it is useful for loops.

        single <- polygonlist[[i]][j,] #pull out individual polygon to be tabulated

        dir.create (file.path("c:/rtemp",i,j,single@data$OWNER), showWarnings = FALSE) #creates unique filepath for temp directory
        rasterOptions(tmpdir=file.path("c:/rtemp",i,j, single@data$OWNER))  #sets temp directory - this is important b/c it can fill up a hard drive if you're doing a lot of polygons

        clip1 <- crop(rasterlayer, extent(single)) #crop to extent of polygon
        clip2 <- rasterize(single, clip1, mask=TRUE) #crops to polygon edge & converts to raster
        ext <- getValues(clip2) #much faster than extract
        tab<-table(ext) #tabulates the values of the raster in the polygon

        mat<- as.data.frame(tab)
        final<-cbind(single@data$OWNER,mat) #combines it with the name of the polygon
        unlink(file.path("c:/rtemp",i,j,single@data$OWNER), recursive = TRUE,force = TRUE) #delete temporary files
        setTkProgressBar(mypb, j, title = "number complete", label = j)

      }, error=function(e){cat("ERROR :",conditionMessage(e), "\n")}) #trycatch error so it doesn't kill the loop

      return(final)
    }  
    #close(mypb) #not sure why but closing the pb while operating causes it to return an empty final dataset... dunno why. 
  }
}

So to use it, adjust the single@data$OWNER to fit with the column name of your identifying polygon (guess that could have been built into the function...) and put in:

myoutput <- multicore.tabulate.intersect(cores, polygonlist, rasterlayer)

Speed up extracting raster (raster stack) from point, XY or Polygon

Great answer Luke. You must be a R wizard! Here is a very minor tweak to simplify your code (may improve performance slightly in some cases). You can avoid some operations by using cellFromPolygon (or cellFromXY for points) and then clip and getValues.

Extract polygon or points data from raster stacks ------------------------

 library(raster)  
 library(sp)   

  # create polygon for extraction
  xys= c(76.27797,28.39791,
        76.30543,28.39761,
        76.30548,28.40236,
        76.27668,28.40489)
  pt <- matrix(xys, ncol=2, byrow=TRUE)
  pt <- SpatialPolygons(list(Polygons(list(Polygon(pt)), ID="a")));
  proj4string(pt) <-"+proj=longlat +datum=WGS84 +ellps=WGS84"
  pt <- spTransform(pt, CRS("+proj=sinu +a=6371007.181 +b=6371007.181 +units=m"))
  ## Create a matrix with random data & use image()
  xy <- matrix(rnorm(4448*4448),4448,4448)
  plot(xy)

  # Turn the matrix into a raster
  NDVI_stack_h24v06 <- raster(xy)
  # Give it lat/lon coords for 36-37°E, 3-2°S
  extent(NDVI_stack_h24v06) <- c(6671703,7783703,2223852,3335852)
  # ... and assign a projection
  projection(NDVI_stack_h24v06) <- CRS("+proj=sinu +a=6371007.181 +b=6371007.181 +units=m")
  plot(NDVI_stack_h24v06)
  # create a stack of the same raster
  NDVI_stack_h24v06 = stack( mget( rep( "NDVI_stack_h24v06" , 500 ) ) )


  # Run functions on list of points
  registerDoParallel(16)
  ptm <- proc.time()
  # grab cell number
  cell = cellFromPolygon(NDVI_stack_h24v06, pt, weights=FALSE)
  # create a raster with only those cells
  r = rasterFromCells(NDVI_stack_h24v06, cell[[1]],values=F)
  result = foreach(i = 1:dim(NDVI_stack_h24v06)[3],.packages='raster',.combine=rbind,.inorder=T) %dopar% {
     #get value and store
     getValues(crop(NDVI_stack_h24v06[[i]],r))
  }
  proc.time() - ptm
  endCluster()

user system elapsed 16.682 2.610 2.530

  registerDoParallel(16)
  ptm <- proc.time()
  result = foreach(i = 1:dim(NDVI_stack_h24v06)[3],.packages='raster',.inorder=T,.combine=rbind) %dopar% {
        clip1 <- crop(NDVI_stack_h24v06[[i]], extent(pt)) #crop to extent of polygon
        clip2 <- rasterize(pt, clip1, mask=TRUE) #crops to polygon edge & converts to raster
         getValues(clip2) #much faster than extract
  }
  proc.time() - ptm
  endCluster()

user system elapsed 33.038 3.511 3.288


If a loss in the precision of the overlay is not terribly important - assuming it is precise to begin with - one can typically achieve much greater zonal calculation speeds by first converting the polygons to a raster. The raster package contains the zonal() function, which should work well for the intended task. However, you can always subset two matrices of the same dimension using standard indexing. If you must maintain polygons and you don't mind GIS software, QGIS is actually must faster at zonal statistics than either ArcGIS or ENVI-IDL.