How to speed up the plotting of polygons in R?

I found 3 ways to increase the speed of plotting the country borders from shape files for R. I found some inspiration and code from here and here.

(1) We can extract the coordinates from the shape files to get the longitude and latitudes of the polygons. Then we can put them into a data frame with the first column containing the longitudes and the second column containing latitudes. The different shapes are separated by NAs.

(2) We can remove some polygons from our shape file. The shape file is very, very detailed, but some of the shapes are tiny islands that are unimportant (for my plots, anyways). We can set a minimum polygon area threshold to keep the bigger polygons.

(3) We can simplify the geometry of our shapes using the Douglas-Peuker algorithm. The edges of our polygon shapes can be simplified, as they are very intricate in the original file. Fortunately, there is a package, rgeos, that implements this.

Set up:

# Load packages
library(rgdal)
library(raster)
library(sp)
library(rgeos)

# Load the shape files
can<-getData('GADM', country="CAN", level=0)
usa<-getData('GADM', country="USA", level=0)
mex<-getData('GADM', country="MEX", level=0)

Method 1: Extract the coordinates from the the shape files into a data frame and plot lines

The major disadvantage is that we lose some information here when compared to keeping the object as a SpatialPolygonsDataFrame object, such as the projection. However, we can turn it back into an sp object and add back the projection information, and it is still faster than plotting the original data.

Note that this code runs very slowly on the original file because there are a lot of shapes, and the resulting data frame is ~2 million rows long.

Code:

# Convert the polygons into data frames so we can make lines
poly2df <- function(poly) {
  # Convert the polygons into data frames so we can make lines
  # Number of regions
  n_regions <- length(poly@polygons)

  # Get the coords into a data frame
  poly_df <- c()
  for(i in 1:n_regions) {
    # Number of polygons for first region
    n_poly <- length(poly@polygons[[i]]@Polygons)
    print(paste("There are",n_poly,"polygons"))
    # Create progress bar
    pb <- txtProgressBar(min = 0, max = n_poly, style = 3)
    for(j in 1:n_poly) {
      poly_df <- rbind(poly_df, NA, 
                       poly@polygons[[i]]@Polygons[[j]]@coords)
      # Update progress bar
      setTxtProgressBar(pb, j)
    }
    close(pb)
    print(paste("Finished region",i,"of",n_regions))
  }
  poly_df <- data.frame(poly_df)
  names(poly_df) <- c('lon','lat')
  return(poly_df)
}

Method 2: Remove small polygons

There are many small islands that are not very important. If you check some of the quantiles of the areas for the polygons, we see that many of them are miniscule. For the Canada plot, I went down from plotting over a thousand polygons to just hundreds of polygons.

Quantiles for the size of polygons for Canada:

          0%          25%          50%          75%         100% 
4.335000e-10 8.780845e-06 2.666822e-05 1.800103e-04 2.104909e+02 

Code:

# Get the main polygons, will determine by area.
getSmallPolys <- function(poly, minarea=0.01) {
  # Get the areas
  areas <- lapply(poly@polygons, 
                  function(x) sapply(x@Polygons, function(y) y@area))

  # Quick summary of the areas
  print(quantile(unlist(areas)))

  # Which are the big polygons?
  bigpolys <- lapply(areas, function(x) which(x > minarea))
  length(unlist(bigpolys))

  # Get only the big polygons and extract them
  for(i in 1:length(bigpolys)){
    if(length(bigpolys[[i]]) >= 1 && bigpolys[[i]] >= 1){
      poly@polygons[[i]]@Polygons <- poly@polygons[[i]]@Polygons[bigpolys[[i]]]
      poly@polygons[[i]]@plotOrder <- 1:length(poly@polygons[[i]]@Polygons)
    }
  }
  return(poly)
}

Method 3: Simplify the geometry of the polygon shapes

We can reduce the number of vertices in our polygon shapes using the gSimplify function from the rgeos package

Code:

can <- getData('GADM', country="CAN", level=0)
can <- gSimplify(can, tol=0.01, topologyPreserve=TRUE)

Some benchmarks:

I used elapsed system.time to benchmark my plotting times. Note that these are just the times for plotting the countries, without the contour lines and other extra things. For the sp objects, I just used the plot function. For the data frame objects, I used the plot function with type='l' and the lines function.

Plotting the original Canada, USA, Mexico polygons:

73.009 seconds

Using Method 1:

2.449 seconds

Using Method 2:

17.660 seconds

Using Method 3:

16.695 seconds

Using Method 2 + 1:

1.729 seconds

Using Method 2 + 3:

0.445 seconds

Using Method 2 + 3 + 1:

0.172 seconds

Other remarks:

It seems that the combination of methods 2 + 3 gives sufficient speed ups to the plotting of polygons. Using methods 2 + 3 + 1 adds the problem of losing the nice properties of sp objects, and my main difficulty is applying projections. I hacked something together to project a data frame object, but it runs rather slow. I think using method 2 + 3 provides sufficient speed ups for me until I can get the kinks out of using method 2 + 3 + 1.


Everyone should look into transfering over to sf (spatial features) package instead of sp. It is significantly faster (1/60th in this case) and easier to use. Here is an example of reading in a shp and plotting via ggplot2.

Note: You need to reinstall ggplot2 from the most recent build on github (see below)

library(rgdal)
library(sp)
library(sf)
library(plyr)
devtools::install_github("tidyverse/ggplot2")
library(ggplot2)

# Load the shape files
can<-getData('GADM', country="CAN", level=0)
td <- file.path(tempdir(), "rgdal_examples"); dir.create(td)
st_write(st_as_sf(can),file.path(td,'can.shp'))


ptm <- proc.time()
  can = readOGR(dsn=td, layer="can")
  can@data$id = rownames(can@data)
  can.points = fortify(can, region="id")
  can.df = join(can.points, can@data, by="id")
  ggplot(can.df) +  geom_polygon(aes(long,lat,group=group,fill='NAME_ENGLISH'))
proc.time() - ptm

user  system elapsed 
683.344   0.980 684.51 

ptm <- proc.time()
  can2 = st_read(file.path(td,'can.shp'))  
  ggplot(can2)+geom_sf( aes(fill = 'NAME_ENGLISH' )) 
proc.time() - ptm

user  system elapsed 
11.340   0.096  11.433 

Tags:

R

Polygon