Create presence/absence (0/1) rasters for every value in categorical raster in R?

The raster package has a function to do this for you in one line. Use layerize():

library(raster)
# make an example raster
r <- raster(nrow=100, ncol=100)
r[] <- round(runif(ncell(r),1,4),0)

# create presence/absence rasters, stored in a RasterBrick. 
r_dummy <- layerize(r)
plot(r_dummy[[1]])

Raster for "1" values
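As a quick sanity check on the result, the sketch below confirms that layerize() returned one layer per class and that every cell is flagged in exactly one layer. It uses a smaller raster and sample() instead of runif() for the example values; the seed and the object names are assumptions made for illustration only.

```r
library(raster)

set.seed(42)  # arbitrary seed, only so the example is reproducible
r <- raster(nrow = 10, ncol = 10)
r[] <- sample(1:4, ncell(r), replace = TRUE)

r_dummy <- layerize(r)

# one 0/1 layer per class value found in r
n_classes <- length(unique(values(r)))
nlayers(r_dummy) == n_classes

# values() on a multi-layer object returns a cells x layers matrix;
# each cell belongs to exactly one class, so every row should sum to 1
cell_sums <- rowSums(values(r_dummy))
all(cell_sums == 1)
```

If the second check fails on real data, the usual suspects are NA cells or values that were not included in the classes.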

Are you sure that dummy coding in Stan requires a separate binomial parameter for each level? For efficiency's sake, a categorical covariate such as Native Vegetation would ideally be a single factor with levels [1,2,3,4] rather than four separate parameters. If separate parameters are in fact required, this is trivial to do in R.

library(raster)
r <- raster(nrow=100, ncol=100)
r[] <- round(runif(ncell(r),1,4),0)

# make copies of the original to modify
r1 <- r; r2 <- r; r3 <- r; r4 <- r
r1[] <- ifelse(r[] == 1, 1, 0)
r2[] <- ifelse(r[] == 2, 1, 0)
r3[] <- ifelse(r[] == 3, 1, 0)
r4[] <- ifelse(r[] == 4, 1, 0)

plot(stack(r1,r2,r3,r4))
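The four copies above can also be generated from the unique values rather than typed out by hand. This is only a sketch of one way to do it: the names lv and bin and the "class_" prefix are made up for the example, and sample() is used here to build the test raster.

```r
library(raster)

r <- raster(nrow = 100, ncol = 100)
r[] <- sample(1:4, ncell(r), replace = TRUE)

# class values actually present in the raster
lv <- sort(unique(values(r)))

# r == v yields a TRUE/FALSE layer, which behaves as 1/0 in arithmetic
bin <- stack(lapply(lv, function(v) r == v))
names(bin) <- paste0("class_", lv)

plot(bin)
```

Because the layers are built from whatever values occur in r, the same lines work unchanged for a raster with a different number of classes.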

You can extend this into processing all of your rasters in a for loop.

First, as an example, create a raster stack with 5 layers whose unique values fall in the range 1 to 5. This should emulate your problem, since each raster has a different number of levels. The simulated data stands in for a raster stack or brick of your nominal covariates that need to be converted to binary layers. Your own raster data can be read into this object class using the raster::stack or raster::brick functions.

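r <- stack(brick(array(runif(100 * 100 * 5), dim=c(100, 100, 5))))
# overwrite each layer with integer classes between a random minimum (1 or 2)
# and a random maximum (3 to 5), so the number of levels differs by layer
for(i in 1:nlayers(r)) {
  r[[i]][] <- round(runif(ncell(r), sample(1:2,1), sample(3:5,1)), 0)
}
names(r) <- paste0("parameter",1:5)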

Now we can define a double loop: the outer loop finds the unique values in each raster, and the inner loop creates one binary raster per unique level. With the code below, the resulting stack is stored in binary.rasters.

binary.rasters <- stack()
rnames <- vector()
for(i in 1:nlayers(r)) {
  u <- unique(r[[i]][])   # levels present in layer i, computed once per layer
  for(j in seq_along(u)) {
    b <- r[[i]]
    b[] <- ifelse(b[] == u[j], 1, 0)   # 1 where the cell equals this level, else 0
    binary.rasters <- addLayer(binary.rasters, b)
    rnames <- append(rnames, paste(names(r)[i], paste("level",u[j],sep="-"), sep="_") )
  }
}
( names(binary.rasters) <- rnames )

I added a vector, rnames, that tracks the raster name and appends the level to it in each inner loop. It is then assigned as the layer names of the resulting raster stack, so you can tell which raster and level a given binary raster represents. These names do not necessarily correspond to the parameter names used in the original model, but you can easily rename the elements of this vector.

I have seen this type of dummy coding in occupancy modeling (e.g., the software Presence), and it sure eats up parameter space. One has to wonder whether it would be more prudent to form explicit hypotheses around specific levels of a given categorical variable rather than think of it as a single independent variable, because that is certainly not how it behaves in the model. In this case each level of the variable is truly a separate parameter. If you hypothesize that only a level or two of a given nominal variable is going to affect your process, why include all of the levels? By reducing the parameter space you are going to get much more relevant AIC values when evaluating competing models.
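To make the parameter count concrete, here is a small base R sketch comparing the design matrix for a four-level factor with one built from a single hypothesized level. The variable veg, the choice of level 3, and the sample size are all hypothetical, and model.matrix() is used only to count columns, not as a stand-in for how Stan or Presence builds its design matrix.

```r
set.seed(1)  # arbitrary seed for a reproducible example
veg <- factor(sample(1:4, 20, replace = TRUE), levels = 1:4)

# default treatment coding: an intercept plus one column per non-reference level
X_full <- model.matrix(~ veg)
ncol(X_full)   # 4

# a single 0/1 indicator for the one level you have a hypothesis about
veg3 <- as.integer(veg == 3)
X_one <- model.matrix(~ veg3)
ncol(X_one)    # 2
```

Each extra column is another parameter to estimate, which is the cost weighed above when deciding whether every level belongs in the model.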
