Can I cache data loading in R?

The R.cache package can do this:

    library(R.cache)  # loadCache(), saveCache()
    library(WDI)      # WDI()

    start_year <- 2000
    end_year <- 2013
    brics_countries <- c("BR", "RU", "IN", "CN", "ZA")
    indics <- c("NY.GDP.PCAP.CD", "TX.VAL.TECH.CD", "SP.POP.TOTL", "IP.JRN.ARTC.SC",
        "GB.XPD.RSDV.GD.ZS", "BX.GSR.CCIS.ZS", "BX.GSR.ROYL.CD", "BM.GSR.ROYL.CD")

    # The key identifies this exact query; changing any element gives a different cache entry.
    key <- list(brics_countries, indics, start_year, end_year)
    brics_data <- loadCache(key)
    if (is.null(brics_data)) {
      # Cache miss: download the data, then store it under the key.
      brics_data <- WDI(country = brics_countries, indicator = indics,
                        start = start_year, end = end_year, extra = FALSE, cache = NULL)
      saveCache(brics_data, key = key, comment = "brics_data")
    }

Sort of. There are a few answers:

  1. Use a faster CSV reader: fread() in the data.table package is beloved by many. Your time may come down to a second or two.

  2. Similarly, read the CSV once and then write it in compact binary form via saveRDS(), so that next time you can call readRDS(), which is faster because the data does not have to be parsed again.

  3. Don't read the data but memory-map it via the mmap package. That is more involved but likely very fast. Databases use such a technique internally.

  4. Load on demand; the SOAR package, for example, is useful here.
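Option 2 above can be sketched in base R roughly as follows. The helper name read_csv_cached and the ".rds" file naming are assumptions made for illustration, not part of any package; the freshness check via file.mtime() is one possible design, not the only one.

```r
# Hypothetical helper: parse a CSV once, then reuse a binary .rds copy.
read_csv_cached <- function(csv_path, rds_path = paste0(csv_path, ".rds")) {
  # Reuse the cached copy only if it exists and is at least as new as the CSV.
  if (file.exists(rds_path) &&
      file.mtime(rds_path) >= file.mtime(csv_path)) {
    return(readRDS(rds_path))
  }
  # Otherwise parse the CSV (the slow step) and write the cache for next time.
  data <- read.csv(csv_path, stringsAsFactors = FALSE)
  saveRDS(data, rds_path)
  data
}

# Usage with a small throwaway file
csv_file <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, value = c(2.5, 3.5, 4.5)),
          csv_file, row.names = FALSE)
first  <- read_csv_cached(csv_file)  # parses the CSV and writes the .rds
second <- read_csv_cached(csv_file)  # served from the .rds file
```

If the CSV changes on disk, the modification-time check causes it to be parsed again, so the cache should not silently go stale.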

Direct caching, however, is not possible.

Edit: Actually, direct caching "sort of" works if you save your data set with your R session at the end. Many of us advise against that, since clearly reproducible scripts that make the loading explicit are preferable in our view, but R can help via the load() / save() mechanism (which handles several objects at once, whereas saveRDS() / readRDS() work on a single object).
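As a minimal sketch of the difference, save() and load() handle several named objects in one file. The object names and the temporary file here are made up for illustration.

```r
cache_file <- tempfile(fileext = ".RData")

countries <- c("BR", "RU", "IN", "CN", "ZA")
years <- 2000:2013

# save() stores several objects, together with their names, in one file.
save(countries, years, file = cache_file)
rm(countries, years)

# load() recreates the objects under their original names in the current
# environment and returns those names as a character vector.
restored <- load(cache_file)
```

Note that load() overwrites any existing objects with the same names, which is one reason explicit readRDS() assignments are often easier to follow in a script.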

Tags: Caching, R, Startup