Measure peak memory usage in R

I found what I was looking for in the package peakRAM. From the documentation:

This package makes it easy to monitor the total and peak RAM used so that developers can quickly identify and eliminate RAM hungry code.

library(peakRAM)

mem <- peakRAM({
  for(i in 1:5) {
    mean(rnorm(1e7))
  }
})
mem$Peak_RAM_Used_MiB # 10000486MiB

mem <- peakRAM({
  for(i in 1:5) {
    mean(rnorm(1e7))
  }
})
mem$Peak_RAM_Used_MiB # 10005266MiB <-- almost the same!
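
peakRAM() also accepts several expressions in a single call and, per the package documentation, returns a data frame with one row per expression (I am assuming the Function_Call and Total_RAM_Used_MiB column names from that documentation; only Peak_RAM_Used_MiB is used above). A minimal sketch comparing code that discards its large intermediate vector with code that returns it:

library(peakRAM)

res <- peakRAM(
  mean(rnorm(1e7)), # the big vector is only an intermediate here
  rnorm(1e7)        # the big vector is the result itself
)
# One row per expression; the gap between total and peak RAM should show
# whether the big allocation was kept or discarded.
res[, c("Function_Call", "Total_RAM_Used_MiB", "Peak_RAM_Used_MiB")]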

The object returned by lapply weighs only 488 bytes because each element holds only the small value returned by the function's last expression, not the 80 Mb matrix: garbage collection has removed the intermediate objects after the mean calculation.
help('Memory') gives useful information on how R manages memory.
In particular, you can use object.size() to track the size of individual objects, and memory.size() (Windows only) to see how much total memory is in use at each step:

# With mean calculation
gc(reset = T)
#>          used (Mb) gc trigger (Mb) max used (Mb)
#> Ncells 405777 21.7     831300 44.4   405777 21.7
#> Vcells 730597  5.6    8388608 64.0   730597  5.6
sum(gc()[, "(Mb)"]) 
#> [1] 27.3

l<-lapply(1:3, function(x) {
  mx <- replicate(10, rnorm(1e6)) # 80Mb object
  mean(mx)
  print(paste('Memory used:',memory.size()))
})
#> [1] "Memory used: 271.04"
#> [1] "Memory used: 272.26"
#> [1] "Memory used: 272.26"

object.size(l)
#> 488 bytes


# Without mean calculation
gc(reset = T)
#>          used (Mb) gc trigger  (Mb) max used (Mb)
#> Ncells 464759 24.9     831300  44.4   464759 24.9
#> Vcells 864034  6.6   29994700 228.9   864034  6.6
gcinfo(T)
#> [1] FALSE
sum(gc()[, "(Mb)"]) 
#> [1] 31.5
l<-lapply(1:4, function(x) {
  mx <- replicate(10, rnorm(1e6))
  print(paste('New object size:',object.size(mx)))
  print(paste('Memory used:',memory.size()))
  mx
})
#> [1] "New object size: 80000216"
#> [1] "Memory used: 272.27"
#> [1] "New object size: 80000216"
#> [1] "Memory used: 348.58"
#> [1] "New object size: 80000216"
#> [1] "Memory used: 424.89"
#> [1] "New object size: 80000216"
#> [1] "Memory used: 501.21"

object.size(l)
#> 320000944 bytes
sum(gc()[, "(Mb)"]) 
#> [1] 336.7

Created on 2020-08-20 by the reprex package (v0.3.0)

If instead of returning the mean you return the whole object, the increase in memory use is significant.


You can use the gc function for that.

Indeed, the gc function reports the current and maximum memory used: elements 11 and 12 of the matrix it returns are the maximum Ncells and Vcells figures from the "(Mb)" column (labelled Mb in the documentation, but in practice MiB on my machine). You can reset the maximum with the argument reset=TRUE. Here is an example:

> gc(reset=TRUE)
         used (Mb) gc trigger   (Mb) max used (Mb)
Ncells 318687 17.1     654385   35.0   318687 17.1
Vcells 629952  4.9  397615688 3033.6   629952  4.9
> a = runif(1024*1024*64)  # Should request 512 MiB from the GC (on my machine)
> gc()
           used  (Mb) gc trigger   (Mb) max used  (Mb)
Ncells   318677  17.1     654385   35.0   318834  17.1
Vcells 67738785 516.9  318092551 2426.9 67739236 516.9
> memInfo <- gc()
> memInfo[11]              # Maximum Ncells used (Mb)
[1] 17.1
> memInfo[12]              # Maximum Vcells used (Mb)
[1] 516.9
> rm(a)                    # `a` can be removed by the GC from this point
> gc(reset=TRUE)           # Order to reset the GC infos including the maximum
         used (Mb) gc trigger   (Mb) max used (Mb)
Ncells 318858 17.1     654385   35.0   318858 17.1
Vcells 630322  4.9  162863387 1242.6   630322  4.9
> memInfo <- gc()
> memInfo[11]
[1] 17.1
> memInfo[12]              # The maximum has been correctly reset
[1] 4.9

In this example we can see that up to 516.9 - 4.9 = 512 Mb were allocated between the two gc() calls surrounding the runif call, which is consistent with the expected result: 1024*1024*64 doubles at 8 bytes each is exactly 512 MiB.
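
To avoid doing that subtraction by hand, the two gc() calls can be wrapped in a small helper. This is only a sketch (peak_mem_mb is a made-up name): it assumes, as the outputs above show, that the last column of the matrix returned by gc() is the "max used" value in Mb.

peak_mem_mb <- function(expr) {
  before <- gc(reset = TRUE)   # reset the "max used" counters and record the baseline
  force(expr)                  # evaluate the expression (it is passed lazily)
  after  <- gc()               # read the updated counters
  # Last column of the gc() matrix = "max used" in Mb (Ncells and Vcells rows)
  sum(after[, ncol(after)]) - sum(before[, ncol(before)])
}

peak_mem_mb(runif(1024*1024*64))  # should report roughly 512 Mb, matching the figure above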