Faster version of combn

You could use combnPrim from gRbase

source("http://bioconductor.org/biocLite.R")
biocLite("gRbase") # will install dependent packages automatically.
system.time({
 d.1 <- as.data.table(t(combn(d$id, 2)))
 })
#   user  system elapsed 
# 27.322   0.585  27.674 

system.time({
d.2 <- as.data.table(t(combnPrim(d$id,2)))
 })
#   user  system elapsed 
#  2.317   0.110   2.425 

identical(d.1[order(V1, V2),], d.2[order(V1,V2),])
#[1] TRUE

Here's a way using data.table function foverlaps(), that also turns out to be fast!

require(data.table) ## 1.9.4+
d[, `:=`(id1 = 1L, id2 = .I)] ## add interval columns for overlaps
setkey(d, id1, id2)

system.time(olaps <- foverlaps(d, d, type="within", which=TRUE)[xid != yid])
#  0.603   0.062   0.717

Note that foverlaps() does not calculate all permutations. The subset xid != yid is needed to remove self overlaps. The subset could be internally handled more efficiently by implementing ignoreSelf argument - similar to IRanges::findOverlaps.

Now it's just a matter of performing a subset using the ids obtained:

system.time(ans <- setDT(list(d$id[olaps$xid], d$id[olaps$yid])))
#   0.576   0.047   0.662

So totally, ~1.4 seconds.

The advantage is that you can do the same way even if your data.table d has more than 1 column on which you've to get the combinations for, and using the same amount of memory (since we return the indices). In that case, you'd just do:

cbind(d[olaps$xid, ..your_cols], d[olaps$yid, ..your_cols])

But it's limited to replacing just combn(., 2L). Not more than 2L.

Faster version of combn

Tags:

R

Combinations

Data.Table

Combn

Related

Recent Posts