Collapse consecutive runs of numbers to a string of ranges

Adding another alternative, you could use a deparseing approach. For example:

deparse(c(1L, 2L, 3L))
#[1] "1:3"

Taking advantage of as.character "deparse"ing a given "list" as input, we could use:

as.character(split(as.integer(vec), cumsum(c(TRUE, diff(vec) != 1))))
#[1] "1:3"  "5"    "7:12"
toString(gsub(":", "-", .Last.value))
#[1] "1-3, 5, 7-12"

I assume that the vector is sorted as in the example. If not use vec <- sort(vec) beforehand.

Edit note: @DavidArenburg spotted a mistake in my original answer where c(min(x), x) should actually be c(0, x). Since we know now that we always need to add a 0 in the first place, we can omit the first step of creating x and do it "on the fly". The original answer and additional options are now edited to reflect that (you can check the edit history for the original post). Thanks David!

A note on calls to unname: I used unname(sapply(...)) to ensure that the resulting vector is not named, otherwise it would be named 0:(n-1) where n equals the length of new_vec. As @Tensibai noted correctly in the comments, this doesn't matter if the final aim is to generate a length-1 character vector as produced by running toString(new_vec) since vector names will be omitted by toString anyway.


One option (possibly not the shortest) would be:

new_vec <- unname(sapply(split(vec, c(0, cumsum(diff(vec) > 1))), function(y) {
  if(length(y) == 1) y else paste0(head(y, 1), "-", tail(y, 1))
}))

Result:

new_vec
#[1] "1-3"  "5"    "7-12"
toString(new_vec)
#[1] "1-3, 5, 7-12"

Thanks to @Zelazny7 it can be shortened by using the range function:

new_vec <- unname(sapply(split(vec, c(0, cumsum(diff(vec) > 1))), function(y) {
    paste(unique(range(y)), collapse='-')
}))

Thanks to @DavidArenburg it can be further shortened by using tapply instead of sapply + split:

new_vec <- unname(tapply(vec, c(0, cumsum(diff(vec) > 1)), function(y) {
  paste(unique(range(y)), collapse = "-")
}))

EDITS: I sped up docendo's code quite a bit by sorting the vector first, so now they are actually on equal footing.

I also added alexis' approach.

readable_integers <- function(integers)
{
  integers <- sort(unique(integers))
  group <- cumsum(c(0, diff(integers)) != 1)

  paste0(vapply(split(integers, group),
           function(x){
             if (length(x) == 1) as.character(x)
             else paste0(range(x), collapse = "-")
           },
           character(1)),
           collapse = "; ")
}

library(microbenchmark)
vec = c(1, 2, 3, 5, 7, 8, 9, 10, 11, 12)
microbenchmark(
  docendo = {vec <- sort(vec)
    x <- cumsum(diff(vec) > 1)
   toString(tapply(vec, c(min(x), x), function(y) paste(unique(range(y)), )collapse = "-"))
  },
  Benjamin = readable_integers(vec),
  alexis = {vec <- sort(vec)
            as.character(split(as.integer(vec), cumsum(c(TRUE, diff(vec) != 1))))
            toString(gsub(":", "-", .Last.value))}
)

Unit: microseconds
     expr     min       lq     mean  median       uq     max neval
  docendo 205.273 220.3755 230.3134 228.293 235.4780 467.142   100
 Benjamin 121.991 128.4420 135.5302 133.574 143.3980 161.286   100
   alexis 121.698 128.0030 137.0374 136.507 143.3975 169.790   100

set.seed(pi)
vec = sample(1:1000, 900)

set.seed(pi)
vec = sample(1:1000, 900)

microbenchmark(
  docendo = {vec <- sort(vec)
   x <- cumsum(diff(vec) > 1)
   toString(tapply(sort(vec), c(min(x), x), function(y) paste(unique(range(y)), collapse = "-")))
  },
  Benjamin = readable_integers(vec),
  alexis = {vec <- sort(vec)
            as.character(split(as.integer(vec), cumsum(c(TRUE, diff(vec) != 1))))
            toString(gsub(":", "-", .Last.value))}
)
Unit: microseconds
     expr      min        lq      mean    median        uq      max neval
  docendo 1307.294 1353.7735 1420.3088 1379.7265 1427.8190 2554.473   100
 Benjamin  615.525  626.8155  661.2513  638.8385  665.3765 1676.493   100
   alexis  799.684  808.3355  866.1516  820.0650  833.2615 1974.138   100

Tags:

Sequence

R