How to assign "cut" range midpoints in R?

An alternative way of calculating midpoints that works regardless of how you specify the breaks in the "cut" function (i.e. regardless of whether you supply a vector of break points or a number of bins) is to parse the label text that cut produces.

get_midpoint <- function(cut_label) {
  mean(as.numeric(unlist(strsplit(gsub("\\(|\\)|\\[|\\]", "", as.character(cut_label)), ","))))
}

test$xMidpoint <- sapply(test$xRange, get_midpoint)

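Note that this requires cut to produce its default interval labels, i.e. the "labels" argument must be left at its default (NULL). With labels = FALSE, cut returns integer codes and there is no label text to parse. Also keep in mind that the labels are rounded to "dig.lab" significant digits (3 by default), so the parsed midpoint is only approximate when the breaks have more digits than that.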
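For anyone who wants to try this end to end, here is a minimal sketch. The `test` data frame is an assumption reconstructed from the printed output elsewhere in this thread, and `bins`, `level_mids` and `mid_by_bins` are just illustrative names. The second case parses each level once and then indexes by the factor codes, which should be cheaper than parsing every row.

```r
# Assumed sample data, reconstructed from the output printed in this thread
test <- data.frame(x = c(1, 4, 6, 7, 8, 9, 12, 18, 19), y = 1:9)

get_midpoint <- function(cut_label) {
  mean(as.numeric(unlist(strsplit(gsub("\\(|\\)|\\[|\\]", "", as.character(cut_label)), ","))))
}

# Case 1: explicit break points, default labels such as "(0,5]"
test$xRange <- cut(test$x, breaks = seq(0, 20, 5))
test$xMidpoint <- sapply(test$xRange, get_midpoint)

# Case 2: number of bins; parse each level once, then index by the factor codes
bins <- cut(test$x, breaks = 3)
level_mids <- vapply(levels(bins), get_midpoint, numeric(1))
mid_by_bins <- unname(level_mids[as.integer(bins)])
```

In the second case the outer break points are padded slightly by cut and the labels are rounded, so the values in `mid_by_bins` are midpoints of the labels as printed, not of the exact internal breaks.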


I know this is a really old question, but this may help future googlers. I wrote a function called midcut that cuts the data and returns the midpoint of the bin each value falls into.

midcut <- function(x, from, to, by) {
  ## cut the data into bins...
  x <- cut(x, seq(from, to, by), include.lowest = TRUE)
  ## make a named vector of the midpoints, names = bin names
  vec <- seq(from + by / 2, to - by / 2, by)
  names(vec) <- levels(x)
  ## use the vector to map the names of the bins to the midpoint values
  unname(vec[x])
}

Example:

test$midpoint <- midcut(test$x, 0, 20, 5)
> test
   x y  xRange midpoint
1  1 1   (0,5]      2.5
2  4 2   (0,5]      2.5
3  6 3  (5,10]      7.5
4  7 4  (5,10]      7.5
5  8 5  (5,10]      7.5
6  9 6  (5,10]      7.5
7 12 7 (10,15]     12.5
8 18 8 (15,20]     17.5
9 19 9 (15,20]     17.5
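If the bins are not evenly spaced, the same idea can be sketched for an arbitrary vector of break points. Note that `midcut_breaks` is a hypothetical helper, not part of the function above, and it assumes `breaks` is already sorted in increasing order; the sample `x` is the data assumed from the output printed above.

```r
# Hypothetical helper: like midcut, but for any sorted vector of break points
midcut_breaks <- function(x, breaks) {
  # bin the data; include.lowest = TRUE keeps x == min(breaks) in the first bin
  bins <- cut(x, breaks, include.lowest = TRUE)
  # midpoint of each bin: left edge plus half the bin width
  mids <- head(breaks, -1) + diff(breaks) / 2
  # look up each value's midpoint via its integer bin code (NA if out of range)
  mids[as.integer(bins)]
}

x <- c(1, 4, 6, 7, 8, 9, 12, 18, 19)  # assumed sample data
uneven <- midcut_breaks(x, c(0, 5, 10, 20))
uneven
#[1]  2.5  2.5  7.5  7.5  7.5  7.5 15.0 15.0 15.0
```

Values outside the range of the breaks come back as NA, because cut returns NA for them.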

Unless I'm missing something, this looks valid:

brks <- seq(0, 20, 5)
ints <- findInterval(test$x, brks, all.inside = TRUE)
#mapply(function(x, y) (x + y) / 2, brks[ints], brks[ints + 1])  #which is ridiculous
#[1]  2.5  2.5  7.5  7.5  7.5  7.5 12.5 17.5 17.5
(brks[ints] + brks[ints + 1]) / 2  #as sgibb noted
#[1]  2.5  2.5  7.5  7.5  7.5  7.5 12.5 17.5 17.5
(head(brks, -1) + diff(brks) / 2)[ints]  #or using thelatemail's idea from the comments
#[1]  2.5  2.5  7.5  7.5  7.5  7.5 12.5 17.5 17.5
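One caveat worth sketching: by default findInterval treats the bins as left-closed, [a, b), whereas cut defaults to right-closed, (a, b], so a value sitting exactly on a break point ends up in a different bin. The sample data above has no such values, so the results agree there. The example below uses a made-up vector `x` of boundary values and assumes R 3.3.0 or later, where findInterval has the `left.open` argument.

```r
brks <- seq(0, 20, 5)
mids <- head(brks, -1) + diff(brks) / 2
x <- c(0, 5, 10, 20)  # made-up values lying exactly on break points

# Default findInterval: bins are [a, b), so 5 falls into [5, 10)
fi_default <- mids[findInterval(x, brks, all.inside = TRUE)]
fi_default
#[1]  2.5  7.5 12.5 17.5

# left.open = TRUE: bins are (a, b], matching cut()'s default right = TRUE
fi_mid <- mids[findInterval(x, brks, left.open = TRUE, all.inside = TRUE)]
fi_mid
#[1]  2.5  2.5  7.5 17.5

# Same bins via cut(); include.lowest = TRUE keeps x == 0 in the first bin
cut_mid <- mids[as.integer(cut(x, brks, include.lowest = TRUE))]
identical(fi_mid, cut_mid)
#[1] TRUE
```

Which convention is right depends on how the bins were defined in the first place; the point is only to pick the one that matches the labels you report.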

Tags: R, Hmisc