Increment by 1 for every change in column

These look like a run-length encoding (rle)

x = c("a", "a", "1", "0", "b", "b", "b", "c", "1", "1")
r = rle(x)

with

> rle(x)
Run Length Encoding
  lengths: int [1:6] 2 1 1 3 1 2
  values : chr [1:6] "a" "1" "0" "b" "c" "1"

This says that the first value ("a") occurred 2 times in a row, then "1" occurred once, and so on. What you're after is a sequence along the 'lengths', with each element of that sequence replicated by the corresponding run length, so

> rep(seq_along(r$lengths), r$lengths)
 [1] 1 1 2 3 4 4 4 5 6 6
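Applied to a data frame column, the same idea might look like the sketch below. The data frame `df` and the column names `var1` / `var2` are assumptions borrowed from the other answers, not something taken from the question's actual data.

```r
# Hypothetical data frame; var1 is kept as character, not factor.
df <- data.frame(
  var1 = c("a", "a", "1", "0", "b", "b", "b", "c", "1", "1"),
  stringsAsFactors = FALSE
)

# Run-length encode the column, then label each run 1, 2, 3, ...
r <- rle(df$var1)
df$var2 <- rep(seq_along(r$lengths), r$lengths)

print(df)
```

Each row gets the index of the run it belongs to, so `var2` goes up by 1 every time `var1` changes.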

The other answers are semi-deceptive, since they rely on the column being a factor(); they fail when the column is actually a character().

> diff(x)
Error in r[i1] - r[-length(r):-(length(r) - lag + 1L)] : 
  non-numeric argument to binary operator

A work-around would be to map the characters to integers, along the lines of

> diff(match(x, x))
[1]  0  2  1  1  0  0  3 -5  0
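To get from those differences to the counter itself, one option is to test them against zero and take a cumulative sum. This is only a sketch of the work-around; the names `codes` and `id` are made up for illustration.

```r
x <- c("a", "a", "1", "0", "b", "b", "b", "c", "1", "1")

# Map each string to the position of its first occurrence,
# so that equal strings get equal integer codes.
codes <- match(x, x)

# A non-zero difference marks a change; start the counter at 1.
id <- cumsum(c(1, diff(codes) != 0))
print(id)
```

Only whether a difference is zero matters here, not its sign or size, so the arbitrary codes from match() are good enough.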

Hmm, but having said that I find that rle's don't work on factors!

> f = factor(x)
> rle(f)
Error in rle(f) : 'x' must be a vector of an atomic type
> rle(as.vector(f))
Run Length Encoding
  lengths: int [1:6] 2 1 1 3 1 2
  values : chr [1:6] "a" "1" "0" "b" "c" "1"
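So a small wrapper that converts to character first should cope with both column types. A minimal sketch, where `run_id` is a hypothetical helper name rather than anything built into R:

```r
# Hypothetical helper: label consecutive runs 1, 2, 3, ...
# as.character() turns a factor into its labels, so rle() accepts it.
run_id <- function(v) {
  r <- rle(as.character(v))
  rep(seq_along(r$lengths), r$lengths)
}

x <- c("a", "a", "1", "0", "b", "b", "b", "c", "1", "1")
print(run_id(x))          # character input
print(run_id(factor(x)))  # factor input
```

Going through as.character() rather than the integer codes means the result doesn't depend on how the factor levels happen to be ordered.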

How about using diff() and cumsum()? For example:

df$var2 <- cumsum(c(1,diff(df$var1)!=0))

Building on Mr Flick's answer:

df$var2 <- cumsum(c(0,as.numeric(diff(df$var1))!=0))

But if you don't want to use diff you can still use:

df$var2 <- c(0,cumsum(as.numeric(with(df,var1[1:(length(var1)-1)] != var1[2:length(var1)]))))

This starts at 0 rather than 1; add 1 to the result if you want the counter to start at 1.
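The element-by-element comparison can also be written with negative indexing, which avoids building the index ranges by hand. This is just a sketch of a variant that starts at 1; the vector `v` stands in for `df$var1` and is assumed to be a character vector.

```r
v <- c("a", "a", "1", "0", "b", "b", "b", "c", "1", "1")

# Compare each element with the one before it:
# v[-1] drops the first element, v[-length(v)] drops the last.
changed <- v[-1] != v[-length(v)]

# The first element always opens run 1; every change opens a new run.
id <- cumsum(c(TRUE, changed))
print(id)
```

Because cumsum() counts the leading TRUE, the counter starts at 1 without any extra adjustment.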
