How to subset consecutive rows if they meet a condition

An approach with data.table that differs slightly from @jlhoward's (using the same data):

library(data.table)

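setDT(df)                                        # convert df to a data.table in place
df[, hotday := +(MAX>=44.5 & MIN>=24.5)          # 1 on a hot day, 0 otherwise
   ][, hw.length := with(rle(hotday), rep(lengths,lengths))  # length of the run each day is in
     ][hotday == 0, hw.length := 0]              # days that are not hot get length 0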

This produces a data.table with a heat-wave length variable (hw.length) instead of a TRUE/FALSE indicator for one particular heat-wave length:

> df
    YEAR MONTH DAY  MAX  MIN hotday hw.length
 1: 1989     7  18 45.0 23.5      0         0
 2: 1989     7  19 44.2 26.1      0         0
 3: 1989     7  20 44.7 24.4      0         0
 4: 1989     7  21 44.6 29.5      1         1
 5: 1989     7  22 44.4 31.6      0         0
 6: 1989     7  23 44.2 26.7      0         0
 7: 1989     7  24 44.5 25.0      1         3
 8: 1989     7  25 44.8 26.0      1         3
 9: 1989     7  26 44.8 24.6      1         3
10: 1989     7  27 45.0 24.3      0         0
11: 1989     7  28 44.8 26.0      1         1
12: 1989     7  29 44.4 24.0      0         0
13: 1989     7  30 45.2 25.0      1         1
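The hw.length column comes from `with(rle(hotday), rep(lengths, lengths))`. The following base-R illustration uses a made-up vector `x` (not the question's data) to show what each step returns: `rle()` collapses the vector into runs, and `rep(lengths, lengths)` repeats each run's length once per element of that run.

```r
# Made-up 0/1 vector of hot days, for illustration only
x <- c(0, 1, 1, 1, 0, 1)

runs <- rle(x)
runs$lengths   # 1 3 1 1
runs$values    # 0 1 0 1

# Repeat each run's length once for every element in that run
run_len <- with(runs, rep(lengths, lengths))
run_len        # 1 3 3 3 1 1

# Zero out the days that are not hot, as the data.table code does for hotday == 0
run_len[x == 0] <- 0
run_len        # 0 3 3 3 0 1
```

A filter such as `hw.length >= 3` then keeps exactly the days that belong to runs of at least three hot days.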

I may be missing something here, but I don't see the point of subsetting beforehand. If you have data for every day, in chronological order, you can use run-length encoding (see the documentation for the rle(...) function).
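Because `rle()` only looks at the order of the vector, a missing day or an out-of-order row would silently join or split runs. A minimal pre-check, assuming YEAR, MONTH and DAY columns as in the example; `is_daily_sequence` is a hypothetical helper, not part of the original answer:

```r
# Hypothetical helper: TRUE when the rows form an unbroken, ascending
# sequence of calendar days (each row exactly one day after the previous one)
is_daily_sequence <- function(year, month, day) {
  dates <- as.Date(paste(year, month, day, sep = "-"), format = "%Y-%m-%d")
  if (anyNA(dates)) return(FALSE)       # an invalid date cannot form a clean sequence
  all(as.numeric(diff(dates)) == 1)     # consecutive rows must be 1 day apart
}

is_daily_sequence(1989, 7, 18:30)            # TRUE: 13 consecutive July days
is_daily_sequence(1989, 7, c(18, 19, 21))    # FALSE: 20 July is missing
is_daily_sequence(1989, c(7, 8), c(31, 1))   # TRUE: crossing a month boundary is fine
```

If the check fails, sort the rows by date and fill or flag the gaps before computing the runs.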

In this example we create an artificial data set and define a hot day as MAX >= 44.5 and MIN >= 24.5; a heat wave is a run of more than two consecutive hot days. Then:

# example data set
df <- data.frame(YEAR=1989, MONTH=7, DAY=18:30, 
                 MAX=c(45, 44.2, 44.7, 44.6, 44.4, 44.2, 44.5, 44.8, 44.8, 45, 44.8, 44.4, 45.2),
                 MIN=c(23.5, 26.1, 24.4, 29.5, 31.6, 26.7, 25, 26, 24.6, 24.3, 26, 24, 25))

hot <- with(df, MAX >= 44.5 & MIN >= 24.5)   # TRUE on a hot day
r <- with(rle(hot), rep(lengths, lengths))    # length of the run each day belongs to
df$heat.wave <- hot & (r > 2)                 # hot day inside a run of 3+ hot days
df
#    YEAR MONTH DAY  MAX  MIN heat.wave
# 1  1989     7  18 45.0 23.5     FALSE
# 2  1989     7  19 44.2 26.1     FALSE
# 3  1989     7  20 44.7 24.4     FALSE
# 4  1989     7  21 44.6 29.5     FALSE
# 5  1989     7  22 44.4 31.6     FALSE
# 6  1989     7  23 44.2 26.7     FALSE
# 7  1989     7  24 44.5 25.0      TRUE
# 8  1989     7  25 44.8 26.0      TRUE
# 9  1989     7  26 44.8 24.6      TRUE
# 10 1989     7  27 45.0 24.3     FALSE
# 11 1989     7  28 44.8 26.0     FALSE
# 12 1989     7  29 44.4 24.0     FALSE
# 13 1989     7  30 45.2 25.0     FALSE

This creates a column, heat.wave, which is TRUE if the day is part of a heat wave. To extract only the heat-wave days, use

df[df$heat.wave,]
#   YEAR MONTH DAY  MAX  MIN heat.wave
# 7 1989     7  24 44.5 25.0      TRUE
# 8 1989     7  25 44.8 26.0      TRUE
# 9 1989     7  26 44.8 24.6      TRUE
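Since `rle()` also tells you where each run ends, the same idea can list each heat wave as one row (first day, last day, duration) instead of returning the individual days. A sketch assuming the example data and the definition above (hot day: MAX >= 44.5 and MIN >= 24.5; heat wave: more than two consecutive hot days); the `events` table and its column names are illustrative choices:

```r
df <- data.frame(YEAR=1989, MONTH=7, DAY=18:30,
                 MAX=c(45, 44.2, 44.7, 44.6, 44.4, 44.2, 44.5, 44.8, 44.8, 45, 44.8, 44.4, 45.2),
                 MIN=c(23.5, 26.1, 24.4, 29.5, 31.6, 26.7, 25, 26, 24.6, 24.3, 26, 24, 25))

hot  <- with(df, MAX >= 44.5 & MIN >= 24.5)
runs <- rle(hot)

# Row index where each run ends, and where it starts
ends   <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1

# Keep only runs of hot days lasting more than 2 days
keep <- runs$values & runs$lengths > 2

events <- data.frame(start_day = df$DAY[starts[keep]],
                     end_day   = df$DAY[ends[keep]],
                     days      = runs$lengths[keep])
events
#   start_day end_day days
# 1        24      26    3
```

With a longer record there would be one row per heat wave; changing `runs$lengths > 2` changes the minimum duration.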
