Joining two data frames with intervals misbehaves?

The bug

The object still contains the relevant information:

res <- tibble(a = rep("a", 3), b = rep(make_date(2001) %--% make_date(2002), 3)) %>% 
  left_join(tibble(a = rep("a", 3), c = rep(make_date(2002) %--% make_date(2003)))) 

print.data.frame(res)
# a                              b                              c
# 1 a 2001-01-01 UTC--2002-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC
# 2 a 2001-01-01 UTC--2002-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC
# 3 a 2001-01-01 UTC--2002-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC
# 4 a 2001-01-01 UTC--2002-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC
# 5 a 2001-01-01 UTC--2002-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC
# 6 a 2001-01-01 UTC--2002-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC
# 7 a 2001-01-01 UTC--2002-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC
# 8 a 2001-01-01 UTC--2002-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC
# 9 a 2001-01-01 UTC--2002-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC

res$c    
# [1] 2002-01-01 UTC--2003-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC
# [5] 2002-01-01 UTC--2003-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC
# [9] 2002-01-01 UTC--2003-01-01 UTC

But subsetting by index no longer works:

res_df <- as.data.frame(res)

head(res_df)
  a                              b                              c
1 a 2001-01-01 UTC--2002-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC
2 a 2001-01-01 UTC--2002-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC
3 a 2001-01-01 UTC--2002-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC
4 a                         NA--NA                         NA--NA
5 a                         NA--NA                         NA--NA
6 a                         NA--NA                         NA--NA

res_df[4,"c"]
[1] NA--NA

tibble:::print.tbl makes use of head, which is why the issue is immediately visible with tibbles but not with data.frames.

Typing str(res$b), we see that there are only 3 start values for 9 data values.
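The mismatch can also be checked directly instead of being read off the str() output. A minimal sketch, assuming the lubridate Interval class keeps its spans in the .Data slot and its start instants in the start slot; slots_consistent is a hypothetical helper, not part of lubridate:

```r
library(lubridate)

# Hypothetical helper: TRUE when an Interval has exactly one start per span.
slots_consistent <- function(x) {
  stopifnot(is.interval(x))
  length(x@start) == length(x@.Data)
}

# A freshly built interval vector should be consistent.
healthy <- rep(make_date(2001) %--% make_date(2002), 3)
slots_consistent(healthy)
```

Applied to res$b or res$c from the join above, this would be expected to return FALSE on the affected package versions.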

If we do:

res_df$b@start <- rep(res_df$b@start,3)
res_df$c@start <- rep(res_df$c@start,3)

everything now prints fine:

  a                              b                              c
1 a 2001-01-01 UTC--2002-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC
2 a 2001-01-01 UTC--2002-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC
3 a 2001-01-01 UTC--2002-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC
4 a 2001-01-01 UTC--2002-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC
5 a 2001-01-01 UTC--2002-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC
6 a 2001-01-01 UTC--2002-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC
7 a 2001-01-01 UTC--2002-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC
8 a 2001-01-01 UTC--2002-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC
9 a 2001-01-01 UTC--2002-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC
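The same repair can be wrapped in a small function. This is only a sketch: recycle_starts is a hypothetical name, and recycling the start values is only correct when, as in this example, the surviving starts repeat in the same order as the rows. In other cases the repaired intervals would be wrong.

```r
library(lubridate)

# Hypothetical helper: recycle the start slot to match the number of spans.
# Assumes the truncated starts repeat in row order, as in the example above.
recycle_starts <- function(x) {
  stopifnot(is.interval(x))
  x@start <- rep(x@start, length.out = length(x@.Data))
  x
}

# Simulate the damage: 9 spans but only 3 start values.
broken <- rep(make_date(2001) %--% make_date(2002), 9)
broken@start <- broken@start[1:3]

fixed <- recycle_starts(broken)
length(fixed@start) == length(fixed@.Data)
```

With the join result, this would be used as res_df$b <- recycle_starts(res_df$b), and likewise for res_df$c.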

The Solution

We've seen that as.data.frame is not enough: left_join is the function messing things up. Use merge instead:

res <- tibble(a = rep("a", 3), b = rep(make_date(2001) %--% make_date(2002), 3)) %>% 
  merge(tibble(a = rep("a", 3), c = rep(make_date(2002) %--% make_date(2003))),
        all.x=TRUE) 

head(res)
# a                              b                              c
# 1 a 2001-01-01 UTC--2002-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC
# 2 a 2001-01-01 UTC--2002-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC
# 3 a 2001-01-01 UTC--2002-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC
# 4 a 2001-01-01 UTC--2002-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC
# 5 a 2001-01-01 UTC--2002-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC
# 6 a 2001-01-01 UTC--2002-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC

res[4,"c"]
#[1] 2002-01-01 UTC--2003-01-01 UTC

I've reported the issue here


Looks like a bug in tibble():

> AA <- tibble(a = rep("a", 3), b = rep(make_date(2001) %--% make_date(2002), 3))
> class(AA$b)
[1] "Interval"
attr(,"package")
[1] "lubridate"
> AA
Error in round_x - lhs :
  Arithmetic operators undefined for 'Interval' and 'Interval' classes:
  convert one to numeric or a matching time-span class.

However:

> AA <- as.data.frame(AA)
> class(AA$b)
[1] "Interval"
attr(,"package")
[1] "lubridate"
> AA
  a                              b
1 a 2001-01-01 UTC--2002-01-01 UTC
2 a 2001-01-01 UTC--2002-01-01 UTC
3 a 2001-01-01 UTC--2002-01-01 UTC

Therefore, this works:

> AA <- tibble(a = rep("a", 3), b = rep(make_date(2001) %--% make_date(2002), 3))
> BB <- tibble(a = rep("a", 3), c = rep(make_date(2002) %--% make_date(2003)))
> AA %>% as.data.frame %>% left_join(BB)
Joining, by = "a"
  a                              b                              c
1 a 2001-01-01 UTC--2002-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC
2 a 2001-01-01 UTC--2002-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC
3 a 2001-01-01 UTC--2002-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC
4 a 2001-01-01 UTC--2002-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC
5 a 2001-01-01 UTC--2002-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC
6 a 2001-01-01 UTC--2002-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC
7 a 2001-01-01 UTC--2002-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC
8 a 2001-01-01 UTC--2002-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC
9 a 2001-01-01 UTC--2002-01-01 UTC 2002-01-01 UTC--2003-01-01 UTC

although this does not:

> AA %>% left_join(BB)
Joining, by = "a"
Error in round_x - lhs :
  Arithmetic operators undefined for 'Interval' and 'Interval' classes:
  convert one to numeric or a matching time-span class.
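Another way to sidestep the problem, not tested against the versions noted below, is to keep the interval endpoints as plain Date columns during the join and build the intervals afterwards. A minimal sketch; the column names b_start, b_end, c_start and c_end are hypothetical, and newer dplyr releases may warn about the many-to-many match on "a":

```r
library(dplyr)
library(lubridate)

# Keep interval endpoints as ordinary Date columns during the join.
AA <- tibble(a = rep("a", 3),
             b_start = rep(make_date(2001), 3),
             b_end   = rep(make_date(2002), 3))
BB <- tibble(a = rep("a", 3),
             c_start = rep(make_date(2002), 3),
             c_end   = rep(make_date(2003), 3))

# Join first, then rebuild the Interval columns on a plain data.frame.
res <- as.data.frame(left_join(AA, BB, by = "a"))
res$b <- res$b_start %--% res$b_end
res$c <- res$c_start %--% res$c_end

nrow(res)
```

Because the join only ever sees Date columns, the Interval objects are created once, at their final length, so the start values and spans cannot get out of step.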

Note: I'm using tibble_1.4.1 (same version of lubridate and dplyr as you), on R 3.4.3 for x86_64-pc-linux-gnu