Join Tables on Date Range in Hive

RTFM - quoting LanguageManual Joins

Hive does not support join conditions that are not equality conditions as it is very difficult to express such conditions as a map/reduce job.

You may try to move the BETWEEN filter to a WHERE clause, resulting in a lousy partially-cartesian-join followed by a post-processing cleanup. Yuck. Depending on the actual cardinality of your "skill group" table, it may work fast - or take whole days.


If your situation allows, do it in two queries.

First with the full join, which can have the range; Then with an outer join, matching on all the columns, but include a where clause for where one of the fields is null.

Ex:

create table tableC as
select a.*, b.skill_group 
    from tableA a 
    ,    tableB b 
    where a.employee_id= b.employee_id 
      and a.cal_date >= b.date_start 
      and a.cal_date <= b.date_end;

with c as (select * from TableC)
insert into tableC
select a.*, cast(null as string) as skill_group
from tableA a 
  left join c
    on (a.employee_id= c.employee_id 
    and a.cal_date  = c.cal_date)
where c.employee_id is null ;