Minus query in HIVE

It does not appear that HQL supports the MINUS operator. See this relevant, albeit a bit old, resource:

http://www.quora.com/Apache-Hive/What-are-the-biggest-feature-gaps-between-HiveQL-and-SQL

What you want to do can be done with a LEFT JOIN or NOT EXISTS:

SELECT x
FROM abc
LEFT JOIN bcd
ON abc.x = bcd.x
WHERE bcd.x IS NULL

EDIT: Per comments below, NOT EXISTS is not supported.

SELECT x 
FROM abc
WHERE NOT EXISTS (SELECT x FROM bcd)

HQL DOES NOT support minus but you can always use Patrick Tucci solution which works fine when your select-list contains only a few fields. In my case I wanted to find the differences between an entire table (30+ fields) and a backup copy to find records that were different. Here is my solution:

select <all-my-fields>, count(*)
  from (
        select <all-my-fields> from mytable
        union all
        select <all-the-fields> from mybackuptable
       ) merged_data
group by <all-my-fields>
having count(*) = 1

Now this is not completly a "minus" since single records from mybackuptable would show up in the result which is what I wanted. To make it a complete "minus" equivalent I've added this:

select <all-my-fields>
  from (
        select max(source) source, <all-my-fields>, count(*)
          from (
                select 1 source, <all-my-fields> from mytable
                union all
                select 2 source, <all-the-fields> from mybackuptable
               ) merged_data
        group by <all-my-fields>
        having count(*) = 1
       ) minus_data
 where source = 1

Tags:

Hive