Hive doesn't support IN or EXISTS. How do I write the following query?

You can get the effect of NOT IN / NOT EXISTS with a LEFT OUTER JOIN in Hive, keeping only the rows of A that have no match in B:

SELECT A.id
FROM A
LEFT OUTER JOIN B
ON (B.id = A.id)
WHERE B.id IS null
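
To see why this works, here is the same query traced against made-up data (the rows in the comments are hypothetical, not from the question). The outer join keeps every row of A, unmatched rows get NULL in B.id, and the WHERE clause keeps only those:

-- Hypothetical contents: A.id = 1, 2, 3 and B.id = 2
-- After the LEFT OUTER JOIN, (A.id, B.id) is (1, NULL), (2, 2), (3, NULL)
-- The IS NULL filter keeps only the unmatched rows, so the result is 1 and 3
SELECT A.id
FROM A
LEFT OUTER JOIN B
ON (B.id = A.id)
WHERE B.id IS NULL;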

Hive supports IN, NOT IN, EXISTS and NOT EXISTS subqueries in the WHERE clause from 0.13 onward.

select count(*)
from flight a
where not exists(select b.tailnum from plane b where b.tailnum = a.tailnum);

The subqueries in EXISTS and NOT EXISTS must have correlated predicates (like b.tailnum = a.tailnum in the sample above). For more, refer to Hive Wiki > Subqueries in the WHERE Clause.
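
If you are stuck on a Hive version older than 0.13, the same count can probably be obtained with the outer-join trick. This is only a sketch, assuming the flight and plane tables from the sample above and that tailnum is the only matching column:

-- Anti-join form of the NOT EXISTS query, for Hive versions without subquery support
-- Flights with no matching plane get NULL in b.tailnum and are the only rows counted
SELECT count(*)
FROM flight a
LEFT OUTER JOIN plane b
ON (b.tailnum = a.tailnum)
WHERE b.tailnum IS NULL;

A flight that matches several plane rows is filtered out entirely, so duplicates in plane do not inflate the count.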


Should you ever want to do an IN like so:

SELECT id FROM A WHERE id IN (SELECT id FROM B)

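Hive has this covered with a LEFT SEMI JOIN. Note that the right-hand table (b here) may be referenced only in the ON condition, not in the SELECT or WHERE clauses: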

SELECT a.key, a.val
FROM a LEFT SEMI JOIN b on (a.key = b.key)
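
Applied to the IN query above, a sketch of the rewrite (assuming A and B both have an id column, as in that query) would look like this:

-- Equivalent of: SELECT id FROM A WHERE id IN (SELECT id FROM B)
-- Only columns of A are selected; B appears only in the ON condition
SELECT A.id
FROM A LEFT SEMI JOIN B ON (A.id = B.id);

Unlike a plain inner join, the semi join returns each row of A at most once, even when B contains the same id several times.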