Hello!
Lately we have ran into the need to implement inequality JOIN in Hive, and we could have easily done that with WHERE clause, if it was not the LEFT join. Basically, we wonder how people implement LEFT/RIGHT JOIN with inequality conditions in Hive without loss of efficiency. Thank you. Example: SELECT r_id, CASE WHEN table1.property_value = 'False' THEN FALSE WHEN table1.property_value = 'True' THEN TRUE WHEN r.rea < rg.laa THEN FALSE WHEN r.rea >= rg.laa THEN TRUE ELSE FALSE END AS flag FROM rs r LEFT JOIN public.di_re rg ON r.re = rg.re LEFT JOIN public.tbl table1 ON r.re = table1.re AND table1.property_name = 'xxxx' AND r.rea BETWEEN table1.begin_time AND table1.end_time Error: FAILED: SemanticException Line 0:-1 Both left and right aliases encountered in JOIN ... Ways to resolve: - Move inequality condition in WHERE clause: WHERE r.rea BETWEEN table1.begin_time AND table1.end_time WARNING: Affects query logic - filters all the table instead of filtering LEFT JOIN clause only; - Move condition into SELECT field with CASE statement (if possible): SELECT r_id, CASE WHEN table1.property_value = 'False' AND r.rea BETWEEN table1.begin_time AND table1.end_time THEN FALSE WHEN table1.property_value = 'True' AND r.rea BETWEEN table1.begin_time AND table1.end_time THEN TRUE Not possible in every case; - Divide queries into two separate statements and UNION them: one query with WHERE filter and another query totally omitting the JOIN to table that needed inequality as well as omitting the ids from the first query: WITH stage AS ( SELECT r_id, CASE WHEN table1.property_value = 'False' THEN FALSE WHEN table1.property_value = 'True' THEN TRUE WHEN r.rea < rg.laa THEN FALSE WHEN r.rea >= rg.laa THEN TRUE ELSE FALSE END as flag FROM rs r LEFT JOIN public.di_re rg ON r.re = rg.re LEFT JOIN public.tbl table1 ON r.region = table1.region AND table1.property_name = 'xxxx' WHERE r.rea BETWEEN table1.begin_time AND table1.end_time ) SELECT * FROM stage UNION SELECT r_id, CASE WHEN r.rea < rg.laa THEN FALSE WHEN r.rea >= rg.laa THEN TRUE ELSE FALSE END as flag FROM rs r LEFT JOIN public.di_re rg ON r.re = rg.re WHERE r_id NOT IN (SELECT DISTINCT r_id from stage) Very expensive in terms of calculation, but in some cases inevitable.