Ugly as hell, but should work.

Dudu



SELECT r_id,
       CASE WHEN table1.property_value = 'False' THEN FALSE
            WHEN table1.property_value = 'True' THEN TRUE
            WHEN r.rea <  rg.laa THEN FALSE
            WHEN r.rea >= rg.laa THEN TRUE
            ELSE FALSE END AS flag
  FROM rs r
  LEFT JOIN public.di_re rg
    ON r.re = rg.re
  LEFT JOIN (select    r.r_id, table1.property_value
            from    rs r
                    join public.tbl table1
                    ON r.re = table1.re
            where  table1.property_name = 'xxxx'
               AND r.rea BETWEEN table1.begin_time AND table1.end_time
            ) table1

   on r.r_id = table1.r_id

From: Goden Yao [mailto:goden...@apache.org]
Sent: Saturday, November 05, 2016 9:22 AM
To: user@hive.apache.org
Subject: Hive Left Join inequality condition


Hello!

Lately we have ran into the need to implement inequality JOIN in Hive, and we 
could have easily done that with WHERE clause, if it was not the LEFT join.
Basically, we wonder how people implement LEFT/RIGHT JOIN with inequality 
conditions in Hive without loss of efficiency.
Thank you.
Example:

SELECT r_id,

       CASE WHEN table1.property_value = 'False' THEN FALSE

            WHEN table1.property_value = 'True' THEN TRUE

            WHEN r.rea <  rg.laa THEN FALSE

            WHEN r.rea >= rg.laa THEN TRUE

            ELSE FALSE END AS flag

  FROM rs r

  LEFT JOIN public.di_re rg

    ON r.re<http://r.re> = rg.re<http://rg.re>

  LEFT JOIN public.tbl table1

    ON r.re<http://r.re> = table1.re<http://table1.re>

   AND table1.property_name = 'xxxx'

   AND r.rea BETWEEN table1.begin_time AND table1.end_time

Error:

FAILED: SemanticException Line 0:-1 Both left and right aliases encountered in 
JOIN ...

Ways to resolve:
·         Move inequality condition in WHERE clause:

·     WHERE r.rea BETWEEN table1.begin_time AND table1.end_time

·     WARNING: Affects query logic - filters all the table instead of filtering 
LEFT JOIN clause only;
·         Move condition into SELECT field with CASE statement (if possible):

·     SELECT r_id,

·          CASE WHEN table1.property_value = 'False'

·                    AND r.rea BETWEEN table1.begin_time AND  table1.end_time 
THEN FALSE

·               WHEN table1.property_value = 'True'

·                    AND r.rea BETWEEN table1.begin_time AND table1.end_time 
THEN TRUE
Not possible in every case;
·         Divide queries into two separate statements and UNION them: one query 
with WHERE filter and another query totally omitting the JOIN to table that 
needed inequality as well as omitting the ids from the first query:

·     WITH stage AS (

·     SELECT r_id,

·          CASE WHEN table1.property_value = 'False' THEN FALSE

·               WHEN table1.property_value = 'True' THEN TRUE

·               WHEN r.rea <  rg.laa THEN FALSE

·               WHEN r.rea >= rg.laa THEN TRUE

·               ELSE FALSE END as flag

·     FROM rs r

·     LEFT JOIN public.di_re rg

·       ON r.re<http://r.re> = rg.re<http://rg.re>

·     LEFT JOIN public.tbl table1

·       ON r.region = table1.region

·      AND table1.property_name = 'xxxx'

·     WHERE r.rea BETWEEN table1.begin_time AND table1.end_time

·     )

·     SELECT * FROM stage

·     UNION

·     SELECT r_id,

·          CASE WHEN r.rea <  rg.laa THEN FALSE

·               WHEN r.rea >= rg.laa THEN TRUE

·               ELSE FALSE END as flag

·     FROM rs r

·     LEFT JOIN public.di_re rg

·       ON r.re<http://r.re> = rg.re<http://rg.re>

·     WHERE r_id NOT IN (SELECT DISTINCT r_id from stage)
Very expensive in terms of calculation, but in some cases inevitable.
​

Reply via email to