I see only one reducer running forever. Could this be a skewed join?
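If it is skew, the reason one reducer never finishes is that a shuffle join routes every row with the same join key to the same reducer. Below is a minimal sketch of that effect. It assumes hash partitioning on the join key, with `zlib.crc32` standing in for Hive's actual hash function. The data is hypothetical: one "hot" key (an empty string, as an empty end_user_id might be) plus 1,000 distinct keys. `reducer_for` and `rows_per_reducer` are illustrative names, not Hive APIs.

```python
import zlib
from collections import Counter


def reducer_for(key: str, num_reducers: int) -> int:
    # Hash-partition on the join key (crc32 is a stand-in for Hive's hash).
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def rows_per_reducer(keys, num_reducers):
    # Count how many rows each reducer would receive.
    load = Counter(reducer_for(k, num_reducers) for k in keys)
    return [load.get(r, 0) for r in range(num_reducers)]


# Hypothetical data: 10,000 rows share one hot key, 1,000 keys appear once.
keys = [""] * 10_000 + [f"user{i}" for i in range(1_000)]
loads = rows_per_reducer(keys, num_reducers=10)
print(loads)
print("busiest reducer share:", max(loads) / len(keys))
```

However many reducers you add, all 10,000 hot-key rows still land on a single one. That is the "one reducer runs forever" pattern.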

r7raul1...@163.com
 
From: Eugene Koifman
Date: 2015-05-12 01:43
To: user
CC: r7raul1...@163.com
Subject: Re: hive sql on tez run forever
This isn’t a valid rewrite.
If a(x,y) has one row (1,2) and b(x,z) has one row (1,1), then the first query
will produce one row, but the second query with subselects will not.
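The counterexample can be checked mechanically. This sketch uses Python's built-in sqlite3 module as a stand-in for Hive. That substitution is an assumption: both engines follow standard SQL semantics for this inner join and filter. The sketch runs Gopal's two example queries against the one-row tables described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER, y INTEGER);
    CREATE TABLE b (x INTEGER, z INTEGER);
    INSERT INTO a VALUES (1, 2);
    INSERT INTO b VALUES (1, 1);
""")

# Original query: join on x, then apply the OR filter to the joined row.
original = conn.execute(
    "SELECT a.*, b.* FROM a, b WHERE a.x = b.x AND (a.y = 1 OR b.z = 1)"
).fetchall()

# Proposed rewrite: each OR branch pushed into its own table's subselect.
rewrite = conn.execute(
    "SELECT a1.*, b1.* FROM (SELECT a.* FROM a WHERE a.y = 1) a1, "
    "(SELECT b.* FROM b WHERE b.z = 1) b1 WHERE a1.x = b1.x"
).fetchall()

print(original)  # [(1, 2, 1, 1)]
print(rewrite)   # []
```

The joined row satisfies b.z = 1, so the OR holds. The rewrite, however, discards a's only row before the join because a.y = 1 is false.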
 
On 5/11/15, 10:13 AM, "Gopal Vijayaraghavan" <gop...@apache.org> wrote:
 
>Hi,
>
>> I changed the SQL WHERE condition to (where t.update_time >=
>> '2015-05-04'), and the query returns results after a while, because
>> t.update_time >= '2015-05-04' filters out many rows during the table scan.
>> But why, when I change the WHERE condition to
>> (where t.update_time >= '2015-05-04' or length(t8.end_user_id)>0), does
>> the query run forever, as follows:
>
>
>The OR clause is probably causing the problems.
>
>We're probably not pushing the OR clauses down to the original table
>scans.
>
>This is most likely a Hive PPD (predicate pushdown) miss, where you do something like
>
>select a.*,b.* from a,b where a.x = b.x and (a.y = 1 or b.z = 1);
>
>where it doesn't get planned as
>
>select a1.*, b1.* from (select a.* from a where a.y=1) a1,
>(select b.* from b where b.z = 1) b1 where a1.x = b1.x;
>
>but instead gets planned as a full-scan JOIN followed by a filter.
>
>Can you spend some time trying to rewrite your case into something
>like the above queries?
>
>If that works, then file a JIRA.
>
>Cheers,
>Gopal
>
>
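Gopal's observation and Eugene's objection fit a simplified model of the pushdown rule. This sketch is a model, not Hive's optimizer code. In it, each top-level AND conjunct of the filter is pushed into a table scan only if it reads that table alone. A conjunct that spans both tables has to stay above the join. The join key a.x = b.x is left out because the join itself consumes it. `Conjunct` and `plan_filters` are hypothetical names. Real optimizers can also derive extra single-table predicates from some OR expressions, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conjunct:
    text: str
    tables: frozenset  # names of the tables whose columns this predicate reads


def plan_filters(conjuncts, scan_tables):
    """Push each top-level AND conjunct into a table scan when it reads only
    that table; anything spanning several tables stays above the join."""
    pushed = {t: [] for t in scan_tables}
    above_join = []
    for c in conjuncts:
        if len(c.tables) == 1:
            (table,) = c.tables
            if table in pushed:
                pushed[table].append(c.text)
                continue
        above_join.append(c.text)
    return pushed, above_join


# Filter part of: ... WHERE a.x = b.x AND (a.y = 1 OR b.z = 1)
or_filter = [Conjunct("(a.y = 1 OR b.z = 1)", frozenset({"a", "b"}))]
or_pushed, or_above = plan_filters(or_filter, ["a", "b"])
print(or_pushed, or_above)  # {'a': [], 'b': []} ['(a.y = 1 OR b.z = 1)']

# Filter part of: ... WHERE a.x = b.x AND a.y = 1 AND b.z = 1
and_filter = [
    Conjunct("a.y = 1", frozenset({"a"})),
    Conjunct("b.z = 1", frozenset({"b"})),
]
and_pushed, and_above = plan_filters(and_filter, ["a", "b"])
print(and_pushed, and_above)  # {'a': ['a.y = 1'], 'b': ['b.z = 1']} []
```

Only the AND form splits into per-table scan filters. The OR form has to wait until both sides are joined, which is why it becomes a full-scan join followed by a filter. For the same reason, rewriting it as two pushed-down subselects changes the result.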
 
