Hi,

> I change the sql where condition to (where t.update_time >= '2015-05-04'),
> the sql can return result for a while. Because t.update_time >= '2015-05-04'
> can filter many row when table scan. But why change where condition to
> (where t.update_time >= '2015-05-04' or length(t8.end_user_id)>0), the
> sql run forever as follows:

The OR clause is probably causing the problem. We're probably not pushing the OR clauses down to the original table scans.

This is most likely a Hive PPD (predicate pushdown) miss, where you do something like

    select a.*, b.* from a, b where a.x = b.x and (a.y = 1 or b.z = 1);

and it doesn't get planned as

    select a1.*, b1.* from (select a.* from a where a.y = 1) a1, (select b.* from b where b.z = 1) b1 where a1.x = b1.x;

but instead gets planned as a full-scan JOIN, followed by a filter.

Can you spend some time and try to reduce your case to something like the above queries? If that works, then file a JIRA.

Cheers,
Gopal
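
A reduced test case along these lines could be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 as a stand-in for Hive, the tables a and b and their rows are made up, and it checks only which rows each query returns, not how Hive plans them. One thing to watch while reducing: filtering both inputs independently keeps only rows where a.y = 1 and b.z = 1 both hold, so it can return fewer rows than the OR form. A UNION ALL of two joins is one way to keep the OR semantics while still filtering at each scan.

```python
import sqlite3


def run_variants():
    """Run three formulations of the join on toy data and return their rows."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Hypothetical toy tables; the real tables in the thread are not shown.
    cur.executescript("""
        CREATE TABLE a (x INTEGER, y INTEGER);
        CREATE TABLE b (x INTEGER, z INTEGER);
        INSERT INTO a VALUES (1, 1), (2, 0), (3, 1), (4, 0);
        INSERT INTO b VALUES (1, 1), (2, 1), (3, 0), (4, 0);
    """)
    queries = {
        # Original shape: join first, OR filter applied afterwards.
        "or_form": """
            SELECT a.x, a.y, b.z FROM a JOIN b ON a.x = b.x
            WHERE a.y = 1 OR b.z = 1
        """,
        # Both inputs filtered before the join (this is AND semantics).
        "both_filtered": """
            SELECT a1.x, a1.y, b1.z
            FROM (SELECT * FROM a WHERE y = 1) a1
            JOIN (SELECT * FROM b WHERE z = 1) b1 ON a1.x = b1.x
        """,
        # OR-preserving rewrite: one filtered scan per branch; the second
        # branch skips rows the first branch already produced.
        "union_form": """
            SELECT a1.x, a1.y, b.z
            FROM (SELECT * FROM a WHERE y = 1) a1 JOIN b ON a1.x = b.x
            UNION ALL
            SELECT a.x, a.y, b1.z
            FROM a JOIN (SELECT * FROM b WHERE z = 1) b1 ON a.x = b1.x
            WHERE a.y IS NULL OR a.y <> 1
        """,
    }
    results = {name: sorted(cur.execute(sql).fetchall())
               for name, sql in queries.items()}
    conn.close()
    return results


if __name__ == "__main__":
    for name, rows in run_variants().items():
        print(name, rows)
```

On the real tables, running EXPLAIN on each form in Hive would show whether the filters end up at the table scans or after the join.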
