Chao Sun created HIVE-16698: ------------------------------- Summary: HoS should avoid mapjoin optimization in case of union and using table stats Key: HIVE-16698 URL: https://issues.apache.org/jira/browse/HIVE-16698 Project: Hive Issue Type: Bug Components: Logical Optimizer, Spark Affects Versions: 3.0.0 Reporter: Chao Sun Assignee: Chao Sun
When {{hive.spark.use.ts.stats.for.mapjoin}} is true, HoS would not check whether the big table branch has upstream UNION operators. This is wrong and could generate incorrect plan. To reproduce: {code} set hive.auto.convert.join=true; set hive.auto.convert.join.noconditionaltask.size=16; set hive.spark.use.ts.stats.for.mapjoin=true; create table a (c1 string, c2 int); create table b (c3 string, c4 int); create table c (c1 string, c2 int); create table d (c3 string, c4 int); create table e (c5 string, c6 int); insert into table a values ("a1", 1), ("a2", 2), ("a3", 3), ("a4", 4), ("a5", 5), ("a6", 6), ("a7", 7); insert into table b values ("b1", 1), ("b2", 2), ("b3", 3), ("b4", 4); insert into table c values ("c1", 1), ("c2", 2), ("c3", 3), ("c4", 4), ("c5", 5), ("c6", 6), ("c7", 7); insert into table d values ("d1", 1), ("d2", 2), ("d3", 3), ("d4", 4); insert into table e values ("d1", 1), ("d2", 2); explain with t1 as ( select a.c1 as c1, a.c2 as c2, b.c3 as c3 from a join b on a.c2 = b.c4 ), t2 as ( select c.c1 as c1, c.c2 as c2, d.c3 as c3 from c join d on c.c2 = d.c4 ), t3 as ( select * from t1 union all select * from t2 ), t4 as ( select t3.c1, t3.c3, t5.c5 from t3 join e as t5 on t3.c2 = t5.c6 ) select * from t4; {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)