Hi,
I'm running TPC-H query 21 on Hive 0.12 and have enabled
hive.optimize.correlation.
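
For reference, enabling the optimizer for a session would typically look
something like the fragment below. This is only a sketch: the thread does not
show the exact invocation that was used, and the property could just as well
be set in hive-site.xml.

```sql
-- Assumption: set per session from the Hive CLI before running the query.
-- hive.optimize.correlation is the property named in the message above.
set hive.optimize.correlation=true;
```
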
I could see the effect of the correlation optimizer on query 17, but when
running query 21 I don't see the optimizer being used. I used the
publicly available TPC-H queries for Hive and merged all the
Hi Avrilia,
It is caused by the distinct aggregations in TPC-H Q21. Because Hive adds
the distinct columns to the key columns of the ReduceSinkOperators, and the
correlation optimizer currently only checks for exactly matching key
columns, this query will not be optimized. The JIRA for this issue is
Hi Yin,
Thanks for the detailed explanation. I have one more question about the
correlation optimizer. When I ran EXPLAIN on query 17, I got the plan for
stage 1, where the bulk of the time goes. I can understand what is happening
in the map phase, but the reduce phase confuses me when the optimizer