Hi Rossi,

Historically, we used Calcite's LoptOptimizeJoinRule for join reordering. It does a greedy search over the join-order search space to find a join order that is at least as good as the original join order of the query. "Good" here means estimated cost, and the result is not guaranteed to be globally optimal because the algorithm is greedy. Our initial experimentation showed this rule generating only left-leaning join trees and never considering bushy join orders. That made a huge difference in query runtime, because for star schema setups, which are common in analytical workloads, bushy joins are usually much better than left- (or right-) leaning trees. At that point we added OptimizeBushyJoinRule.
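
To make the left-leaning vs. bushy difference concrete, here is a minimal sketch that costs two hand-built join trees over the same four tables. Everything in it is an assumption for illustration: the table names, row counts, selectivities, and the cost function (sum of rows produced by every join) are hypothetical and are not Hive's or Calcite's actual cost model. It only compares two fixed plans; it does not run any optimizer rule.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Scan:
    name: str
    rows: float  # rows left after any filter on the table (hypothetical)


@dataclass(frozen=True)
class Join:
    left: "Plan"
    right: "Plan"
    selectivity: float  # fraction of the cross product kept by the join predicate


Plan = Union[Scan, Join]


def output_rows(plan: Plan) -> float:
    """Estimated number of rows a plan node produces."""
    if isinstance(plan, Scan):
        return plan.rows
    return output_rows(plan.left) * output_rows(plan.right) * plan.selectivity


def cost(plan: Plan) -> float:
    """Toy cost: total rows produced by all joins in the tree."""
    if isinstance(plan, Scan):
        return 0.0
    return cost(plan.left) + cost(plan.right) + output_rows(plan)


# Two fact tables, each with a small filtered dimension (all numbers made up).
f1, f2 = Scan("fact1", 1_000_000), Scan("fact2", 1_000_000)
d1, d2 = Scan("dim1_filtered", 10), Scan("dim2_filtered", 10)

DIM_SEL = 1e-3   # fact-to-dimension predicate
FACT_SEL = 1e-4  # fact-to-fact predicate (many-to-many)

# Left-leaning: ((fact1 JOIN dim1) JOIN fact2) JOIN dim2
left_deep = Join(Join(Join(f1, d1, DIM_SEL), f2, FACT_SEL), d2, DIM_SEL)

# Bushy: (fact1 JOIN dim1) JOIN (fact2 JOIN dim2)
bushy = Join(Join(f1, d1, DIM_SEL), Join(f2, d2, DIM_SEL), FACT_SEL)

print("left-deep cost:", cost(left_deep))
print("bushy cost:    ", cost(bushy))
```

In this toy setup the bushy plan wins because both fact tables are reduced by their dimension filters before they meet, so the large fact-to-fact intermediate result never materializes. With different statistics the comparison can come out the other way.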

However, further experimentation and debugging showed that LoptOptimizeJoinRule can in fact generate bushy join trees. The problem was that the statistics/cost model we were feeding to the rule had bugs. Once that was established, we switched back to LoptOptimizeJoinRule.

So, in a nutshell, Hive CBO can and does generate bushy joins. If you have a test case where we do not generate a bushy join but could, please post back. We will be happy to take a look.

Thanks,
Ashutosh

On Fri, May 8, 2015 at 11:57 AM, Ruoxi Sun <zanmato1...@gmail.com> wrote:

> Hi all,
>
> I'm studying CBO code in hive. I have a question about bushy join
> optimization.
>
> Bushy join did get introduced in hive via HIVE-7577
> <https://issues.apache.org/jira/browse/HIVE-7577>, and played an
> important role in optimizing several queries in TPCDS benchmark. Somehow I
> saw the bushy join rule was removed in HIVE-7687
> <https://issues.apache.org/jira/browse/HIVE-7687>, and didn't find much
> comment about the removal.
>
> I wonder if the bushy join is totally gone from hive trunk? And if so, why
> is that? Or did I miss anything?
>
> Thanks in advance.
>
> *Rossi*
>