Re: Questions about bushy join

2019-05-29 Thread Paul Rogers
Hi Volodymyr, You are right, fancy join planning only makes sense if we have useful row and key cardinality information. I seem to recall that Drill estimated row counts based on file size. Clearly, a 10 MB file has far fewer rows than a 1 GB file. Do we no longer do that (or is my memory
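Paul's recollection amounts to a size-proportional heuristic. The sketch below only illustrates that idea under stated assumptions: `estimate_row_count` and the 100-byte average row width are hypothetical, not Drill's actual code or constant. Because both files are divided by the same guessed width, the estimate still ranks the 10 MB file far below the 1 GB file even if the width guess is wrong.

```python
# Hypothetical average row width in bytes; NOT Drill's actual constant.
ASSUMED_AVG_ROW_BYTES = 100

MB = 1024 * 1024
GB = 1024 * MB


def estimate_row_count(file_size_bytes: int,
                       avg_row_bytes: int = ASSUMED_AVG_ROW_BYTES) -> int:
    """Hypothetical heuristic: rows ~= file size / assumed average row width.

    Returns at least 1 so that downstream cost formulas never see zero rows.
    """
    if avg_row_bytes <= 0:
        raise ValueError("avg_row_bytes must be positive")
    return max(1, file_size_bytes // avg_row_bytes)


small = estimate_row_count(10 * MB)  # estimate for a 10 MB file
large = estimate_row_count(1 * GB)   # estimate for a 1 GB file
print(small, large, large / small)   # the ratio tracks the size ratio (~102x)
```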

Re: Questions about bushy join

2019-05-29 Thread weijie tong
Calcite's Programs.heuristicJoinOrder method with a bushy boolean parameter. If the bushy parameter is true, it chooses MultiJoinOptimizeBushyRule; otherwise, it chooses LoptOptimizeJoinRule. @Jinfeng, I'm glad to learn that LoptOptimizeJoinRule can also produce a bushy tree.

Re: Questions about bushy join

2019-05-28 Thread Jinfeng Ni
I'm not sure how you concluded that LoptOptimizeJoinRule would not produce a bushy join plan. I just tried TPC-H Q5 and Q10 on the sample dataset, and it seems that the plans I got are not left-deep join trees. (I could not upload an image to show the visualized plan for those
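One way to check a plan the way Jinfeng describes is to test whether every join's right input is a base table: if so, the tree is left-deep; if not, it is bushy (or right-deep). The helper below is a hypothetical sketch over a toy tuple representation of a join tree, not a Drill or Calcite API.

```python
from typing import Tuple, Union

# Toy join tree: a leaf is a table name; a join is a (left, right) pair.
JoinTree = Union[str, Tuple["JoinTree", "JoinTree"]]


def is_left_deep(tree: JoinTree) -> bool:
    """Return True if every join in the tree has a base table as its right input."""
    if isinstance(tree, str):
        return True  # a single table is trivially left-deep
    left, right = tree
    # The right input must be a leaf, and the left subtree must itself be left-deep.
    return isinstance(right, str) and is_left_deep(left)


print(is_left_deep(((("A", "B"), "C"), "D")))  # left-deep
print(is_left_deep((("A", "B"), ("C", "D"))))  # bushy
```

For example, `((("A", "B"), "C"), "D")` is left-deep, while `(("A", "B"), ("C", "D"))` joins two intermediate results and is therefore bushy.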

Re: Questions about bushy join

2019-05-27 Thread weijie tong
Thanks for the answer. The Hive blog post [1] shows that an optimal bushy-tree plan can give better query performance. With a bushy join, more of the hash joins' build sides can work in parallel, and the intermediate data size is also reduced. As for the worry about planning-time cost, most

Re: Questions about bushy join

2019-05-27 Thread Paul Rogers
Hi All, Weijie, do you have some example plans that appear to be sub-optimal and would be improved with a bushy join plan? What characteristic of the query or schema causes the need for a bushy plan? FWIW, Impala uses a compromise approach: it evaluates left-deep plans, then will "flip"

Re: Questions about bushy join

2019-05-27 Thread Aman Sinha
Hi Weijie, As you might imagine, bushy joins have pros and cons compared to left-deep-only plans. The main pro is that they enumerate many more plan choices, so the planner is more likely to find the optimal join order. On the other hand, there are significant cons: (a) by enumerating more join
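To put a number on "a lot more plan choices": over n relations there are n! left-deep trees (one per ordering of the tables), versus (2(n-1))!/(n-1)! trees of arbitrary shape when left and right inputs are distinguished. This sketch ignores the join graph (it counts cross products too) and any pruning a real optimizer performs, so it bounds the raw search space rather than describing what either Calcite rule actually explores.

```python
from math import factorial


def left_deep_plans(n: int) -> int:
    """Left-deep join trees over n relations: one per permutation of the tables."""
    return factorial(n)


def bushy_plans(n: int) -> int:
    """All binary join trees over n labeled relations with ordered children.

    Equals Catalan(n - 1) tree shapes times n! leaf orderings,
    which simplifies to (2(n-1))! / (n-1)!.
    """
    return factorial(2 * (n - 1)) // factorial(n - 1)


for n in range(2, 9):
    print(n, left_deep_plans(n), bushy_plans(n))
```

For four tables that is 24 versus 120; for eight, 40,320 versus 17,297,280, which is why planning time comes up as the main cost of allowing bushy plans.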

Questions about bushy join

2019-05-27 Thread weijie tong
Hi all: Does anyone know why we don't support bushy joins in query plan generation when the HEP planner is enabled? The codebase shows that PlannerPhase.JOIN_PLANNING uses LoptOptimizeJoinRule, not Calcite's MultiJoinOptimizeBushyRule.