Optimal way to organize Joins in Calcite

2023-07-03 Thread Jonathan Sternberg
Hi, I'm presently working on optimizing the ordering of joins for queries and had a few questions about the optimal way to do that with Calcite. I watched this meetup video (https://www.youtube.com/watch?v=5wQojihyJDs) and spent some time experimenting with JoinAssociateRule, JoinCommuteRule, and

Re: Optimal way to organize Joins in Calcite

2023-07-03 Thread Julian Hyde
There are two strategies because of large joins. If your query joins 10 tables, the number of possible join orders is large (bounded by 10 factorial I believe) and would therefore overwhelm the Volcano planner, which must construct each possibility. Therefore we have a heuri
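
To give a feel for how quickly the search space mentioned above grows, here is a minimal sketch in plain Java (no Calcite dependency). The class and method names are hypothetical and chosen only for illustration. It assumes that cross joins are allowed and that only the order of tables and the tree shape are counted: n! for left-deep trees, and (2n-2)!/(n-1)! if bushy trees with ordered inputs are counted as well. It says nothing about how many alternatives the Volcano planner actually registers for a given rule set.

```java
import java.math.BigInteger;

public class JoinOrderCount {

    // Number of left-deep join orders over n tables: every permutation
    // of the tables gives one left-deep tree, so the count is n!.
    static BigInteger leftDeepOrders(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    // Number of bushy join trees over n tables when left and right inputs
    // are distinguished: (2n-2)! / (n-1)!, computed here as the product
    // n * (n+1) * ... * (2n-2).
    static BigInteger bushyOrders(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = n; i <= 2 * n - 2; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    public static void main(String[] args) {
        int tables = 10;
        // prints "left-deep: 3628800"
        System.out.println("left-deep: " + leftDeepOrders(tables));
        // prints "bushy: 17643225600"
        System.out.println("bushy: " + bushyOrders(tables));
    }
}
```

Even under these simplified assumptions, a 10-table join has millions of left-deep orders, which is the motivation for falling back to a heuristic once the number of joined tables grows.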

Re: Optimal way to organize Joins in Calcite

2023-07-03 Thread Roman Kondakov
Hi Jonathan,
1. As Julian mentioned, it's better to use heuristic join ordering for a large number of joins.
2. LoptOptimizeJoinRule and MultiJoinOptimizeBushyRule AFAIK always produce a tree of joins, not a MultiJoin.
3. Yes, your understanding is correct. You can check the default join order pro

Re: Optimal way to organize Joins in Calcite

2023-07-06 Thread Jonathan Sternberg
Thanks. For MultiJoin, I'm trying to get it to work with a custom cost model and custom output convention. We have our costs for operations as part of the implementation of the physical nodes. Since MultiJoin uses the costs to determine the join ordering, I'm a bit concerned that it is using the c

Re: Optimal way to organize Joins in Calcite

2023-07-06 Thread Roman Kondakov
Hi Jonathan, if you are using a custom RelOptCost with a custom cost model, you can apply your cost model to the logical nodes as well.
1. You need to initialize the planner with your implementation of RelOptCostFactory.
2. You can also override the default cost formula for any node (logical