[ 
https://issues.apache.org/jira/browse/CALCITE-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912124#comment-16912124
 ] 

Volodymyr Vysotskyi edited comment on CALCITE-2166 at 8/21/19 11:58 AM:
------------------------------------------------------------------------

Thanks, Xiening Dai, for moving this issue forward! I've also considered 
similar options some time ago, and here are my thoughts regarding them:
1 & 2: Currently we store the best cost value to avoid recalculating it every 
time when a new rel node is added to the rel subset. For the case when we have 
to recalculate its cost, we also have to clear rel metadata cache for current 
rel node, and for the case when something is changed, also recalculate parent 
rel nodes costs.

I agree that #3 is too complex, and I don't have other solutions for this issue.

But in either case, it would be good if 1 or 2 option will help to fix this 
issue partially without introducing significant performance degradation.


was (Author: vvysotskyi):
Thanks, Xiening Dai, for moving this issue forward! I've also considered 
similar options, and here are my thoughts regarding them:
1 & 2: Currently we store the best cost value to avoid recalculating it every 
time when a new rel node is added to the rel subset. For the case when we have 
to recalculate its cost, we also have to clear rel metadata cache for current 
rel node, and for the case when something is changed, also recalculate parent 
rel nodes costs.

I agree that #3 is too complex, and I don't have other solutions for this issue.

But in either case, it would be good if 1 or 2 option will help to fix this 
issue partially without introducing significant performance degradation.

> Cumulative cost of RelSubset.best RelNode is increased after calling 
> RelSubset.propagateCostImprovements() for input RelNodes
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-2166
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2166
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.15.0
>            Reporter: Volodymyr Vysotskyi
>            Assignee: Danny Chan
>            Priority: Critical
>
> After calling {{RelSubset.propagateCostImprovements()}} cumulative cost of 
> {{RelSubset.best}} {{RelNode}} may be increased due to the increase of the 
> non-cumulative cost caused by changing of input best {{RelNode}}.
> To observe this issue, add this code:
> {code:java}
>           if (subset.best != null) {
>             RelOptCost bestCost = getCost(subset.best, 
> RelMetadataQuery.instance());
>             if (!subset.bestCost.equals(bestCost)) {
>               throw new AssertionError(
>                 "relSubset [" + subset.getDescription()
>                   + "] has wrong best cost "
>                   + subset.bestCost + ". Correct cost is " + bestCost);
>             }
>           }
> {code}
> into {{VolcanoPlanner.validate()}} method (line 907).
> List of unit tests which fail with this check:
> {noformat}
> Failed tests: 
>   
> MaterializationTest.testJoinMaterializationUKFK9:1823->checkMaterialize:198->checkMaterialize:205->checkThatMaterialize:233
>  relSubset [rel#226287:Subset#8.ENUMERABLE.[]] has wrong best cost {221.5 
> rows, 128.25 cpu, 0.0 io}. Correct cost is {233.0 rows, 178.0 cpu, 0.0 io}
>   ScannableTableTest.testPFPushDownProjectFilterAggregateNested:279 relSubset 
> [rel#12950:Subset#5.ENUMERABLE.[]] has wrong best cost {63.8 rows, 62.308 
> cpu, 0.0 io}. Correct cost is {70.4 rows, 60.404 cpu, 0.0 io}
>   ScannableTableTest.testPFTableRefusesFilterCooperative:221 relSubset 
> [rel#13382:Subset#2.ENUMERABLE.[]] has wrong best cost {81.0 rows, 181.01 
> cpu, 0.0 io}. Correct cost is {150.5 rows, 250.505 cpu, 0.0 io}
>   ScannableTableTest.testProjectableFilterableCooperative:148 relSubset 
> [rel#13611:Subset#2.ENUMERABLE.[]] has wrong best cost {81.0 rows, 181.01 
> cpu, 0.0 io}. Correct cost is {150.5 rows, 250.505 cpu, 0.0 io}
>   ScannableTableTest.testProjectableFilterableNonCooperative:165 relSubset 
> [rel#13754:Subset#2.ENUMERABLE.[]] has wrong best cost {81.0 rows, 181.01 
> cpu, 0.0 io}. Correct cost is {150.5 rows, 250.505 cpu, 0.0 io}
>   FrameworksTest.testUpdate:336->executeQuery:367 relSubset 
> [rel#22533:Subset#2.ENUMERABLE.any] has wrong best cost {19.5 rows, 37.75 
> cpu, 0.0 io}. Correct cost is {22.575 rows, 52.58 cpu, 0.0 io}
> {noformat}
> For the test {{MaterializationTest.testJoinMaterializationUKFK9}} initial 
> best plan was:
> {noformat}
> EnumerableProject(empid0=[$5], empid00=[$5], deptno0=[$7]): rowcount = 15.0, 
> cumulative cost = {15.0 rows, 45.0 cpu, 0.0 io}, id = 3989
>   EnumerableJoin(subset=[rel#3988:Subset#34.ENUMERABLE.[]], condition=[=($1, 
> $7)], joinType=[inner]): rowcount = 15.0, cumulative cost = {116.0 rows, 0.0 
> cpu, 0.0 io}, id = 4797
>     EnumerableFilter(subset=[rel#4274:Subset#47.ENUMERABLE.[0]], 
> condition=[=(CAST($2):VARCHAR CHARACTER SET "ISO-8859-1" COLLATE 
> "ISO-8859-1$en_US$primary", 'Bill')]): rowcount = 1.0, cumulative cost = {1.0 
> rows, 1.0 cpu, 0.0 io}, id = 16522
>       EnumerableTableScan(subset=[rel#158:Subset#11.ENUMERABLE.[0]], 
> table=[[hr, m0]]): rowcount = 1.0, cumulative cost = {0.0 rows, 1.0 cpu, 0.0 
> io}, id = 79
>     EnumerableTableScan(subset=[rel#115:Subset#5.ENUMERABLE.[]], table=[[hr, 
> depts]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 
> io}, id = 62
> {noformat}
> Its cumulative cost is \{221.5 rows, 123.75 cpu, 0.0 io}
> After applying some rules it became:
> {noformat}
> EnumerableProject(empid0=[$3], empid00=[$3], deptno0=[$0]): rowcount = 2.25, 
> cumulative cost = {2.25 rows, 6.75 cpu, 0.0 io}, id = 4012
>   EnumerableFilter(subset=[rel#4007:Subset#41.ENUMERABLE.[]], 
> condition=[=(CAST($2):VARCHAR CHARACTER SET "ISO-8859-1" COLLATE 
> "ISO-8859-1$en_US$primary", 'Bill')]): rowcount = 2.25, cumulative cost = 
> {2.25 rows, 15.0 cpu, 0.0 io}, id = 4811
>     EnumerableProject(subset=[rel#4203:Subset#61.ENUMERABLE.[]], deptno=[$7], 
> deptno0=[$1], name0=[$2], empid0=[$5]): rowcount = 15.0, cumulative cost = 
> {15.0 rows, 60.0 cpu, 0.0 io}, id = 4206
>       EnumerableJoin(subset=[rel#4204:Subset#52.ENUMERABLE.[]], 
> condition=[=($1, $7)], joinType=[inner]): rowcount = 15.0, cumulative cost = 
> {116.0 rows, 0.0 cpu, 0.0 io}, id = 4795
>         EnumerableTableScan(subset=[rel#158:Subset#11.ENUMERABLE.[0]], 
> table=[[hr, m0]]): rowcount = 1.0, cumulative cost = {0.0 rows, 1.0 cpu, 0.0 
> io}, id = 79
>         EnumerableTableScan(subset=[rel#115:Subset#5.ENUMERABLE.[]], 
> table=[[hr, depts]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 
> cpu, 0.0 io}, id = 62
> {noformat}
> Its cumulative cost is {{\{233.0 rows, 148.0 cpu, 0.0 io\}}}.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

Reply via email to