[ https://issues.apache.org/jira/browse/CALCITE-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ruben Q L resolved CALCITE-5003.
--------------------------------
    Resolution: Fixed

Fixed via https://github.com/apache/calcite/commit/2789f5e4c361b052967f42b87447f04cc1ce7896

> MergeUnion on types with different collators produces wrong result
> ------------------------------------------------------------------
>
>                 Key: CALCITE-5003
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5003
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.27.0
>            Reporter: Ruben Q L
>            Assignee: Ruben Q L
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.31.0
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> MergeUnion on types with different collators produces a wrong result.
> The problem can be reproduced with the following test (in
> {{EnumerableStringComparisonTest}}):
> {code}
> @Test void testMergeUnionOnStringDifferentCollation() {
>   tester()
>       .query("?")
>       .withHook(Hook.PLANNER, (Consumer<RelOptPlanner>) planner ->
>           planner.removeRule(EnumerableRules.ENUMERABLE_UNION_RULE))
>       .withRel(b -> {
>         final RelBuilder builder = b.transform(c -> c.withSimplifyValues(false));
>         return builder
>             .values(builder.getTypeFactory().builder()
>                     .add("name",
>                         builder.getTypeFactory().createSqlType(SqlTypeName.VARCHAR))
>                     .build(),
>                 "facilities", "HR", "administration", "Marketing")
>             .values(createRecordVarcharSpecialCollation(builder),
>                 "Marketing", "administration", "presales", "HR")
>             .union(false)
>             .sort(0)
>             .build();
>       })
>       .explainHookMatches("" // It is important that we have MergeUnion in the plan
>           + "EnumerableMergeUnion(all=[false])\n"
>           + "  EnumerableSort(sort0=[$0], dir0=[ASC])\n"
>           + "    EnumerableValues(tuples=[[{ 'facilities' }, { 'HR' }, { 'administration' }, { 'Marketing' }]])\n"
>           + "  EnumerableSort(sort0=[$0], dir0=[ASC])\n"
>           + "    EnumerableValues(tuples=[[{ 'Marketing' }, { 'administration' }, { 'presales' }, { 'HR' }]])\n")
>       .returnsOrdered("name=administration\n"
>           + "name=facilities\n"
>           + "name=HR\n"
>           + "name=Marketing\n"
>           + "name=presales");
> }
> {code}
> which fails with:
> {noformat}
> java.lang.AssertionError:
> Expected: "name=administration\nname=facilities\nname=HR\nname=Marketing\nname=presales"
>      but: was "name=administration\nname=HR\nname=Marketing\nname=administration\nname=facilities\nname=Marketing\nname=presales"
> {noformat}
> The problem is that, when the inputs use different collators, the
> prerequisite of MergeUnion (sorted inputs) is not fulfilled: each input is
> technically sorted, but not with the same collator, so their rows are not
> comparable by the MergeUnion algorithm.
> One possible solution is to not apply EnumerableMergeUnionRule in this case.
> A cleverer solution would be for the rule to push Sort + Cast + input (and
> not just Sort + input) when the input's key type differs collation-wise from
> the union's result type.

-- 
This message was sent by Atlassian Jira
(v8.20.7#820007)
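The failure mode described in the issue can be illustrated outside Calcite with a simplified two-way merge. This is a sketch, not Calcite's EnumerableMergeUnion implementation, and the names MergeUnionSketch, mergeUnion and sorted are hypothetical. It assumes that the default VARCHAR collation orders strings like Java's binary String.compareTo and that the special collation from createRecordVarcharSpecialCollation behaves like String.CASE_INSENSITIVE_ORDER; the issue does not state either. The merge takes the left row on ties and drops a row that compares equal to the last emitted row, which is only valid when both inputs are sorted by the comparator used for merging. The "fixed" call re-sorts both inputs under the union's comparator, an in-memory analogue of the Sort + Cast idea rather than the actual commit.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class MergeUnionSketch {

  /**
   * Hypothetical, simplified two-way merge union with duplicate elimination (UNION, not UNION ALL).
   * Precondition: both inputs are sorted by the same cmp that is used to merge them.
   * Duplicates are dropped by comparing each candidate to the last emitted row.
   */
  static List<String> mergeUnion(List<String> left, List<String> right, Comparator<String> cmp) {
    List<String> out = new ArrayList<>();
    int i = 0;
    int j = 0;
    while (i < left.size() || j < right.size()) {
      String next;
      if (j >= right.size()
          || (i < left.size() && cmp.compare(left.get(i), right.get(j)) <= 0)) {
        next = left.get(i++); // ties go to the left input
      } else {
        next = right.get(j++);
      }
      // Skip the row if it equals (under cmp) the last row emitted.
      if (out.isEmpty() || cmp.compare(out.get(out.size() - 1), next) != 0) {
        out.add(next);
      }
    }
    return out;
  }

  /** Returns a sorted copy, standing in for an EnumerableSort under the given collation. */
  static List<String> sorted(List<String> rows, Comparator<String> cmp) {
    List<String> copy = new ArrayList<>(rows);
    copy.sort(cmp);
    return copy;
  }

  public static void main(String[] args) {
    // Assumption: default collation ~ binary order, special collation ~ case-insensitive order.
    Comparator<String> binary = Comparator.naturalOrder();
    Comparator<String> special = String.CASE_INSENSITIVE_ORDER;

    List<String> left = List.of("facilities", "HR", "administration", "Marketing");
    List<String> right = List.of("Marketing", "administration", "presales", "HR");

    // Mismatched plan: each input sorted under its own collation, merged under the union's.
    List<String> wrong = mergeUnion(sorted(left, binary), sorted(right, special), special);
    // Consistent plan: both inputs sorted under the union's collation before merging.
    List<String> fixed = mergeUnion(sorted(left, special), sorted(right, special), special);

    System.out.println(wrong);
    System.out.println(fixed);
  }
}
```

With these stand-in comparators, main prints [administration, HR, Marketing, administration, facilities, Marketing, presales] for the mismatched inputs, the same sequence as the failure message above, and [administration, facilities, HR, Marketing, presales] once both inputs share the union's ordering.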