[ https://issues.apache.org/jira/browse/CALCITE-6176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804451#comment-17804451 ]
Egor Malko commented on CALCITE-6176: ------------------------------------- I have done small investigation. It seems like {{org.apache.calcite.rel.rules.SubQueryRemoveRule.rewriteExists()}} method is a problem. It was added in CALCITE-2329 with intention to optimize cases like {code:sql} select sal from emp where EXISTS (select deptno from dept where deptno=2) {code} Only optimization in {{rewriteExists()}} expects right table not to be used after the join and changes it freely, but it's not true in the current issue cases. Also [CALCITE-2329|https://github.com/apache/calcite/commit/1ae6a52603006741a4cb6bf558c58d9ac08e866a#diff-a57af2470b3215be0ce7d94a226f1997d9417c613fd7f32097fd07543b30633a] [ PR|https://github.com/apache/calcite/commit/1ae6a52603006741a4cb6bf558c58d9ac08e866a#diff-a57af2470b3215be0ce7d94a226f1997d9417c613fd7f32097fd07543b30633a] contains suspicious changes. Before PR, EXISTS cases were processed with IN ones, and after they started to be processed in the own {{rewriteExists()}} method with only one optimization from the issue. So are we possibly got rid from old optimizations? Please, validate my thoughts, because I'm new to the project > 'exists' in 'join on' has a wrong result with > CoreRules.JOIN_SUB_QUERY_TO_CORRELATE > ----------------------------------------------------------------------------------- > > Key: CALCITE-6176 > URL: https://issues.apache.org/jira/browse/CALCITE-6176 > Project: Calcite > Issue Type: Bug > Components: core > Affects Versions: 1.36.0 > Reporter: SimonAlexs > Priority: Critical > > In calcite 1.36.0, using flowing rules: > CoreRules.JOIN_SUB_QUERY_TO_CORRELATE > CoreRules.JOIN_CONDITION_PUSH > I tested some wrong cases(the expected result and actual result are in > comment): > * expected result: the result in mysql; > * actual result: the result in calcite1.36.0 > {code:java} > -- not exists, expected: 1, 2 > -- actual: 1, null > select * > from (select 1 id) t1 > left join (select 2 id) t2 > on not exists(select * > from (select 3 id) p > where p.id=t2.id) > -- or exists, expected: 1, null > -- actual: 1, 2 > select * > from (select 1 id) t1 > left join (select 2 id) t2 > on t1.id=t2.id or exists(select * > from (select 3 id) p > where p.id=t2.id) > -- or not exists, expected: 1, 2 > -- actual: 1, null > select * > from (select 1 id) t1 > left join (select 2 id) t2 > on t1.id=t2.id or not exists(select * > from (select 3 id) p > where p.id=t2.id) > -- and not exists, expected: 1, null > -- 2, 2 > -- actual: 1, null > -- 2, null > select * > from (select 1 id > union all > select 2) t1 > left join (select 2 id) t2 > on t1.id=t2.id and not exists(select * > from (select 3 id) p > where p.id=t1.id){code} > This seems a bug. > The rel of case 2 is as below. The reason I guess is that, The '$1' in 'IS > NOT NULL($1)' of line 3 may be wrong.Because In the rel, the $1 represents > the first column of the right table, which is usually not null depends on > right input data. However, it should represent the column name 'i' in right > table which means the 'exists' result. > This is only my guessing.The real reason for this is needed to find by users > who really understand Calcite. > LogicalProject(id=[$0], id0=[$1]) > LogicalProject(id=[$0], id0=[$1]) > LogicalJoin(condition=[OR(=($0, $1), IS NOT NULL($1))], joinType=[left]) > LogicalValues(tuples=[[\{1}]]) > LogicalCorrelate(correlation=[$cor0], joinType=[left], > requiredColumns=[\{1}]) > LogicalValues(tuples=[[\{2}]]) > LogicalProject(i=[true]) > LogicalFilter(condition=[=($0, $cor0.id0)]) > LogicalValues(tuples=[[\{3}]]) > My whole code is: > {code:java} > > Properties properties = new Properties(); > properties.put(CalciteConnectionProperty.MODEL.camelName(), "inline:"); > Connection calciteConnection = DriverManager.getConnection("jdbc:calcite:", > properties); > Statement calciteStatement = calciteConnection.createStatement(); > CalcitePrepare.Context prepareContext = > calciteStatement.unwrap(CalciteServerStatement.class).createPrepareContext(); > CalciteSchema pSpaceSchema = > prepareContext.getRootSchema().getSubSchema("ps", false); > final FrameworkConfig config = Frameworks.newConfigBuilder() > .parserConfig(SqlParser.config() > .withLex(Lex.MYSQL) > .withConformance(SqlConformanceEnum.MYSQL_5)) > > .sqlValidatorConfig(SqlValidator.Config.DEFAULT) > .defaultSchema(pSpaceSchema.plus()) > .build(); > Planner planner = Frameworks.getPlanner(config); > SqlNode parsedSql = planner.parse(sql); > SqlNode validatedSql = planner.validate(parsedSql); > RelRoot relRoot = planner.rel(validatedSql); > RelNode originRel = relRoot.rel; > HepProgram program = HepProgram.builder() > .addRuleCollection(Arrays.asList( > CoreRules.JOIN_SUB_QUERY_TO_CORRELATE > , CoreRules.JOIN_CONDITION_PUSH > )) > .build(); > HepPlanner hepPlanner = new HepPlanner(program); > hepPlanner.setRoot(originRel); > RelNode optimizedRel = hepPlanner.findBestExp(); {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)